Unix File Descriptors

First published — Aug 07, 2023
Last updated — Aug 07, 2023
#tools #quiz

File descriptor, fd. Quiz.

Introduction

In Unix, whenever a running program opens a file, the kernel records the open file in a table that it maintains for that process and hands back an index into that table. These indexes are small non-negative integers, numbered from 0 separately for each process, and are called file descriptors or FDs. A process can only read and write files for which it holds a file descriptor, either because it opened them itself or because it inherited them from its parent.
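
As a minimal sketch of that lifecycle, a shell can open a file on a descriptor of its choosing, write through it, and close it again. The descriptor number 3 and the temporary file are arbitrary choices made for this illustration.

```shell
# Create a scratch file to write to (the name is chosen by mktemp).
tmpfile=$(mktemp)

# Open the file for writing on descriptor 3 of the current shell.
exec 3>"$tmpfile"

# Write through the descriptor rather than through the file name.
echo "hello via fd 3" >&3

# Close descriptor 3; further writes to it would now fail.
exec 3>&-

# Read the file back by name to confirm the write went through.
content=$(cat "$tmpfile")
echo "$content"

rm -f "$tmpfile"
```

Once the descriptor is closed, the number 3 is free to be reused by the next file the shell opens on it.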

Three file descriptors, 0, 1, and 2, are by convention already open when a process starts, so programs can use them without opening anything. They are normally inherited from the parent process and are called standard input, standard output, and standard error.

Other than being automatically present and standard, they are the same as any other file descriptor.
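
To illustrate that the standard descriptors are ordinary numbers, the sketch below writes to descriptors 1 and 2 explicitly and then redirects only descriptor 2 away. The function name emit is made up for this example.

```shell
# Write one line to descriptor 1 (standard output)
# and one line to descriptor 2 (standard error).
emit() {
    echo "to-stdout" >&1
    echo "to-stderr" >&2
}

# Capture what arrives on descriptor 1 while sending descriptor 2 to /dev/null.
captured=$(emit 2>/dev/null)
echo "$captured"
```

Only the line written to descriptor 1 ends up in the variable, because command substitution collects descriptor 1 and nothing else.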

File Descriptors in Action

The first question we might ask ourselves is: how can we see those file descriptors?

In Unix and especially Linux, an enormous amount of data about the running system is accessible to users in /proc/ and /sys/. For maximum convenience, all that data looks like normal files and directories, but is not stored on disk. When we read or write files in /proc/ and /sys/, we are invoking functions in the kernel that operate directly on kernel memory.

On Unix, every running program (a process) is assigned a process ID (PID), an integer that remains constant for the lifetime of the process. By convention, data about a particular process is available in its directory /proc/PID/, such as /proc/11996/. That data includes the list of its open file descriptors and what each one refers to.

If you are currently in a shell, type echo $$ to see the PID of that shell. Then explore the contents of /proc/PID/fd/ and /proc/PID/fdinfo/.

echo $$
11996

cd /proc/$$

ls -al fd/
total 0
dr-x------ 2 user user  0 Aug  7 20:47 .
dr-xr-xr-x 9 user user  0 Aug  7 20:47 ..
lrwx------ 1 user user 64 Aug  7 20:56 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  7 20:56 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  7 20:56 2 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  7 20:56 255 -> /dev/pts/8

ls -al fdinfo/
total 0
dr-x------ 2 user user 0 Aug  7 20:56 .
dr-xr-xr-x 9 user user 0 Aug  7 20:47 ..
-r-------- 1 user user 0 Aug  7 20:56 0
-r-------- 1 user user 0 Aug  7 20:56 1
-r-------- 1 user user 0 Aug  7 20:56 2
-r-------- 1 user user 0 Aug  7 20:56 255

Here we can see that our shell has only 4 file descriptors open, all referring to the same device, which represents our terminal. We are interested in the first three: standard input, standard output, and standard error. Descriptor 255 is an extra one that bash keeps for its own use.

A similar output can be obtained with lsof -p $$.

lsof -p $$ | grep -E 'COMMAND|CHR'

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
bash    11996 user    0u   CHR 136,8      0t0     13 /dev/pts/8
bash    11996 user    1u   CHR 136,8      0t0     13 /dev/pts/8
bash    11996 user    2u   CHR 136,8      0t0     13 /dev/pts/8
bash    11996 user  255u   CHR 136,8      0t0     13 /dev/pts/8
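
The same information can be read from a script. The sketch below assumes a Linux system with /proc mounted; it opens /dev/null on descriptor 3 (an arbitrary choice), asks /proc where that descriptor points, and checks that the entry disappears once the descriptor is closed.

```shell
# Open /dev/null for reading on descriptor 3 of the current shell.
exec 3</dev/null

# Each entry in /proc/PID/fd/ is a symbolic link to the open file.
# $$ still expands to the PID of this shell inside $(...).
target=$(readlink "/proc/$$/fd/3")
echo "fd 3 -> $target"

# Close the descriptor again.
exec 3<&-

# The entry for descriptor 3 should now be gone.
if [ -e "/proc/$$/fd/3" ]; then
    still_open=yes
else
    still_open=no
fi
echo "still open after close: $still_open"
```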

Not Just Files

World is a File. — Bill Joy in 1988 at IBM Yorktown.

File descriptors are references to open files.

But one of the key principles in Unix is that everything is represented as a file. Some of those files are “special”: instead of triggering functions in a filesystem driver, operations on them trigger functions in other drivers.

Because of the uniformity of file descriptors and read and write system calls that all devices support, common functionality is automatically extended to just about every device and subsystem in Unix and Linux.

For example, a file can be a regular file on disk, a block device like the disk itself, a character device like a terminal or a printer, a socket for a network connection, a named pipe for inter-process communication (IPC), and so on.
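
These kinds can be told apart from a shell with the standard test operators. The following is a small sketch; the helper name kind and the scratch paths are invented for the example, and /dev/null stands in for a character device.

```shell
# Report the kind of file a path refers to, using POSIX test operators.
kind() {
    if   [ -f "$1" ]; then echo "regular"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -c "$1" ]; then echo "character-device"
    elif [ -b "$1" ]; then echo "block-device"
    elif [ -p "$1" ]; then echo "named-pipe"
    elif [ -S "$1" ]; then echo "socket"
    else                   echo "other"
    fi
}

# Build a scratch directory holding a regular file and a named pipe.
dir=$(mktemp -d)
touch "$dir/regular"
mkfifo "$dir/pipe"

k_regular=$(kind "$dir/regular")
k_pipe=$(kind "$dir/pipe")
k_dir=$(kind "$dir")
k_null=$(kind /dev/null)

echo "$dir/regular: $k_regular"
echo "$dir/pipe: $k_pipe"
echo "$dir: $k_dir"
echo "/dev/null: $k_null"

rm -rf "$dir"
```

All of these can be opened and then read or written through a file descriptor in the same way, regardless of which kind they are.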

Quiz

We know the following:

  • In shells such as bash, the PID of the current shell is available in the variable $$.

  • The directory /proc/PID/fd/ contains a list of the process’ open file descriptors. Try ls -al /proc/$$/fd/.

  • The kernel also provides /proc/self/ which, for each process, automagically points to its real directory /proc/PID/. Try ls -al /proc/self/fd/.

  • For compatibility, Linux also provides the directory /dev/fd/, which is a symbolic link to /proc/self/fd/ and therefore shows the same file descriptors. Try ls -al /dev/fd/.

  • Thus, /proc/$$/fd/, /proc/self/fd/, and /dev/fd/ are functionally equivalent.

But why are the contents of those directories not the same when we list them?

ls -al /dev/fd/ /proc/$$/fd/ /proc/self/fd/

/dev/fd/:
total 0
dr-x------ 2 user user  0 Aug  8 00:58 .
dr-xr-xr-x 9 user user  0 Aug  8 00:58 ..
lrwx------ 1 user user 64 Aug  8 00:58 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 2 -> /dev/pts/8
lr-x------ 1 user user 64 Aug  8 00:58 3 -> /proc/14693/fd

/proc/14677/fd/:
total 0
dr-x------ 2 user user  0 Aug  8 00:58 .
dr-xr-xr-x 9 user user  0 Aug  8 00:58 ..
lrwx------ 1 user user 64 Aug  8 00:58 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 2 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 255 -> /dev/pts/8

/proc/self/fd/:
total 0
dr-x------ 2 user user  0 Aug  8 00:58 .
dr-xr-xr-x 9 user user  0 Aug  8 00:58 ..
lrwx------ 1 user user 64 Aug  8 00:58 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug  8 00:58 2 -> /dev/pts/8
lr-x------ 1 user user 64 Aug  8 00:58 3 -> /proc/14693/fd

To answer the question, let’s examine the invocation and output of ls carefully.

  • Because of how shells work, the variable $$ is expanded into the PID of the current shell before ls is called. By the time ls sees that argument, it is already a literal path (/proc/14677/fd/ in this example).

  • The other two arguments do not contain a variable and will be passed to ls as literal /proc/self/fd/ and /dev/fd/.

So when ls starts and looks at the arguments it was called with, the directory /proc/14677/fd/ will indeed refer to the shell.

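But the other two directories, /proc/self/fd/ and /dev/fd/, which always point to the process that accesses them, will not point to bash, but to ls. That is also why they show a descriptor 3 pointing to /proc/14693/fd: it is the directory that ls itself (PID 14693) opened in order to list it.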

So the three listings differ because they show two different processes: one shows the shell, and the other two show ls.
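
The effect can be reproduced without ls. In this sketch, which assumes Linux with /proc mounted, the shell expands $$ itself, while /proc/self is resolved by the readlink process that the shell starts, so the two PIDs differ.

```shell
# $$ is expanded by the shell, so this is the PID of the shell itself.
shell_pid=$$

# /proc/self is a symbolic link whose target is the PID of whichever
# process resolves it. Here that process is readlink, not the shell.
self_pid=$(readlink /proc/self)

echo "shell PID:         $shell_pid"
echo "PID via /proc/self: $self_pid"
```

To inspect the descriptors of the shell from an external command, pass a path built from $$, as in ls -al /proc/$$/fd/.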

See Also

https://en.wikipedia.org/wiki/File_descriptor
