thecodingidiot.com

The read() Syscall

tci_getline takes a file descriptor — a plain int. The POSIX system call that reads bytes from a file descriptor into a buffer is read(). This page covers what read() does and what its return values mean before putting it to use inside tci_getline.

File descriptors in C

In f01/06 you saw file descriptors as shell numbers: 2>/dev/null, 2>&1, fd 0 for stdin and fd 1 for stdout. In c02/02 you used them in C for the first time — write(fd, buf, n) writes to a file descriptor. read(fd, buf, n) is the other direction.

Three file descriptors exist at program start:

fd  Name    Direction
0   stdin   read
1   stdout  write
2   stderr  write

Every other file descriptor has to be created. For files, that is done by calling open(). The kernel assigns the lowest unused integer, which is 3 when only the three standard descriptors are open. How to open a file and what to do when you are done with it is next.
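
To see the numbering for yourself, here is a minimal sketch. probe_next_fd is a hypothetical helper written for this page, not part of libtci. It opens a path, remembers the number the kernel handed out, and closes the descriptor again:

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* close */

/* Hypothetical helper: opens path read-only, notes the descriptor
 * number, closes it, and returns that number (-1 if open() failed). */
int     probe_next_fd(const char *path)
{
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (-1);
    close(fd);                   /* the number becomes available again */
    return (fd);
}
```

Called twice in a row on the same readable path, it should return the same number both times, because the first descriptor was closed before the second open(). In a program that has only stdin, stdout, and stderr open, that number is normally 3.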

open() and close()

<fcntl.h> stands for file control. It is a POSIX header that defines the functions and constants used to open, create, and control file descriptors at the system call level. The flags like O_RDONLY, O_WRONLY, and O_CREAT are defined here, as is open itself. It does not deal with buffered I/O — that is the <stdio.h> domain.

open takes a path, a set of flags, and returns a file descriptor:

int  open(const char *path,  /* file path */
          int flags,         /* access mode */
          ...);              /* optional mode for O_CREAT */

For reading, the flag is O_RDONLY. open returns the new file descriptor on success, or −1 on failure:

int  fd = open("questions.txt", O_RDONLY);
if (fd < 0) {
    /* file not found, permission denied, etc. */
}

Every open must be paired with a close. A file descriptor is a kernel resource — not closing it leaks it. The process has a limit on how many file descriptors it can hold open at once. A loop that opens files without closing them will eventually fail with EMFILE. Run ulimit -n in your terminal to check the limit on your system.

When you are done reading, close the descriptor in C:

close(fd);

The full signatures, all flags, and every error code are in the manual — man 2 open and man 2 close.
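
As a sketch of the open/close pairing with error reporting, the helper below tries to open a path and tells you why it failed. check_readable is a hypothetical name, and printing through strerror() is one possible choice, not a requirement of the project:

```c
#include <errno.h>   /* errno */
#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* fprintf */
#include <string.h>  /* strerror */
#include <unistd.h>  /* close */

/* Hypothetical helper: returns 0 if path can be opened for reading
 * and closed again, otherwise the errno value of the failing call. */
int     check_readable(const char *path)
{
    int fd;
    int saved;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        saved = errno;           /* copy it before another call overwrites it */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return (saved);
    }
    if (close(fd) < 0)           /* every successful open gets its close */
        return (errno);
    return (0);
}
```

For a path that does not exist, the returned value is ENOENT. For a file you are not allowed to read, it is EACCES.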

read()

read is defined in <unistd.h>:

ssize_t  read(int fd,           /* file descriptor to read from */
              void *buf,        /* buffer to write bytes into */
              size_t count);    /* maximum bytes to read */

Three return values matter:

  • Positive: the number of bytes placed in buf. May be less than count — a short read is not an error.
  • 0: end of file. No more bytes available on this descriptor.
  • −1: error. errno holds the reason.
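
The three cases map directly onto a loop. The sketch below uses a hypothetical helper, count_bytes, with an arbitrary 64-byte buffer. It only adds up how many bytes arrive, so that each branch handles exactly one of the return values above:

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

/* Hypothetical helper: returns the number of bytes read from fd
 * until end of file, or -1 if read() reports an error. */
long    count_bytes(int fd)
{
    char    buf[64];             /* size chosen arbitrarily for the sketch */
    long    total;
    ssize_t n;

    total = 0;
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;          /* positive: n bytes arrived, maybe fewer than asked */
        else if (n == 0)
            return (total);      /* 0: end of file */
        else
            return (-1);         /* -1: error, errno holds the reason */
    }
}
```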

read does not add a null terminator. The buffer is raw bytes. To treat it as a C string, add '\0' at position bytes_read before using any string function on it.
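
A minimal sketch of that step, assuming a hypothetical helper named read_string. It asks read() for one byte less than the buffer can hold, so there is always room for the terminator:

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

/* Hypothetical helper: reads at most cap - 1 bytes into buf and
 * null-terminates the result. Returns the byte count, or -1 on error. */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    ssize_t bytes_read;

    if (cap == 0)
        return (-1);             /* no room even for the terminator */
    bytes_read = read(fd, buf, cap - 1);
    if (bytes_read < 0)
        return (-1);
    buf[bytes_read] = '\0';      /* read() left this position untouched */
    return (bytes_read);
}
```

At end of file the function returns 0 and buf holds an empty string. Note that bytes after an embedded '\0' in the input are still in the buffer, but string functions will not see them.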

Why BUFFER_SIZE matters

A running program lives in user space — the region of memory the OS gives to each process. The kernel lives in a separate, privileged region. Code in user space cannot touch the kernel directly; it requests services through system calls.[1]

Every open, close, read, and write is a system call. Each one forces a switch into kernel mode, often loosely called a context switch: the CPU saves the current state, raises its privilege level, executes kernel code, copies data across the boundary, then switches back. You have been paying this cost since c02/02 with every write(). With read() you pay it on the input side too.

The overhead of that switch is roughly fixed: you pay it whether you transfer 1 byte or 65536. That makes the transfer size critical. Reading a 1024-byte file takes this many read() calls that return data, plus one final call that returns 0 to signal end of file:

BUFFER_SIZE  read() calls
1            1024
8            128
128          8
1024         1

The number is set at compile time with -D BUFFER_SIZE=128. The caller controls the trade-off between memory and call frequency. tci_getline adapts to whatever value is chosen — including 1, which is the hardest case and the one the tester uses to stress-test the implementation.
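
You can check the numbers in the table yourself. The sketch below assumes a hypothetical helper, count_data_reads, that takes the chunk size as a run-time parameter instead of the compile-time BUFFER_SIZE, so one binary can try several sizes:

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <stdlib.h>  /* malloc, free */
#include <unistd.h>  /* read, close */

/* Hypothetical helper: reads path in pieces of chunk bytes and returns
 * how many read() calls returned data. Returns -1 on any failure. */
long    count_data_reads(const char *path, size_t chunk)
{
    char    *buf;
    long    calls;
    int     fd;
    ssize_t n;

    if (chunk == 0)
        return (-1);
    buf = malloc(chunk);
    if (buf == NULL)
        return (-1);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return (-1);
    }
    calls = 0;
    while ((n = read(fd, buf, chunk)) > 0)
        calls++;                 /* the final call that returns 0 is not counted */
    close(fd);
    free(buf);
    if (n < 0)
        return (-1);
    return (calls);
}
```

On a 1024-byte regular file, a chunk size of 8 should give 128 and a chunk size of 1 should give 1024, matching the table. Pipes and terminals can return short reads, so the counts there may be higher.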

A concrete exercise

Before adding any logic to tci_getline, write a standalone program that opens a file, reads it in chunks, and prints each chunk in brackets. Save it as readfile.c — this is scratch code, not part of libtci:

#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* read, close */
#include <stdio.h>   /* printf */
 
#ifndef BUFFER_SIZE
# define BUFFER_SIZE 8
#endif
 
int     main(int argc, char **argv)
{
    char    buf[BUFFER_SIZE + 1];  /* +1 for the null terminator we add */
    int     fd;
    ssize_t bytes;
 
    if (argc != 2)
        return (1);
    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return (1);
    while ((bytes = read(fd, buf, BUFFER_SIZE)) > 0) {
        buf[bytes] = '\0';       /* read() does not null-terminate */
        printf("[%s]", buf);     /* brackets show each chunk boundary */
    }
    printf("\n");
    close(fd);
    if (bytes < 0)               /* read() failed: report it in the exit status */
        return (1);
    return (0);
}

Create a small test file:

printf "one\ntwo\nthree\n" > test.txt

Compile and run with BUFFER_SIZE=3:

gcc -Wall -Wextra -g -std=c99 -D BUFFER_SIZE=3 -o readfile readfile.c
./readfile test.txt

Output:

[one][
tw][o
t][hre][e
]

The brackets show exactly when each read() call returned. The chunk boundary falls in the middle of "two" and in the middle of "three". read() knows nothing about '\n' — it returns raw bytes. tci_getline must find the '\n', return everything up to it, and keep whatever came after it for the next call. The next page covers how that state survives between calls.
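
The first of those steps, finding the '\n' inside a chunk, can be sketched on its own. find_newline is a hypothetical helper for illustration, not the libtci implementation. It takes an explicit length because a chunk from read() is not a string until you terminate it:

```c
#include <string.h>     /* memchr */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: returns the index of the first '\n' among the
 * first len bytes of buf, or -1 if the chunk contains no newline. */
ssize_t find_newline(const char *buf, size_t len)
{
    const char  *nl;

    nl = memchr(buf, '\n', len);
    if (nl == NULL)
        return (-1);             /* the line continues in the next chunk */
    return (nl - buf);
}
```

For the chunk "one" it returns -1, so more bytes are needed before a line can be returned. For the chunk "\ntw" it returns 0: the line ends right there, and "tw" has to be kept for the next call.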

Footnotes

  1. User space and kernel space - Wikipedia