thecodingidiot.com

Integers

%d, %i, and %u print integers in base 10. The conversion from a binary integer to a sequence of decimal characters is the core operation that every score counter from 1977 to 1989 had to implement without a library.

The digit extraction problem

The integer 1234 is stored as a binary value. The characters '1', '2', '3', '4' are ASCII bytes 49, 50, 51, 52. They are not the same thing — converting one to the other requires arithmetic.

% 10 gives the remainder after dividing by 10. Because we are working in base 10, that remainder is always the last decimal digit — the units place. 1234 % 10 is 4 because 1234 = 123 × 10 + 4.

/ 10 is integer division — the same as regular division, except everything after the decimal point is discarded:

1234 / 10 = 123.4   →   integer division: 123

The fractional part is gone. Applying both operations repeatedly peels off one digit per iteration until nothing remains.

To convert a digit (0–9) to its ASCII character, add '0'. The character '0' is ASCII 48; adding 4 gives 52, which is '4'. The same offset works for every digit.

This is the same fixed-offset trick from c01/04: tci_toupper and tci_tolower add or subtract 32 because uppercase and lowercase letters are exactly 32 apart in ASCII. The digit offset is the same idea applied to a different range.

1234 % 10 = 4   →   '0' + 4 = '4'   (ASCII 52)
1234 / 10 = 123
 123 % 10 = 3   →   '0' + 3 = '3'   (ASCII 51)
 123 / 10 = 12
  12 % 10 = 2   →   '0' + 2 = '2'   (ASCII 50)
  12 / 10 = 1
   1 % 10 = 1   →   '0' + 1 = '1'   (ASCII 49)
   1 / 10 = 0   →   stop

The digits arrive in reverse order. A fixed-size buffer holds them, and a reverse pass puts them in the correct order before writing. A 32-bit int has at most 10 decimal digits plus a sign, 11 characters in all, so a 12-byte buffer is enough for base 10. The function below uses 64 because it handles any base: in base 2, an unsigned long on a 64-bit platform needs up to 64 binary digits.
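The extract-then-reverse steps can be sketched for base 10 alone. This is only an illustration: demo_utoa10 is a hypothetical name, not part of the chapter's code, and the 10-byte scratch buffer assumes unsigned int is at most 32 bits wide.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * Writes the decimal digits of n into out, which must hold at least
 * 11 bytes (10 digits + '\0'), and returns the number of digits.
 * Assumes unsigned int is at most 32 bits wide. */
size_t  demo_utoa10(unsigned int n, char *out)
{
    char    tmp[10];    /* digits in reverse order, units place first */
    size_t  len;
    size_t  i;

    len = 0;
    if (n == 0)
        tmp[len++] = '0';                   /* zero still prints one digit */
    while (n > 0) {
        tmp[len++] = (char)('0' + n % 10);  /* last digit -> ASCII */
        n /= 10;                            /* drop the last digit */
    }
    i = 0;
    while (i < len) {                       /* copy out in reverse */
        out[i] = tmp[len - 1 - i];
        i++;
    }
    out[len] = '\0';
    return (len);
}
```

Calling demo_utoa10(1234, buf) fills buf with "1234" and returns 4. The scratch array receives '4', '3', '2', '1' first, matching the trace above.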

Number bases

You use base 10 every day without thinking about it. "Base 10" just means there are 10 distinct digits — 0 through 9. When you run out of digits in a position, you carry over to the next: 9 + 1 becomes 10, not a new symbol. The position to the left is worth ten times the position to the right.

Any other number of symbols works the same way. The breadboard chapter used base 2: only two digits, 0 and 1. When a bit is 1 and you add 1 more, you carry: 1 + 1 becomes 10 in binary (one group of two, zero units). Every position to the left is worth twice the one to its right. The number 1234 in decimal looks different in other bases, but it represents the same quantity:

1234 in base 10:  1234
1234 in base 16:  4d2   (hex: 4×256 + 13×16 + 2)
1234 in base  2:  10011010010   (binary)

Base 16 (hexadecimal) is common in programming because four binary digits map exactly to one hex digit — it is a compact way to write binary values. That is why %x and %p use it.
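The four-bits-per-hex-digit mapping can be checked with a shift and a mask instead of division. This is a small sketch; demo_hex_digit is a hypothetical name used only here, and pos is assumed to be small enough that the shift stays inside the width of unsigned int.

```c
/* Hypothetical helper for illustration only.
 * Returns the hex digit for the 4-bit group at position pos
 * (0 = least significant group) of n.
 * Assumes 4 * pos is smaller than the bit width of unsigned int. */
char    demo_hex_digit(unsigned int n, unsigned int pos)
{
    const char  *digits = "0123456789abcdef";

    /* shift the wanted group down, then keep only its four bits */
    return (digits[(n >> (4 * pos)) & 0xFu]);
}
```

For n = 1234, positions 2, 1, and 0 give '4', 'd', and '2': the binary form 0100 1101 0010 read one group of four bits at a time.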

The digit extraction algorithm works for any base. Instead of % 10 and / 10, use % base and / base. The only other change is the digit set: base 10 uses "0123456789", base 16 uses "0123456789abcdef". The length of the string is the base.

tci_putnbr_base

Write a helper that works for any base, not just base 10. %d/%u pass "0123456789"; %x/%X pass a hex digit string. The same function handles all of them:

static int  tci_putnbr_base(unsigned long n, const char *base, int fd)
{
    char    buf[64];    /* enough for any unsigned long in any base */
    char    tmp;
    int     blen;
    int     len;
    int     i;
 
    blen = (int)tci_strlen(base);
    len = 0;
    if (n == 0)
        buf[len++] = base[0];   /* zero is a valid digit, not empty output */
    while (n > 0) {
        buf[len++] = base[n % blen];
        n /= blen;
    }
    i = 0;
    while (i < len / 2) {       /* reverse in place */
        tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
        i++;
    }
    write(fd, buf, len);
    return (len);
}

unsigned long as the parameter type is deliberate: the function is called with both unsigned int (from %u, %x, %X) and uintptr_t (from %p). On Linux, unsigned long is the same width as a pointer: 32 bits under the ILP32 data model on 32-bit systems, 64 bits under LP64 on x86-64. It is therefore always wide enough for uintptr_t. This does not hold on Windows, where the LLP64 model keeps unsigned long at 32 bits even on 64-bit systems. This chapter runs on Linux, and on x86-64 that means LP64.
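The width assumption can be turned into a compile-time check so that a build on a platform where it fails stops early. The sketch below assumes a C11 compiler (for _Static_assert); demo_ptr_to_ulong is a hypothetical name and not part of the chapter's code.

```c
#include <stdint.h>

/* Compile-time check of the assumption in the text: the build fails
 * on a platform where unsigned long is narrower than uintptr_t. */
_Static_assert(sizeof(unsigned long) >= sizeof(uintptr_t),
    "unsigned long is too narrow to hold a pointer value");

/* Hypothetical helper for illustration only: widens a pointer to the
 * type that tci_putnbr_base takes, going through uintptr_t. */
unsigned long   demo_ptr_to_ulong(const void *p)
{
    return ((unsigned long)(uintptr_t)p);
}
```

On Linux the assertion compiles silently. On an LLP64 target it would produce a compile error instead of a truncated pointer at runtime.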

%d and %i — signed decimal

Both specifiers behave identically. The argument is a signed int. The same width reasoning from above applies here: the function promotes to long before doing anything, for reasons that become clear at the edge of the type's range.

static int  tci_print_signed(int n)
{
    int   count;
    long  val;
 
    count = 0;
    val = n;                            /* promote to long before negating */
    if (val < 0) {
        count += tci_putchar_fd('-', 1);
        val = -val;                     /* negate: safe because val is long */
    }
    count += tci_putnbr_base((unsigned long)val, "0123456789", 1);
    return (count);
}

A 32-bit signed int holds values from −2147483648 to 2147483647. The positive range stops at 2147483647 — one short of the magnitude of the most negative value. Negating −2147483648 as an int would require storing 2147483648, which exceeds INT_MAX by exactly 1 and produces undefined behaviour.

This is the kind of edge case that goes unnoticed for a long time. Every other negative integer negates cleanly; only this one value breaks, and it only surfaces when someone passes INT_MIN to the function. A quick test of %d with −1, −100, or −32768 passes without issue — −2147483648 is the one that exposes it.

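On 64-bit Linux (LP64), long is 64 bits wide, so converting to long first leaves room for the negation. The result fits, and the cast to unsigned long before passing to tci_putnbr_base is then safe. On a 32-bit system, where long is no wider than int, the same negation would still overflow.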
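For comparison, here is a different technique from the long conversion used in tci_print_signed: doing the negation in unsigned arithmetic, which wraps around instead of overflowing and so does not depend on long being wider than int. It is a sketch only, and demo_magnitude is a hypothetical name.

```c
/* Hypothetical helper for illustration only.
 * Returns the magnitude of n as an unsigned int. Unsigned arithmetic
 * is defined modulo 2^N, so 0u - (unsigned int)n is well-defined
 * even when n is INT_MIN. */
unsigned int    demo_magnitude(int n)
{
    if (n < 0)
        return (0u - (unsigned int)n);  /* wraps, never overflows */
    return ((unsigned int)n);
}
```

With a 32-bit int, demo_magnitude(INT_MIN) returns 2147483648, the value that a signed int cannot hold.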

In dispatch:

if (spec == 'd' || spec == 'i')
    return (tci_print_signed(va_arg(*args, int)));

Where the limits come from

These numbers are not arbitrary. A 32-bit integer has 32 binary digits, giving 2³² = 4294967296 distinct values. An unsigned int uses all of them for non-negative numbers: 0 to 4294967295 (2³² − 1). A signed int splits the same 2³² bit patterns in half using two's complement. The half with the top bit set holds the 2³¹ negative values, from −2³¹ = −2147483648 up to −1. The other half holds zero and the positive values up to 2³¹ − 1 = 2147483647. Zero takes one slot on the non-negative side, and that is where the asymmetry by one comes from.

The same rule applies to every unsigned type. The unsigned char from c01/02 has 8 bits: 2⁸ = 256 values, 0 to 255. The bit width changes; the formula does not.
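The formula can be checked against the constants in limits.h. The helper names below are hypothetical and exist only for this sketch.

```c
#include <limits.h>

/* Hypothetical helper for illustration only.
 * Largest value of an unsigned type that is `bits` wide: 2^bits - 1.
 * Valid for 1 <= bits <= 64. */
unsigned long long  demo_umax(unsigned int bits)
{
    if (bits >= 64)
        return (~0ULL);             /* 1ULL << 64 would be undefined */
    return ((1ULL << bits) - 1);
}

/* Hypothetical helper: number of bits in an unsigned int here. */
unsigned int    demo_uint_bits(void)
{
    return ((unsigned int)(CHAR_BIT * sizeof(unsigned int)));
}
```

demo_umax(8) is 255, demo_umax(32) is 4294967295, and demo_umax(31) is 2147483647, the positive limit of a 32-bit signed int. demo_umax(demo_uint_bits()) should equal UINT_MAX on any platform whose unsigned int has no padding bits.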

%u — unsigned decimal

After the INT_MIN detour, %u is a relief. An unsigned integer cannot be negative by definition: it uses the full 2³² range for non-negative values, so there is no sign to print and no value to negate. The widening cast to unsigned long is still written out because tci_putnbr_base expects that type, but it cannot lose information: unsigned long is at least as wide as unsigned int, so every 32-bit unsigned value fits.

static int  tci_print_unsigned(unsigned int n)
{
    return (tci_putnbr_base((unsigned long)n, "0123456789", 1));
}

In dispatch:

if (spec == 'u')
    return (tci_print_unsigned(va_arg(*args, unsigned int)));

Run man 3 printf: under d, i the manual specifies the int argument type; under u it specifies unsigned int. The type given to va_arg must match the argument that was actually passed. A mismatch is undefined behaviour in general, and even int and unsigned int are only interchangeable for values that both types can represent. The two look similar and the compiler will not warn, so the mistake stays silent until runtime.
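A minimal sketch of matching the va_arg type to the specifier. demo_fetch is a hypothetical function, not part of the chapter's dispatch; it returns the fetched value widened to long long so that both cases can be inspected.

```c
#include <stdarg.h>

/* Hypothetical helper for illustration only.
 * Reads one variadic argument using the type that matches spec:
 * unsigned int for 'u', int for anything else ('d' or 'i'). */
long long   demo_fetch(int spec, ...)
{
    va_list     args;
    long long   value;

    va_start(args, spec);
    if (spec == 'u')
        value = (long long)va_arg(args, unsigned int);
    else
        value = (long long)va_arg(args, int);
    va_end(args);
    return (value);
}
```

With a 32-bit int, demo_fetch('d', -1) yields −1 and demo_fetch('u', 4294967295u) yields 4294967295. The two arguments have the same bit pattern, and only the type named in va_arg decides how it is read.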

make re
bash test.sh

The %d, %i, and %u rows must all pass — including edge cases for 0, INT_MAX, INT_MIN, and UINT_MAX.
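When a row fails, it can help to see what the C library itself prints for the same edge cases. The sketch below uses the standard snprintf for that. demo_reference is a hypothetical name, and nothing here is meant to describe what test.sh actually does.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper for illustration only.
 * Formats the four edge cases with the C library's own printf family
 * so the text can be compared with the tci output. Returns the number
 * of characters snprintf reports for the full string. */
int demo_reference(char *out, size_t size)
{
    return (snprintf(out, size, "%d %d %d %u",
            0, INT_MAX, INT_MIN, UINT_MAX));
}
```

On a platform with a 32-bit int and a buffer of at least 36 bytes, out receives "0 2147483647 -2147483648 4294967295".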