thecodingidiot.com

Hex and Pointers

The previous page introduced the idea that any number can be expressed in any base by changing the digit set. This page puts that to work. %x, %X, and %p all produce hexadecimal output — and tci_putnbr_base already handles it. The only question is what hexadecimal actually is, and why it exists.

Hexadecimal

Hexadecimal is base 16. It needs 16 distinct symbols: the ten digits 0–9, then six letters to represent the values 10 through 15. The convention is a through f (or A through F for uppercase).

Decimal  Binary  Hex    Decimal  Binary  Hex
0        0000    0      8        1000    8
1        0001    1      9        1001    9
2        0010    2      10       1010    a
3        0011    3      11       1011    b
4        0100    4      12       1100    c
5        0101    5      13       1101    d
6        0110    6      14       1110    e
7        0111    7      15       1111    f

The reason hex is ubiquitous in computing is in that table. A group of 4 binary bits has exactly 16 possible values — one hex digit represents exactly 4 bits. A byte is 8 bits, which is always exactly two hex digits.

The same address written in both bases:

hex:    00007fff5fbff8a0
binary: 0000000000000000011111111111111101011111101111111111100010100000

The hex version is 16 characters; the binary version is 64. Both represent the same 64-bit value. The 4:1 ratio is exact and always holds: every hex digit expands to exactly 4 binary digits, every 4 binary digits collapse to exactly one hex digit.
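
The 4:1 mapping can be checked mechanically. The sketch below uses a hypothetical helper, hex_to_binary, which is not part of tci_printf and exists only for illustration. It expands each hex digit into exactly four binary characters, assuming the caller provides an output buffer of at least 4 * strlen(hex) + 1 bytes.

```c
#include <stddef.h>

/*
** Hypothetical helper (illustration only): expands a hex string into
** binary text, 4 output characters per hex digit.
** out must hold at least 4 * strlen(hex) + 1 bytes.
** Returns 0 on success, -1 if a character is not a hex digit.
*/
int hex_to_binary(const char *hex, char *out)
{
    size_t  i;
    size_t  o;
    int     value;
    int     bit;

    i = 0;
    o = 0;
    while (hex[i])
    {
        if (hex[i] >= '0' && hex[i] <= '9')
            value = hex[i] - '0';
        else if (hex[i] >= 'a' && hex[i] <= 'f')
            value = hex[i] - 'a' + 10;
        else if (hex[i] >= 'A' && hex[i] <= 'F')
            value = hex[i] - 'A' + 10;
        else
            return (-1);
        bit = 3;                        /* most significant bit first */
        while (bit >= 0)
        {
            out[o++] = ((value >> bit) & 1) ? '1' : '0';
            bit--;
        }
        i++;
    }
    out[o] = '\0';
    return (0);
}
```

Passing "7f" fills the buffer with 01111111: two hex digits in, eight binary digits out, which is one byte.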

Octal

Base 8 uses digits 0–7. One octal digit represents 3 bits. It is less common than hex in modern code, but you have already used it: in f01/04, chmod 755 set file permissions using three octal digits. Each digit encodes one permission group (owner, group, others) as a 3-bit value (read=4, write=2, execute=1). Permissions are written in octal because 3 bits map cleanly to one digit, just as 4 bits map cleanly to one hex digit.

tci_putnbr_base can produce octal output too — pass "01234567" as the digit string. Nothing else changes.
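
To make the 3-bits-per-digit idea concrete, here is a minimal sketch that decodes an octal mode into the familiar rwx text. The function name mode_to_string is hypothetical and only the low 9 permission bits are considered. Note that a C integer literal with a leading 0, such as 0755, is read as octal.

```c
/*
** Hypothetical helper (illustration only): writes the low 9 permission
** bits of mode into out as text such as "rwxr-xr-x".
** out must hold at least 10 bytes.
*/
void    mode_to_string(unsigned int mode, char *out)
{
    const char  *flags;
    int         i;

    flags = "rwx";
    i = 0;
    while (i < 9)
    {
        /* bit 8 is owner-read, bit 0 is others-execute */
        if (mode & (1u << (8 - i)))
            out[i] = flags[i % 3];
        else
            out[i] = '-';
        i++;
    }
    out[9] = '\0';
}
```

With mode 0755 the buffer holds rwxr-xr-x: 7 is 111 (read, write, execute), and each 5 is 101 (read and execute).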

%x and %X

%x and %X print an unsigned int in hexadecimal. The two specifiers produce identical values — the only difference is case. %x is the convention for memory addresses, checksums, and most debugging output. %X appears where uppercase is preferred: some binary file formats, Windows API documentation, certain assembly listings. Because the digit string is the only thing that changes, supporting both costs nothing:

if (spec == 'x')
    return (tci_putnbr_base(va_arg(*args, unsigned int),
            "0123456789abcdef", 1));  /* 16 chars → base 16, lowercase */
if (spec == 'X')
    return (tci_putnbr_base(va_arg(*args, unsigned int),
            "0123456789ABCDEF", 1));  /* 16 chars → base 16, uppercase */

Inside tci_putnbr_base, blen is 16. n % 16 gives a remainder between 0 and 15 — an index into the digit string. Index 10 maps to a (or A), index 15 maps to f (or F). The algorithm is identical to decimal; only the symbol set changes.
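
The remainder-and-divide loop described above can be sketched as follows. This is not the tci_putnbr_base from the previous page: sketch_nbr_base is a hypothetical stand-in that writes into a caller-supplied buffer instead of a file descriptor, and its unsigned long parameter is an assumption. The digit string alone decides the base.

```c
#include <stddef.h>

/*
** Hypothetical stand-in for tci_putnbr_base (illustration only).
** Converts n using the symbols in digits; the base is the length of
** that string. Writes the result into out (at least 65 bytes) and
** returns the number of digits written, or -1 if the base is below 2.
*/
int sketch_nbr_base(unsigned long n, const char *digits, char *out)
{
    char    tmp[64];    /* enough for 64 binary digits */
    size_t  blen;
    int     len;
    int     i;

    blen = 0;
    while (digits[blen])
        blen++;
    if (blen < 2)
        return (-1);
    len = 0;
    /* n % blen is an index into digits; least significant digit first */
    tmp[len++] = digits[n % blen];
    n /= blen;
    while (n > 0)
    {
        tmp[len++] = digits[n % blen];
        n /= blen;
    }
    /* the remainders came out in reverse order, so flip them */
    i = 0;
    while (i < len)
    {
        out[i] = tmp[len - 1 - i];
        i++;
    }
    out[len] = '\0';
    return (len);
}
```

Calling it with 255 and "0123456789abcdef" yields ff; the same value with "0123456789ABCDEF" yields FF, and 493 with "01234567" yields 755.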

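The same type discipline from the previous page applies: va_arg must use unsigned int, not int. The two types have the same size and the same bit patterns, which is exactly why the compiler will not warn and the tester may not catch it. The types are still distinct: the C standard only allows reading one as the other when the value is representable in both, so any value above INT_MAX read through the wrong type is undefined behaviour.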
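
A minimal sketch of the matching-type rule, using a hypothetical variadic helper named format_hex_arg. It leans on the standard snprintf for the formatting so that only the va_arg line matters. The example value assumes a 32-bit unsigned int, which holds on Linux x86-64.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
** Hypothetical helper (illustration only): reads exactly one
** unsigned int from the variadic list and formats it as lowercase hex.
** Returns what snprintf returns: the number of characters needed.
*/
int format_hex_arg(char *out, size_t size, ...)
{
    va_list         args;
    unsigned int    value;

    va_start(args, size);
    /* the type named here must match what the caller passed */
    value = va_arg(args, unsigned int);
    va_end(args);
    return (snprintf(out, size, "%x", value));
}
```

Passing 0xdeadbeefu, a value above INT_MAX, is well defined here because the caller passes an unsigned int and va_arg reads an unsigned int.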

%p

%p prints a pointer value as a hexadecimal address with a 0x prefix. The argument type is void *.

The question is which integer type to cast the pointer to before passing it to tci_putnbr_base. On Linux x86-64, a pointer is 8 bytes. unsigned int is 4 bytes: half the width, so the upper 32 bits of any address at or above 4 GiB would be silently lost. The same LP64 relationship from the previous page applies. uintptr_t from <stdint.h> is an unsigned integer type guaranteed to be wide enough to hold any void * value converted to it, which makes the cast safe on any Linux target.
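
The truncation is easy to demonstrate. The sketch below uses a hypothetical helper, fits_in_unsigned_int, that narrows an address to unsigned int and checks whether the value survived. It assumes a platform where uintptr_t is wider than unsigned int, as on Linux x86-64; on a 32-bit target every address would fit.

```c
#include <stdint.h>

/*
** Hypothetical helper (illustration only): returns 1 if addr keeps
** its value after being narrowed to unsigned int, 0 if bits were lost.
*/
int fits_in_unsigned_int(uintptr_t addr)
{
    unsigned int    narrow;

    narrow = (unsigned int)addr;        /* drops any bits above the width of unsigned int */
    return ((uintptr_t)narrow == addr);
}
```

On Linux x86-64 a low value such as 0x1000 fits, while a typical stack address such as 0x7fff5fbff8a0 does not: narrowing it would leave only 5fbff8a0.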

#include <stdint.h>
 
static int  tci_print_ptr(void *ptr)
{
    int  count;
 
    if (!ptr)
        return (tci_putstr_fd("(nil)", 1));      /* NULL: glibc convention on Linux */
    count = tci_putstr_fd("0x", 1);              /* mandatory prefix */
    count += tci_putnbr_base((uintptr_t)ptr,
            "0123456789abcdef", 1);             /* cast to uintptr_t, then hex */
    return (count);
}

In dispatch:

if (spec == 'p')
    return (tci_print_ptr(va_arg(*args, void *)));

Add #include <stdint.h> to tci_printf.c if it is not already there.

The NULL pointer case prints (nil) — this is glibc's behaviour on Linux. The tester compares tci_printf against libc printf using the same pointer value for both, so the output must match exactly.

make re
bash test.sh

The %x, %X, and %p rows must all pass. One specifier remains.