Shaping Output

Pipes are only useful if the tools on each end do something worthwhile. You have already met grep and sort. A handful of others become indispensable once you have pipes, not because they are complex, but because each does one small thing precisely and composes cleanly with everything else.

sort

sort names.txt              # alphabetical, ascending
sort -r names.txt           # reverse
sort -n numbers.txt         # numeric sort (not lexicographic)
sort -rn numbers.txt        # numeric, descending
sort -u names.txt           # sort and remove duplicates

sort reads lines and emits them in order. The distinction between -n and the default matters: without -n, lines are compared character by character, so 10 sorts before 9 because the character 1 comes before the character 9.
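The difference is easiest to see on a tiny input. This is a minimal sketch; nums is a hypothetical helper defined only to supply three sample numbers.

```shell
# nums is an illustrative helper (not a standard command) that prints
# three sample numbers, one per line.
nums() { printf '10\n9\n2\n'; }

nums | sort        # character-by-character order: 10, 2, 9
nums | sort -n     # numeric order: 2, 9, 10
nums | sort -rn    # numeric, largest first: 10, 9, 2
```

If a column of numbers comes out in a strange order, a missing -n is the first thing to check.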

uniq

sort names.txt | uniq           # remove consecutive duplicates
sort names.txt | uniq -c        # count occurrences
sort names.txt | uniq -d        # print only lines that appear more than once

uniq only removes consecutive duplicates, so always sort first. The combination sort | uniq -c | sort -rn is a frequency counter for anything line-based. Use it to find the most common errors in a log, the most-called functions in a trace, or the most repeated words in a file (after splitting the text into one word per line).

cat build.log | grep 'error' | sort | uniq -c | sort -rn | head -10

Ten most frequent error lines in a build log. You will use this exact pipeline^[1].
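To see why the sort matters, compare uniq with and without it on unsorted input. The sketch below uses two hypothetical helpers, names and words, that only print made-up sample data. The awk step is there only to normalize the padding that uniq -c puts in front of each count.

```shell
# names and words are illustrative helpers that print made-up sample data.
names() { printf 'ann\nbob\nann\nann\n'; }
words() { printf 'the cat and the hat\n'; }

names | uniq           # ann, bob, ann: only the adjacent pair was merged
names | sort | uniq    # ann, bob

# Frequency count, most common first: 3 ann, then 1 bob.
names | sort | uniq -c | sort -rn | awk '{print $1, $2}'

# Word frequency: put one word per line first, then count as before.
words | tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -1    # 2 the
```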

wc

wc -l main.c        # count lines
wc -w main.c        # count words
wc -c main.c        # count bytes

wc counts. Combined with pipes:

find . -name '*.c' | wc -l       # how many C files
grep -r 'malloc' src/ | wc -l    # how many lines mention malloc
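Keep in mind that wc -l counts lines, so a line containing two matches still counts once. The sketch below shows the difference on a made-up C fragment; sample is a hypothetical helper, and grep -o is an extension found in GNU and BSD grep rather than in POSIX.

```shell
# sample is an illustrative helper that prints a made-up C fragment.
sample() {
  printf 'p = malloc(8); q = malloc(16);\n'
  printf 'free(p);\n'
  printf 'r = malloc(4);\n'
}

sample | wc -l                       # 3 lines in total
sample | grep 'malloc' | wc -l       # 2 lines contain malloc
sample | grep -c 'malloc'            # the same count, without wc
sample | grep -o 'malloc' | wc -l    # 3 separate matches
```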

cut

cut -d: -f1 /etc/passwd          # first field, colon-delimited
cut -d, -f2,4 data.csv           # fields 2 and 4, comma-delimited
cut -c1-10 file.txt              # characters 1 through 10 of each line

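cut extracts columns from structured text. -d sets the delimiter and -f selects fields, numbered from 1. It is essential for parsing /etc/passwd, simple CSV files, or any fixed-format output. cut splits on every occurrence of the delimiter, so it cannot handle CSV fields that contain quoted commas.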
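Here is a short sketch of field selection on passwd-style input. The records helper and its two entries are made up for illustration and are not read from a real /etc/passwd.

```shell
# records is an illustrative helper that prints two made-up passwd-style lines.
records() {
  printf 'root:x:0:0:root:/root:/bin/sh\n'
  printf 'ann:x:1000:1000:Ann:/home/ann:/bin/bash\n'
}

records | cut -d: -f1      # login names: root, ann
records | cut -d: -f1,7    # name and shell: root:/bin/sh, ann:/bin/bash
records | cut -d: -f3-4    # a field range: 0:0, 1000:1000
records | cut -c1-3        # first three characters: roo, ann

# Quoted commas are not understood: field 2 here is the fragment ' J"'.
echo '"Smith, J",42' | cut -d, -f2
```

Selected fields are printed with the same delimiter that was used to split them.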

tr

echo 'Hello World' | tr 'a-z' 'A-Z'    # to uppercase
echo 'hello world' | tr ' ' '_'        # replace spaces with underscores
echo 'hello' | tr -d 'l'               # delete all l characters
cat file.txt | tr -s ' '               # squeeze repeated spaces to one

tr translates or deletes characters. It reads stdin and writes stdout, and it does not take a filename argument, so feed it through a pipe or an input redirect.
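Because tr reads only stdin, a file reaches it through cat or through the shell's < redirect. The sketch below uses a throwaway file created with mktemp; its contents are made up. The [:lower:] and [:upper:] classes are the POSIX way of writing the a-z and A-Z ranges.

```shell
# Made-up sample file in a temporary location, removed when the shell exits.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'hello    wide    world\n' > "$tmp"

tr -s ' ' < "$tmp"                     # hello wide world
tr '[:lower:]' '[:upper:]' < "$tmp"    # HELLO    WIDE    WORLD
tr -s ' ' '_' < "$tmp"                 # hello_wide_world
tr -d 'l' < "$tmp" | tr -s ' '         # heo wide word
```

With -s and two sets, tr translates first and then squeezes the repeated result, which is why the third command yields single underscores.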

sed

sed is the stream editor — part of POSIX and available on every Unix-like system, just like vi. It reads lines, applies a transformation, and writes the result to stdout.

echo 'hello world' | sed 's/world/terminal/'   # hello terminal
cat app.log | sed 's/\[INFO\]//'               # strip INFO tags

The substitution expression s/pattern/replacement/ replaces the first match on each line. The g flag replaces all matches:

echo 'aaa' | sed 's/a/b/'      # baa — first match only
echo 'aaa' | sed 's/a/b/g'     # bbb — all matches (global)
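The substitution is applied to every line independently, which is easy to miss with one-line examples. The sketch below shows this on two lines of made-up data from a hypothetical csv helper. It also shows that the delimiter after s does not have to be a slash, which helps when the pattern itself contains slashes.

```shell
# csv is an illustrative helper that prints two made-up comma-separated lines.
csv() { printf 'a,b,c\nd,e,f\n'; }

csv | sed 's/,/;/'     # a;b,c and d;e,f: first comma on each line
csv | sed 's/,/;/g'    # a;b;c and d;e;f: every comma on each line

# Any other delimiter works; here | avoids escaping the slashes in a path.
echo '/usr/local/bin' | sed 's|/usr/local|/opt|'    # /opt/bin
```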

-i edits the file in place without producing output:

sed -i 's/DEBUG/INFO/g' config.txt

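The original file is modified directly. You do not manage a temporary file or a redirect yourself; sed takes care of that internally (GNU sed, for one, writes a temporary copy and renames it over the original). Unlike the rest of sed, -i is a widespread extension rather than part of POSIX, so its exact syntax varies between implementations. Useful in scripts and in situations where an interactive editor such as vim is not what you want.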
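One portability wrinkle: GNU sed accepts a bare -i, while the sed shipped with BSD and macOS expects a backup suffix after it. Attaching the suffix directly, as in -i.bak, is a spelling both families accept, and it leaves a copy of the original behind. The sketch below runs in a throwaway directory; the file contents are made up.

```shell
# Work in a temporary directory so no real config file is touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cfg="$workdir/config.txt"
printf 'level=DEBUG\nfallback=DEBUG\n' > "$cfg"

# Edit in place, keeping the original as config.txt.bak.
sed -i.bak 's/DEBUG/INFO/g' "$cfg"

cat "$cfg"        # level=INFO, fallback=INFO
cat "$cfg.bak"    # level=DEBUG, fallback=DEBUG (the untouched original)
```

Delete the .bak file once you have checked the result, or keep it as a cheap undo.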

Putting it together

A build system outputs a file with one compilation unit per line, some repeated. You want each unique .c filename with its count, most frequent first:

cat build.log | grep '\.c$' | sort | uniq -c | sort -rn
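To try the pipeline without a real build, you can feed it a made-up log. In this sketch build_log is a hypothetical helper standing in for cat build.log, and the awk step only normalizes the padding that uniq -c adds.

```shell
# build_log is an illustrative stand-in for "cat build.log": one made-up
# path per line, some repeated, plus one non-C file to be filtered out.
build_log() {
  printf 'src/main.c\nsrc/util.c\nsrc/main.c\nREADME.md\n'
  printf 'src/main.c\nsrc/util.c\nsrc/io.c\n'
}

# Keep lines ending in .c, group them, count them, largest count first.
build_log | grep '\.c$' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
# 3 src/main.c
# 2 src/util.c
# 1 src/io.c
```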

None of these tools is impressive alone. Together, in a pipeline, they handle most of the text-processing tasks you will encounter in a C development workflow.

[1] Pipeline (Unix) - Wikipedia
