Arrange text with sort on Linux


Welcome to the next pikoTutorial!

Basic usage

At its most basic, the sort command sorts lines in a file alphabetically. For example, if you have a file named data.txt, you can print its sorted content with:

sort data.txt

Of course, such console output can be redirected to a file:

sort data.txt > sorted_data.txt
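If you want to try this without touching real data, here is a minimal sketch that builds a throwaway file first. The directory, the file contents and the variable names are made up for illustration, and LC_ALL=C is set only so that the ordering is plain byte order regardless of your locale settings.

```shell
# Create a temporary directory so nothing important gets overwritten
workdir=$(mktemp -d)

# Hypothetical sample data, one word per line
printf 'pear\napple\ncherry\n' > "$workdir/data.txt"

# Sort and redirect the result to a new file; data.txt itself is not modified
LC_ALL=C sort "$workdir/data.txt" > "$workdir/sorted_data.txt"

# Read the result back and print it
sorted=$(cat "$workdir/sorted_data.txt")
printf '%s\n' "$sorted"
```

Be careful not to redirect to the input file itself: with sort data.txt > data.txt the shell empties the file before sort reads it. To sort a file in place, sort -o data.txt data.txt is the safe form.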

Numerical sorting

By default, sort compares lines alphabetically. Adding the -n option makes it compare them by numeric value instead:

sort -n data.txt

Note for advanced: maybe you noticed that if you have a file with 3 lines, e.g. “3 1 2”, the default sort (without the -n option) already outputs them in the correct order “1 2 3”, so what’s the point of adding -n? The default sort compares lines character by character, so it only gets numbers right when they all have the same number of digits: a file with lines “10 9” stays in the order “10 9”, because the character 1 comes before the character 9, while sort -n outputs “9 10”. A second difference shows up when you work with files containing mixed letters and numbers. Let’s say there is a file with lines “1 a 2 b”. The default sort puts it in alphabetical order “1 2 a b”, while sort with the -n option outputs “a b 1 2”. Why does the numerical sort put letters first? The answer lies in how sort treats lines that don’t start with a number: it assumes their value to be 0, and 0 goes in front of both 1 and 2.
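The following sketch shows both effects on made-up input. The sample values and variable names are only illustrative, and LC_ALL=C is again an assumption used to keep the default comparison at plain byte order.

```shell
# Hypothetical sample: numbers with different digit counts, one per line
numbers=$(printf '10\n9\n100\n')

# Default sort compares character by character, so "10" and "100" precede "9"
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares by numeric value
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# Lines that do not start with a number count as 0 under -n, so they come first
mixed=$(printf '1\na\n2\nb\n' | LC_ALL=C sort -n)

printf 'default:\n%s\n\nnumeric:\n%s\n\nmixed with -n:\n%s\n' "$lexical" "$numeric" "$mixed"
```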

Reverse order

To reverse the order of the sort, call:

sort -r data.txt
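Options can be combined, so -r works together with -n as well. A small sketch with made-up numbers, where the variable names are only for illustration:

```shell
# Hypothetical sample: numbers, one per line
numbers=$(printf '10\n9\n100\n')

# -n sorts by numeric value, -r flips the result to descending order
descending=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -r)

printf '%s\n' "$descending"
```

With this input, the largest number ends up on the first line.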

Removing duplicates

sort also provides an option for basic filtering of the output by removing duplicate lines:

sort -u data.txt
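As a quick illustration on made-up input (the words and variable names are assumptions for this sketch), each distinct line appears exactly once in the sorted result:

```shell
# Hypothetical sample with repeated lines
words=$(printf 'b\na\nb\nc\na\n')

# -u sorts the lines and keeps only one copy of each
unique=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

printf '%s\n' "$unique"
```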

Sorting by column

Often there are files which consist of multiple columns:

5 2 7
2 8 5
1 7 4

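If the columns are separated with white space, we can sort such a file not only by the beginning of each line, but also by any given column. Note that -k 2 uses everything from the second column to the end of the line as the sort key; to compare the second column alone, write -k 2,2. To sort the file above by the second column, call: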

sort -k 2 data.txt

This will output:

5 2 7
1 7 4
2 8 5
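A key given as -k 2 runs from the second column to the end of the line, while -k 2,2 stops at the end of the second column. The sketch below uses made-up rows in which two lines share the same second column, so the two forms give different results. The data and variable names are assumptions for illustration.

```shell
# Hypothetical sample: two rows share the value 7 in the second column
rows=$(printf '1 7 9\n2 7 1\n3 2 5\n')

# Key = second column through end of line, so the third column breaks the tie
to_end=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2)

# Key = second column only; tied rows fall back to comparing the whole line
only_second=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2)

printf 'sort -k 2:\n%s\n\nsort -k 2,2:\n%s\n' "$to_end" "$only_second"
```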

If your file contains data separated by some other delimiter, e.g. a comma, you must specify this delimiter explicitly with the -t option:

sort -t, -k 2 data.txt

Note for beginners: remember that not all characters can be used directly in the command line. The shell treats a semicolon as a command separator, so if your delimiter is e.g. a semicolon, you must quote it and provide it as -t";".
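Here is a minimal sketch of delimiter-based sorting on made-up records (the names, numbers and variable names are assumptions). It uses -k 2,2n, where the trailing n applies numeric comparison to that key only.

```shell
# Hypothetical comma-separated records: name,age
records=$(printf 'alice,30\nbob,4\ncarol,12\n')

# Split on commas and sort by the second field as a number
by_age=$(printf '%s\n' "$records" | LC_ALL=C sort -t, -k 2,2n)

# Same idea with semicolons; the delimiter is quoted so the shell
# does not treat it as a command separator
by_age_semicolon=$(printf 'alice;30\nbob;4\n' | LC_ALL=C sort -t';' -k 2,2n)

printf '%s\n\n%s\n' "$by_age" "$by_age_semicolon"
```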

Randomizing lines

As it turns out, sort can also do the opposite of sorting and randomize the order of the lines in a given file:

sort -R data.txt

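Don’t confuse it with the -r option, which stands for “reverse”. Also keep in mind that -R is an extension (available in GNU sort) that orders lines by a random hash of their content, so identical lines always end up next to each other. If you need a true shuffle of a file with duplicates, the separate shuf command is a better fit.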
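Because the output of -R changes from run to run, the sketch below does not show a fixed result. It only checks two properties on made-up input: no lines are lost, and duplicate lines stay grouped. It assumes a sort implementation that supports -R, such as GNU sort, and the variable names are only illustrative.

```shell
# Hypothetical sample, already in sorted order
lines=$(printf 'a\nb\nc\nd\n')

# Random order; the exact result differs between runs
shuffled=$(printf '%s\n' "$lines" | sort -R)

# Sorting the shuffled lines again restores the original order
restored=$(printf '%s\n' "$shuffled" | LC_ALL=C sort)

# Duplicates hash to the same value, so equal lines end up adjacent
grouped=$(printf 'x\ny\nx\ny\n' | sort -R)

# uniq collapses adjacent duplicates, which leaves 2 groups here
group_count=$(printf '%s\n' "$grouped" | uniq | wc -l | tr -d ' ')

printf 'shuffled:\n%s\n\ngroups of duplicates: %s\n' "$shuffled" "$group_count"
```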