Compare files and directories with Linux diff and comm commands

There are several ways to compare files and directories on Linux systems. The diff, colordiff and wdiff commands are just a sample of the commands you are likely to encounter. Another is comm. The comm (think “common”) command lets you compare two files in side-by-side columns that separate the lines unique to each file from the lines the files have in common.

Where diff gives you a display showing which lines are different and where the differences are, comm offers some different options with a focus on common content. Let’s look at the default output and then some other features.

Here is diff output, showing the lines that differ between the two files and using the < and > signs to indicate which file each line comes from.

$ diff whoison whoison-again
7c7
< who | awk '{print $1}' | sort | uniq
---
> who | awk '{print $1}' | sort | wc -l

If you’ve used the diff command a lot, you probably know that it can also display the contents of the two files side by side. In the example below, the one line that differs between the files is marked with a vertical bar (|) in the gutter between the two columns.

$ diff -y whoison whoison-again
#!/bin/bash                                      #!/bin/bash
# show unique logins                             # show unique logins

echo hello, $USER                                echo hello, $USER
echo Look who is logged in!                      echo Look who is logged in!
echo ===========================                 echo ===========================
who | awk '{print $1}' | sort | uniq           | who | awk '{print $1}' | sort | wc -l
echo ===========================                 echo ===========================
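
When the files are long, most of a side-by-side listing consists of identical lines. The following is a minimal sketch, assuming GNU diff (the --suppress-common-lines option is not part of POSIX) and two made-up files named old and new in a scratch directory. It keeps the side-by-side layout but shows only the lines that differ.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'echo start' 'sort | uniq' 'echo end' > "$tmp/old"
printf '%s\n' 'echo start' 'sort | wc -l' 'echo end' > "$tmp/new"

# -y selects side-by-side output; --suppress-common-lines (GNU diff)
# drops every row where the two files agree.
changed=$(diff -y --suppress-common-lines "$tmp/old" "$tmp/new")
printf '%s\n' "$changed"

rm -rf "$tmp"
```

Only the one changed row should remain, with the version from old on the left and the version from new on the right.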

The comm command shows differences in columns by default, but we have a little problem here:

$ comm whoison whoison-again
                # show unique logins

                echo hello, $USER
                echo Look who is logged in!
                echo ===========================
who | awk '{print $1}' | sort | uniq
comm: file 1 is not in sorted order             <=== Oops! The comm command expects
echo ===========================                     sorted data
        who | awk '{print $1}' | sort | wc -l
comm: file 2 is not in sorted order
        echo ===========================

The errors shown in this output confirm an important restriction with comm: it requires the compared files to be in sorted order.

The output is very different from what diff produces, but let’s take a look at what we see. In the diff output, we see only the lines that differ between the two files. All other lines in both files are identical.

In the comm output we see the contents of both files in columns, and the key is the indentation. The rightmost column displays the content that is the same in both files, up to a point. The other two columns show the content that is unique to the first file (leftmost) and the content that is unique to the second file (middle). But we also see one line (displayed twice) in the first two columns, along with some complaints that the data being compared is not sorted. This tells us something about how comm works: it expects to work with files that are in sorted order. When you want to compare scripts or other unsorted data, it’s best to use diff. The three columns are laid out like this:

unique to     unique to     common to
file 1        file 2        both files
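
Lists that are not already sorted can still be compared with comm if they are sorted first. The following is a minimal sketch using made-up files named list1 and list2 in a scratch directory. Setting LC_ALL=C is an assumption added here so that sort and comm agree on the ordering regardless of the system locale.

```shell
# Hypothetical sample data: two unsorted lists in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' pear apple fig > "$tmp/list1"
printf '%s\n' fig banana apple > "$tmp/list2"

# Sort each list first, using the same locale that comm will use.
LC_ALL=C sort "$tmp/list1" > "$tmp/list1.sorted"
LC_ALL=C sort "$tmp/list2" > "$tmp/list2.sorted"

# Column 1: only in list1. Column 2: only in list2. Column 3: in both.
result=$(LC_ALL=C comm "$tmp/list1.sorted" "$tmp/list2.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this sample data, pear lands in the first column, banana in the second, and apple and fig in the third. In bash, the same comparison can be written without the intermediate files as comm <(sort list1) <(sort list2).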

Let’s say we are comparing lists of states in which two individuals have lived. In this example, the states that Eric lived in (and not Sandra) are displayed in the left column, while the states that Sandra lived in (and not Eric) appear in the middle column. The states they both lived in are in the far right column. In this case, the two state lists are in alphabetical order, so the comm command works as expected.

$ comm eric sandra
        New Jersey
        New York

Now suppose that you only want to see the states where Eric and Sandra both lived. That is easy with comm. You just need to use the -12 option, which tells comm not to display what you would normally see in columns 1 and 2.

$ comm -12 eric sandra
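
The same column-suppression idea works for the other two columns as well. The sketch below uses hypothetical, already-sorted state lists (made-up sample data, not the lists used in the article) to show -12, -23 and -13 side by side. LC_ALL=C is an assumption added to keep the ordering predictable.

```shell
# Hypothetical, already-sorted lists of states in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' Maine 'New Jersey' 'New York' Ohio > "$tmp/eric"
printf '%s\n' Florida 'New Jersey' 'New York' Texas > "$tmp/sandra"

# Each digit names a column to hide, so what remains is:
both=$(LC_ALL=C comm -12 "$tmp/eric" "$tmp/sandra")         # column 3: in both files
only_eric=$(LC_ALL=C comm -23 "$tmp/eric" "$tmp/sandra")    # column 1: only in eric
only_sandra=$(LC_ALL=C comm -13 "$tmp/eric" "$tmp/sandra")  # column 2: only in sandra

printf 'Both:\n%s\n' "$both"
printf 'Eric only:\n%s\n' "$only_eric"
printf 'Sandra only:\n%s\n' "$only_sandra"

rm -rf "$tmp"
```

Because the hidden columns are dropped entirely, the remaining column is printed without any leading indentation, which makes the result easy to pass to other commands.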

Compare directories

The comm command can also easily be used to show you the differences between the contents of two similar directories. After all, directory listings are alphabetical by nature. In the following example, we see that the dir1 and dir2 directories have 3 files in common and each has a single unique file.

$ comm <(ls dir1) <(ls dir2)
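
To make the column layout concrete, here is a minimal sketch that builds two throwaway directories and compares their listings. The file names are invented for illustration, and the listings are written to temporary files so the example does not depend on bash process substitution. LC_ALL=C is an assumption added so that ls and comm use the same ordering.

```shell
# Hypothetical directories with three shared files and one unique file each.
tmp=$(mktemp -d)
mkdir "$tmp/dir1" "$tmp/dir2"
touch "$tmp/dir1/draft" "$tmp/dir1/notes" "$tmp/dir1/report" "$tmp/dir1/todo"
touch "$tmp/dir2/notes" "$tmp/dir2/report" "$tmp/dir2/summary" "$tmp/dir2/todo"

# Save each listing, then compare the two lists of file names.
LC_ALL=C ls "$tmp/dir1" > "$tmp/list1"
LC_ALL=C ls "$tmp/dir2" > "$tmp/list2"
listing=$(LC_ALL=C comm "$tmp/list1" "$tmp/list2")
printf '%s\n' "$listing"

rm -rf "$tmp"
```

Here draft appears in the first column (only in dir1), summary in the second (only in dir2), and notes, report and todo in the third.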

When you use comm to compare directory listings, you are only comparing filenames, not the contents of the files. If you are comparing the contents of newly configured home directories, you can add the -a option to ls to show “dot files”.

$ comm <(ls -a mjw) <(ls -a pxg)

The same can be done with diff, but the output is a bit different, with < and > signs identifying differences and no listing of the common files.

$ diff <(ls -a user1) <(ls -a user2)
< .bash_history
< .bashrc
< .bashrc.orig
< bin
< mbox
> .cshrc
> .history
> .login
> .logout
< .ssh
< .vimrc
< .Xauthority

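Did you notice what the comm command does when we use the <( ) syntax? It compares the output of commands rather than the contents of files. Here is a simple example: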

$ comm <(pwd) <(echo /home/justme)

It might not be the most insightful command you can run, but you can see how comm informs us that the pwd command and the echo command have the same result. The diff command would perform the same type of comparison, but give you no output by default – an indication that the output of both commands is the same.

$ diff <(pwd) <(echo /home/justme)

The comm command can provide a way to compare the output of two commands as easily as it can compare two files. Just make sure the data you are comparing is in sorted order if multiple lines of output are expected.
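
In a script, the result of such a comparison is usually needed as a yes-or-no answer rather than as a listing. The sketch below is one way to get that, under a few assumptions: it pipes the command output in through standard input (both diff and comm accept - as a file name for that), and it uses a temporary file holding the value of $PWD as a stand-in for the echo command in the example above.

```shell
# Stand-in for the expected value: the shell's own idea of the current directory.
tmp=$(mktemp -d)
printf '%s\n' "$PWD" > "$tmp/expected"

# diff prints nothing and exits with status 0 when both inputs match.
if pwd | diff - "$tmp/expected" > /dev/null; then
    verdict=same
else
    verdict=different
fi

# comm puts a line found in both inputs in column 3; -12 leaves only that column.
common=$(pwd | LC_ALL=C comm -12 - "$tmp/expected")

printf 'diff says: %s\n' "$verdict"
printf 'comm finds in common: %s\n' "$common"

rm -rf "$tmp"
```

Testing the exit status of diff avoids parsing any output, while comm -12 returns the matching lines themselves, which is more useful when the commands produce several lines.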

Copyright © 2018 IDG Communications, Inc.