Tips&Tricks

How to count words with grep

bash

The name grep comes from the ed editor command g/re/p: globally search for a regular expression and print. The grep command is a filter that searches its input for lines matching a specified pattern and prints the matching lines to standard output.
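As a minimal sketch of that filter behaviour, the snippet below runs grep over a tiny sample file. The temporary file and its three lines are made up for illustration and are not part of the original timings.

```shell
# Hypothetical sample input; the file contents are illustrative only.
sample=$(mktemp)
printf '%s\n' 'sand and sea' 'blue sky' 'land' > "$sample"

# grep prints only the lines that contain the pattern "and".
matched=$(grep and "$sample")
printf '%s\n' "$matched"

rm -f "$sample"
```

Only the first and third lines are printed, because "blue sky" does not contain the pattern.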

Piping the output to the wc utility is the slower way to count:

~ $ time grep and tmp/a/longfile.txt | wc -l
2811

real 0m0.097s
user 0m0.006s
sys 0m0.032s

Here grep searches for “and” in the file tmp/a/longfile.txt, wc with the -l option counts the lines of output, and the time utility measures how long the whole command takes.

A faster version skips the pipe to wc and uses grep’s own -c option instead:

~ $ time grep -c and tmp/a/longfile.txt 
2811

real 0m0.013s
user 0m0.006s
sys 0m0.005s

Note that -c counts matching lines, not individual matches!

To count every match, use the following approach (the -o option prints each match on its own line):

~ $ grep -o and tmp/a/longfile.txt | wc -l 
3402

This gives a different (correct) result for the same file, because a single line can contain more than one match.
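The difference between the two counts can be reproduced on a small input. This is only a sketch: the sample file is hypothetical, and the tr step is an assumption added to strip the padding that some wc implementations put in front of the number.

```shell
# Hypothetical sample input; the first line contains "and" three times.
sample=$(mktemp)
printf '%s\n' 'sand and land' 'no match here' 'and' > "$sample"

# -c counts lines that contain at least one match.
line_count=$(grep -c and "$sample")

# -o prints every match on its own line, so wc -l counts matches.
match_count=$(grep -o and "$sample" | wc -l | tr -d ' ')

echo "matching lines: $line_count"
echo "total matches:  $match_count"

rm -f "$sample"
```

Two lines match, but the pattern occurs four times in total, which is the same kind of gap as 2811 versus 3402 above.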

One more example:

$ grep -oi you <<SONG | wc -l
> And if you save yourself
> You will make him happy
> He'll keep you in a jar
> And you'll think you're happy
> He'll give you breathing holes
> And you'll think you're happy
> He'll cover you with grass
> And you'll think you're happy now
> SONG
12

And with the -c option, which counts matching lines:

$ grep -ci you <<SONG
> And if you save yourself
> You will make him happy
> He'll keep you in a jar
> And you'll think you're happy
> He'll give you breathing holes
> And you'll think you're happy
> He'll cover you with grass
> And you'll think you're happy now
> SONG
8
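In the song example, the count of 12 includes the “you” inside “yourself”, since grep matches substrings. If only whole words should be counted, one possible approach is the -w option, which GNU and BSD grep support (it is not part of POSIX, so treat this as a sketch for those implementations). The snippet uses just the first two lines of the song.

```shell
# First two lines of the song from the example above.
lyrics=$(printf '%s\n' 'And if you save yourself' 'You will make him happy')

# Substring matches: "you", "you" in "yourself", and "You".
substring_count=$(printf '%s\n' "$lyrics" | grep -oi you | wc -l | tr -d ' ')

# Whole-word matches only: -w rejects the "you" inside "yourself".
word_count=$(printf '%s\n' "$lyrics" | grep -oiw you | wc -l | tr -d ' ')

echo "substring matches:  $substring_count"
echo "whole-word matches: $word_count"
```

Contractions such as “you'll” would still count with -w, because the apostrophe is not a word character.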