Mac OS X Command Line 101
by Richard Burton


Understanding The "grep" Command In Mac OS X
Part XV of this series...
October 4th, 2002

I don't know why Dudley keeps trying to find himself, I found him years ago.
- Peter Cook

This series is designed to help you learn more about the Mac OS X command line. If you have any questions about what you read here, check out the earlier columns, write back in the comments below, or join us in the Hardcore X! forum.

In the previous column, we learned about regular expressions, and how to use them to search for text in vi. Having such a text-searching tool for the command line would be a valuable addition to Unix; naturally, such a tool exists. It is called grep, and it is the subject of today's column.

grep allows you to search through the files on your system for content within them and, when you hand it a list of file names (from ls, for instance), to search those names as well. This is similar to the way Sherlock used to work before Sherlock 3, and the way "Find" works today in Jaguar's GUI. When you need to find a string of text on your system from the command line, grep is the way to do it. Now, on to how to use it.

The grep command will take a regular expression, as well as a list of files. It will then search through the files and, for each line that is matched by the regular expression, print the line. (Supposedly, the name grep comes from the ed command g/RE/p, or "global/regular expression/print", which does the same thing within the editor. I can neither confirm nor deny this.) If there are no files indicated, grep will read from standard input. Therefore, you can do things like:

    [localhost:~] dr_unix% ls
    Adam.txt   Library    Pictures   login      test_1.txt
    Desktop    Movies     Public     personal   testfile
    Documents  Music      Sites      temp.html  who_list
    [localhost:~] dr_unix% ls | grep ".es.*"
    Desktop
    Movies
    Pictures
    Sites
    test_1.txt
    testfile
    [localhost:~] dr_unix% 
    

to give a more flexible search. Notice that the regular expression, .es.*, was enclosed in double quotes. Otherwise, we get this:

    [localhost:~] dr_unix% ls | grep .es.*
    grep: No match.
    [localhost:~] dr_unix% 
    
[Note: I think that this is because the asterisk and/or period will confuse the tcsh command line, which tries to use them as metacharacters, so you need the quotes. On the other hand, if you want to anchor the regular expression to the end of a line with a dollar sign, it interprets this as a variable, $", and chokes. tcsh is quirky with regular expressions, and I haven't quite figured out everything with it. I know from experience that the Korn shell, ksh, does not suffer from this. On the other hand, ksh is not the default shell, so there y'are.]
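The quoting point can be tried outside tcsh as well. What follows is a minimal sketch, assuming a Bourne-style shell (sh or bash) rather than the tcsh shown above; the names are copied from the listing and fed in with printf, so nothing depends on a real directory. It uses single quotes, which in these shells keep the asterisk, the period, and the dollar sign away from the command line altogether.

```shell
#!/bin/sh
# Stand-in for the output of ls (names taken from the listing above).
names='Adam.txt
Desktop
Documents
Movies
Pictures
Sites
temp.html
test_1.txt
testfile'

# Quoted pattern: any character, then "es", then anything at all.
with_es=$(printf '%s\n' "$names" | grep '.es.*')

# Anchored pattern: names that end in "es".
# The single quotes stop the shell from reading $ as a variable.
ends_es=$(printf '%s\n' "$names" | grep 'es$')

printf 'contains es:\n%s\n' "$with_es"
printf 'ends in es:\n%s\n' "$ends_es"
```

The first search returns the same six names as the ls example; the second keeps only Movies, Pictures, and Sites.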

You also need quotes if you have spaces in the regular expression. The difference between grep the  file and grep "the " file is that the former will match any occurrence of t-h-e, whereas the latter will match only for t-h-e-space. This means that the former will match "I was there" but the latter will not. Remember that the command line ignores extra spaces, collapsing many into one, unless the spaces are quoted.
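Here is a small sketch of that difference, assuming a Bourne-style shell (sh or bash); the two sample lines are made up for the occasion.

```shell
#!/bin/sh
# Two sample lines, invented for this sketch.
text='I was there
Feed the cat'

# Pattern is just t-h-e: both lines contain it.
loose=$(printf '%s\n' "$text" | grep 'the')

# Pattern is t-h-e-space: only the second line contains it.
strict=$(printf '%s\n' "$text" | grep 'the ')

printf 'the:\n%s\n' "$loose"
printf 'the-space:\n%s\n' "$strict"
```

Without the quotes, the trailing space never reaches grep, so the second search would behave exactly like the first.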

As you might expect, grep takes the standard regular expression characters of ., *, ^, $, \, and [ ]. Thus, to count the number of blank lines in a file, do:

    [localhost:~] dr_unix% grep ^$ testfile
    
    
    
    [localhost:~] dr_unix% grep ^$ testfile | wc -l
           3 
    [localhost:~] dr_unix% 
    

Thus, we can see that grep ^$ testfile will print all three blank lines. We can use wc and the pipe, |, to build our own tool to count blank lines. Neat, huh?
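If you find yourself doing this often, the pipeline can be wrapped up and given a name. This is only a sketch, assuming a Bourne-style shell (sh or bash); count_blank and the scratch file are hypothetical names, not anything that ships with OS X.

```shell
#!/bin/sh
# count_blank is a made-up helper: print the number of empty lines in a file.
count_blank() {
    # tr strips the padding that some versions of wc put before the number.
    grep '^$' "$1" | wc -l | tr -d ' '
}

# Build a scratch file with three blank lines in it.
sample=$(mktemp "${TMPDIR:-/tmp}/grepdemo.XXXXXX")
printf 'one\n\ntwo\n\n\nthree\n' > "$sample"

blank=$(count_blank "$sample")
echo "blank lines: $blank"

rm -f "$sample"
```

Note that ^$ only matches lines that are truly empty; a line holding nothing but spaces does not count.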

In some Unixes (Unices?), there were two versions of grep, grep and egrep, whose primary difference was that each had slightly different additions to the basic regular expression syntax. In Darwin, and therefore in OS X, the syntaxes (syntaces?) are largely combined: nearly everything one command offers is available from the other, though a few operators are spelled with a backslash in one and without it in the other. Thus, you can bounce back and forth between them like so many yo-yos (yo-yi?), minding the backslashes as you go.[*]

One set of regular expression characters available in grep is the \{ \} pair. This allows you to search for a range of occurrences. Suppose you want to look for "to", followed by three to nine characters, followed by an "a". This can be done by:

    [localhost:~] dr_unix% grep "to.\{3,9\}a" testfile
    come to the aid of the party.
    [localhost:~] dr_unix% 
    
Again, the quotes are needed here. If you want to match exactly 3, the regular expression is to.\{3\}a. Normally, the \{ \} pair is only available in grep, but in Darwin and OS X, the same ranges are also available in egrep, where the braces are written without the backslashes: to.{3,9}a.
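The range and the exact count can be compared side by side. A minimal sketch, assuming a Bourne-style shell (sh or bash) and using grep -E, the standard spelling of egrep, for the unescaped braces:

```shell
#!/bin/sh
line='come to the aid of the party.'

# "to", then three to nine characters of any kind, then "a".
range=$(printf '%s\n' "$line" | grep 'to.\{3,9\}a')

# Exactly three characters between "to" and "a": no match here,
# because five characters (" the ") sit between them.
exact=$(printf '%s\n' "$line" | grep 'to.\{3\}a')

# The same range in extended syntax, where the braces go unescaped.
extended=$(printf '%s\n' "$line" | grep -E 'to.{3,9}a')

echo "range:    $range"
echo "exact:    $exact"
echo "extended: $extended"
```

The exact search comes back empty, which is grep's way of saying nothing matched.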

grep's regular expression syntax is expanded in OS X to include features not seen in the standard definition of grep. In other words, OS X's grep will let you do searches that greps on other Unices won't. For example, you can use the \< \> pair to denote the beginnings and endings of words, just like in vi.
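A quick sketch of the word markers, assuming a Bourne-style shell (sh or bash) and a grep that accepts \< and \> as described; the sample lines are invented.

```shell
#!/bin/sh
# Sample lines, invented for this sketch.
text='the party
over there
bathe daily'

# "the" as a whole word: a word must start before it and end after it.
whole=$(printf '%s\n' "$text" | grep '\<the\>')

# Words that merely begin with "the".
starts=$(printf '%s\n' "$text" | grep '\<the')

printf 'whole word:\n%s\n' "$whole"
printf 'word start:\n%s\n' "$starts"
```

"bathe" never shows up, since its t-h-e sits in the middle of a word rather than at the front of one.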

We have seen that the asterisk (*) is used to denote "any number of the thing preceding me." In OS X's grep, the plus sign, +, can be used to denote "at least one of the thing preceding me." (egrep takes the plus sign bare, as written here; plain grep generally wants it escaped as \+, since an unescaped + there is just a literal plus sign.) So, while the regular expression th*e will match te, the, thhe, ..., the regular expression th+e will match the, thhe, thhhe, .... So you can see that h+ is the same as hh*. The plus sign is often used in other utilities' regular expressions, but is not part of grep on most other systems. Make a note of it, there will be a quiz later.
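The star and the plus can be set against each other on a three-word list. This is a sketch, assuming a Bourne-style shell (sh or bash); it uses grep -E (the extended syntax, equivalent to egrep) so that the + is unambiguously an operator.

```shell
#!/bin/sh
words='te
the
thhe'

# h* allows zero or more h's, so "te" gets in.
star=$(printf '%s\n' "$words" | grep 'th*e')

# h+ demands at least one h, so "te" is left out.
plus=$(printf '%s\n' "$words" | grep -E 'th+e')

# hh* is the long way of writing h+.
longhand=$(printf '%s\n' "$words" | grep 'thh*e')

printf 'th*e:\n%s\n' "$star"
printf 'th+e:\n%s\n' "$plus"
printf 'thh*e:\n%s\n' "$longhand"
```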

Another bonus freebie that is thrown our way is the question mark, ?, unless you are British and over 35, in which case it is "a mark of interrogation." grep uses this in regular expressions to denote "zero or one occurrence of the thing before me", or "an optional [whatever is before me]." Therefore, the expression lie?d will match either lied or lid. (egrep takes the question mark bare; plain grep generally wants it written \?.)

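Finally, the vertical bar, |, can be used for either/or matching, just like in, you guessed it, vi. Quote it so the command line does not take it for a pipe; egrep takes it bare, while plain grep generally wants it written \|.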
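The question mark and the vertical bar, sketched together. This assumes a Bourne-style shell (sh or bash) and, again, grep -E (equivalent to egrep) so that both characters act as operators; the word list is invented.

```shell
#!/bin/sh
# Word list, invented for this sketch.
words='lid
lied
lieed
cat
dog
bird'

# e? makes the "e" optional; ^ and $ pin the match to the whole line.
optional=$(printf '%s\n' "$words" | grep -E '^lie?d$')

# cat|dog matches a line holding either one.
either=$(printf '%s\n' "$words" | grep -E 'cat|dog')

printf 'lie?d:\n%s\n' "$optional"
printf 'cat|dog:\n%s\n' "$either"
```

"lieed" is turned away by the first search, since the question mark allows one "e" at most.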

grep can take several options; you can see them all via man grep, of course, but I've found that the most useful ones are the following (remember that options go in the usual place, grep [options] regex [files]):

-c: "count the lines". Instead of printing all the matched lines, -c merely prints a count of matched lines for each file. Thus that "| wc -l" trick isn't needed for one file. (If you pass in a list of files, though, ...)

-e PATTERN: "expression starts here." Using -e will tell grep "What follows is the pattern with which to search." This is very useful when your pattern starts with a '-'. Otherwise, the command line might think that your expression is an option and get confused.

-f FILE: "file holds the expression". -f allows you to store a pattern in a file and tell grep "Yo, use this." I've mostly used this when writing scripts that will use the same pattern repeatedly. That way, if I have to change it later, I only have to change it in one place.

-i: "ignore case". -i forces grep to ignore the distinction between uppercase and lowercase. Imagine you need to find matches in a file which may have come from Windows (include shudder here). Now imagine a long string of paired letters like [Tt][Hh][Ee] [Cc][Aa][Tt] and on and on. Just use -i instead and save yourself time and pain.

-l: "list files". Instead of printing the matched lines, when you use the -l option, grep will just print a list of the files which contain the expression. This is mostly used when you are doing something like grep "expression" * in a directory with a lot of files, or when you just want to know which files need attention (processing, editing, etc.).

-n: "number". -n means that before each line of output, grep will print its line number within the file.

-v: "invert". -v instructs grep to print only those lines that don't match the expression.
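The options above can all be tried against a couple of throwaway files. This is a sketch only, assuming a Bourne-style shell (sh or bash); the scratch directory, the file names a.txt, b.txt, and pattern, and their contents are all made up for the purpose.

```shell
#!/bin/sh
# Scratch directory and files, invented for this sketch.
dir=$(mktemp -d "${TMPDIR:-/tmp}/grepopts.XXXXXX")
printf 'The cat\nthe dog\n-v is a flag\n' > "$dir/a.txt"
printf 'no pets here\n' > "$dir/b.txt"
printf 'cat\n' > "$dir/pattern"

count=$(grep -c 'the' "$dir/a.txt")              # -c: count matching lines
nocase=$(grep -i -c 'the' "$dir/a.txt")          # -i: same count, ignoring case
numbered=$(grep -n 'dog' "$dir/a.txt")           # -n: line number before each line
inverted=$(grep -v 'the' "$dir/a.txt")           # -v: lines that do NOT match
dashed=$(grep -e '-v' "$dir/a.txt")              # -e: pattern that starts with a dash
fromfile=$(grep -f "$dir/pattern" "$dir/a.txt")  # -f: pattern kept in a file
listed=$(cd "$dir" && grep -l 'dog' *.txt)       # -l: names of files that match

printf '%s\n' "-c: $count" "-i -c: $nocase" "-n: $numbered"
printf '%s\n' "-e: $dashed" "-f: $fromfile" "-l: $listed"
printf -- '-v:\n%s\n' "$inverted"

rm -rf "$dir"
```

Note how -c finds one line and -i -c finds two: the capital T in "The cat" only counts once case is ignored.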


As you can see, grep is a very powerful tool. It can be used to quickly search files and to filter output on the command line. It does have a couple of limitations, though. First, it is no speed demon. Building those regular expressions and parsing a lot of text in a flexible way takes resources, and that takes time. (Admittedly, these days, that isn't much of an issue, but still, there it is.) Second, consider the following: you are working away, happy as a clam, and the boss says "Cyprian", if your name is Cyprian, "I just got a call from marketing, we need to change the search in all those voodoo scripts you wrote, and we need it in ten minutes."

Now, you know and I know that you can look for the expression the\.?.*ca[r]?t and search for it using \ after \ after \. But my lord, and your duke for that matter, who the heck would want to? Do you realize that you would look for the\\\.\?\.\*ca\[r\]\?t (or something along those lines) and heaven forbid you should make the slightest mistake. If you're like me, and I know I am, you'd think "Now dash it, there must be an easier way. Surely, in all the history of Unix, someone has had to face just such an emergency and written a grep-like tool to deal with this. Like that Cyprian chap, maybe." Well, Cyprian has come through. It's called fgrep (for "fast grep"), and it works a lot like grep except it doesn't take a regular expression.

    fgrep [options] string [files]
    

Where you would normally place a regular expression, just put in a literal string. Originally it was meant to be a fast alternative to grep, trading the power and flexibility of regular expressions for speed. As quick as computers are these days, that isn't an issue, but if you want to find something that contains a literal period or a literal asterisk, it's the bee's knees.
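To see the literal search earn its keep, compare it with the regular kind on a string containing a period. A minimal sketch, assuming a Bourne-style shell (sh or bash); it uses grep -F, the standard option that performs the same fixed-string search as fgrep, and the sample lines are invented.

```shell
#!/bin/sh
# Sample lines, invented for this sketch.
text='price is 3.50
price is 3x50'

# As a regular expression, the period matches any character: both lines hit.
as_regex=$(printf '%s\n' "$text" | grep '3.50')

# As a fixed string, the period is only a period: one line hits.
as_literal=$(printf '%s\n' "$text" | grep -F '3.50')

printf 'regex:\n%s\n' "$as_regex"
printf 'literal:\n%s\n' "$as_literal"
```

No backslashes were harmed in the making of the second search.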


[*] This joke was borrowed at great embarrassment from Shelley Berman. All young whippersnappers are advised to ask their parents or grandparents.

You are encouraged to send Richard your comments, or to post them below.



Richard Burton is a longtime Unix programmer and a handsome brute. He spends his spare time yelling at the television during Colts and Pacers games, writing politically incorrect short stories, and trying to shoot the neighbor's cat (not really) nesting in his garage. He can be seen running roughshod over the TMO forums under the alias tbone1.



© All information presented on this site is copyrighted by The Mac Observer except where otherwise noted. No portion of this site may be copied without express written consent. Other sites are invited to link to any aspect of this site provided that all content is presented in its original form and is not placed within another .