

Regular Expressions

We will go through a number of examples on regular expressions as used by Perl. This doesn't cover all of the cases, as the books really do a better job than I can.

  • Matching a constant string.
    This is the simplest use of pattern matching. The main thing to watch out for is special characters such as parentheses, brackets, braces, the period, and the backslash. Each of these has its own meaning inside a pattern, so it must be preceded by a backslash to be matched literally. A pattern made up only of letters and digits works the way you would like:
    	if ($text =~ /Hello/) {
    		print "I saw a hello!\n";
    	}
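When the constant string does contain special characters, they have to be escaped. The following is a minimal sketch of two ways to do that; the sample strings and variable names are invented for illustration, and the \Q...\E form is an addition not covered in the text above.

```perl
use strict;
use warnings;

# Illustrative strings, not taken from the original text.
my $text  = "Total (approx.) is 3.50";
my $price = "3.50";

# Escape the parentheses and the period so they match literally.
my $saw_parens = ($text =~ /\(approx\.\)/) ? 1 : 0;

# \Q...\E quotes any special characters in the interpolated variable.
my $saw_price = ($text =~ /\Q$price\E/) ? 1 : 0;

# Unescaped, the period matches any character, so "3x50" matches too.
my $loose = ("3x50" =~ /3.50/) ? 1 : 0;

print "parens=$saw_parens price=$saw_price loose=$loose\n";
```

The last test shows why escaping matters: the unescaped pattern accepts text that was never intended to match.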
    

  • Requiring the pattern be at the start or end of the text.
    Sometimes you need to require that the pattern be at the start or end of the text. The ^ matches at the start of the string, and $ matches at the end of the string (or just before a trailing newline). So:
    	if ($text =~ /^Hello$/) {
    		print "I saw a hello!\n";
    	}
    
    will only match if $text contains nothing other than Hello (apart from an optional trailing newline).
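One detail worth checking by hand: in Perl, $ also matches just before a final newline, so a line read from a file without chomp can still match. The sketch below compares the anchored pattern with a stricter variant; the sub names are made up for this example, and \A and \z (absolute start and end of the string) are an addition not covered in the text above.

```perl
use strict;
use warnings;

# Returns 1 when the anchored pattern from the text matches, 0 otherwise.
sub is_hello { my ($text) = @_; return $text =~ /^Hello$/ ? 1 : 0; }

# Stricter variant: \A and \z anchor to the absolute start and end of the string.
sub is_hello_strict { my ($text) = @_; return $text =~ /\AHello\z/ ? 1 : 0; }

for my $sample ("Hello", "Hello\n", "Hello there", " Hello") {
    (my $shown = $sample) =~ s/\n/\\n/g;   # make the newline visible when printing
    printf "%-14s ^...\$: %d   \\A...\\z: %d\n",
        "'$shown'", is_hello($sample), is_hello_strict($sample);
}
```

Only the second sample separates the two: the ^...$ form accepts the trailing newline, the \A...\z form does not.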

  • Whitespace.
    For many patterns, it is okay if there is whitespace before or after the pattern of interest. In some cases the whitespace is required; in other cases it may simply be allowed. The sequence \s matches exactly one whitespace character (usually a space or a tab, though a newline also counts). If you want to match one or more whitespace characters, you can use \s+, where the + means one or more of the preceding item. If the whitespace is optional, you want to allow zero or more, which is what the * character does, so the pattern becomes \s*.
    	if ($text =~ /^\s*Hello\s*$/) {
    		print "I saw a hello!\n";
    	}
    
    Now the pattern will match if the string contains exactly Hello, but allowing leading or trailing whitespace.
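The difference between requiring whitespace (\s+) and merely allowing it (\s*) is easy to check directly. A minimal sketch, with sub names and sample strings invented for illustration:

```perl
use strict;
use warnings;

# \s+ requires at least one whitespace character on each side.
sub padded_required { my ($text) = @_; return $text =~ /^\s+Hello\s+$/ ? 1 : 0; }

# \s* allows whitespace on each side but does not require it.
sub padded_optional { my ($text) = @_; return $text =~ /^\s*Hello\s*$/ ? 1 : 0; }

for my $sample ("Hello", "  Hello\t", "Hello  ", "Hello world") {
    printf "%-15s required=%d optional=%d\n",
        "[$sample]", padded_required($sample), padded_optional($sample);
}
```

A bare "Hello" passes only the \s* version, while "Hello world" fails both because of the extra word.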

  • Matching any of a group of characters.
    Sometimes you will want to match any of a set of characters. For instance, if we wanted to allow either an upper or lower case H in our overused example, we can use the pattern [hH]. This allows any character in the set to match. The pattern [0-9] matches any single digit (which is used so often that Perl allows \d as a synonym).
    	if ($text =~ /^\s*[hH]ello\s*$/) {
    		print "I saw a hello!\n";
    	}
    
    As should be becoming clear, regular expressions can match according to some fairly complex criteria, but may also require a bit of study to get right (or to read somebody else's pattern).
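A short sketch of character sets and ranges in use. The sub names are hypothetical, and the digit check is an extra illustration built from the [0-9] range mentioned above:

```perl
use strict;
use warnings;

# Accept either capitalisation of the first letter, with optional surrounding whitespace.
sub is_greeting { my ($text) = @_; return $text =~ /^\s*[hH]ello\s*$/ ? 1 : 0; }

# A range inside the brackets: one or more digits 0 through 9 and nothing else.
sub is_number { my ($text) = @_; return $text =~ /^[0-9]+$/ ? 1 : 0; }

print "hello   -> ", is_greeting("hello"),   "\n";
print " Hello  -> ", is_greeting(" Hello "), "\n";
print "Jello   -> ", is_greeting("Jello"),   "\n";
print "2006    -> ", is_number("2006"),      "\n";
print "20x6    -> ", is_number("20x6"),      "\n";
```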

  • Other special sequences.
    There are a number of special characters.
    • . Matches any single character (other than newline)
    • \b Matches on a word boundary
    • \B Matches on a non-word boundary
    • \n Matches a newline
    • \r Matches a carriage return
    • \t Matches a tab
    • \f Matches a formfeed
    • \d Matches a digit [0-9]
    • \D Matches a non-digit [^0-9]
    • \w Matches a word character [0-9a-z_A-Z]
    • \W Matches a non-word character [^0-9a-z_A-Z]
    • \s Matches a whitespace character [ \t\n\r\f]
    • \S Matches a non-whitespace character [^ \t\n\r\f]
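A few of the sequences above can be tried out in a short script. This is a sketch only: the sample line is made up, and the /g modifier used to collect every match in list context is an addition not covered in the list.

```perl
use strict;
use warnings;

# Made-up sample line for illustration.
my $line = "Order 66 shipped to cathedral_row on 27 May";

# \b keeps "cat" from matching inside "cathedral_row".
my $has_cat_word = ($line =~ /\bcat\b/) ? 1 : 0;

# \d+ with /g in list context returns every run of digits.
my @numbers = ($line =~ /\d+/g);

# \w+ matches runs of letters, digits, and underscores.
my @words = ($line =~ /\w+/g);

print "cat as a word: $has_cat_word\n";
print "numbers: @numbers\n";
print "word count: ", scalar(@words), "\n";
```

Note that cathedral_row counts as a single word, because the underscore is a word character.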

There are a number of other pattern matching sequences. Some of those sequences are important to solve certain problems in an easy fashion. But the ones discussed above should address many simple programs.

Again, regular expressions are quite useful, and will help your coding efforts once understood. They are also somewhat hard to read, especially when they are complicated. However, before you write regular expressions off as too hard to bother with, consider how much code the same pattern matching would take in a conventional programming language without them.




Last modified 27 May 2006
Dave Regan
http://www.peak.org/~regan/