Essential Perl Regular Expression Syntax

Metacharacters

Code

Meaning

Example

\

The 'escape' character. Put it before a metacharacter to match that metacharacter literally, or put it in front of a non-metacharacter to give that character a special meaning (see the following tables).

/\\/ matches the backslash character, '\'.
/\$/ matches the dollar-sign character, '$'.
/\w/ matches a single word character (a letter, digit or underscore).

|

The 'alternation' character. Put it between two strings to match either one.

/Yo|Hello|Hola/ matches 'Yo', 'Hello' or 'Hola'.

( and )

For grouping and memory. Put a subpattern between parentheses if it should be treated as a unit. Whatever the parenthesized subpattern matched can also be recalled: use \1, \2... \n inside the pattern itself (the "matching" side) and the scalars $1, $2... $n in the replacement (the "substitution" side) or in code after the match (see last example). To 'turn off' the memory, put '?:' right after the opening parenthesis.

/(Yo|Hello|Hola) Gary/ matches 'Yo Gary', 'Hello Gary' or 'Hola Gary'. Note, if a match was found, $1 now contains 'Yo', 'Hello' or 'Hola'.
/(?:Yo|Hello|Hola) Gary/ matches the same as above but does not store anything in $1.
s/([a-z])\1/$1/g matches any doubled lowercase letter (such as 'oo') and replaces the pair with a single occurrence of that letter.
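The grouping examples above can be tried out with a short script. This is only a sketch: the sample strings ('Hola Gary', 'bookkeeper') and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $greeting = 'Hola Gary';
my $captured;

# Capturing group: the alternative that matched is stored in $1.
if ($greeting =~ /(Yo|Hello|Hola) Gary/) {
    $captured = $1;
}

# Non-capturing group: same match, but nothing is remembered.
my $matched_without_capture = $greeting =~ /(?:Yo|Hello|Hola) Gary/ ? 1 : 0;

# \1 inside the pattern, $1 in the replacement: collapse doubled letters.
my $word = 'bookkeeper';
(my $collapsed = $word) =~ s/([a-z])\1/$1/g;

print "$captured\n";    # Hola
print "$collapsed\n";   # bokeper
```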

[ and ]

For sets or classes of characters. You can imagine all of the characters inside the square-brackets as having an alternator (pipe) between them. It also allows for ranges of characters (see examples).

/[abc]/ matches 'a', 'b' or 'c'.
/[a-z]/ matches any lowercase character between 'a' and 'z'.

{ and }

For specifying ranges of repetitions.

/Yo{3}/ matches 'Yooo'.
/Yo{3,}/ matches 'Yo' with 3 or more 'o's.
/Yo{3,5}/ matches 'Yo' with 3, 4 or 5 'o's.
/(?:Yo){3}/ matches 'YoYoYo'.
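A quick way to see what a repetition count applies to is to filter a list of candidate strings. In this sketch the candidate list is invented, and the patterns are wrapped in ^ and $ (described further down) so that the whole string has to fit.

```perl
use strict;
use warnings;

my @candidates = ('Yo', 'Yooo', 'Yooooo', 'YoYoYo');

# {3,5} applies only to the 'o' immediately before it.
my @three_to_five = grep { /^Yo{3,5}$/ } @candidates;

# Wrapping 'Yo' in a group makes the count apply to the whole group.
my @repeated_group = grep { /^(?:Yo){3}$/ } @candidates;

print "@three_to_five\n";    # Yooo Yooooo
print "@repeated_group\n";   # YoYoYo
```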

.

Matches any single character except newline.

/./ Matches any single character but a newline!

*

This matches zero or more of the preceding character or group. This is the same as {0,}. NOTE: When * is followed by ?, the matching is done in 'non-greedy' mode, taking as few characters as possible. See example.

/Yo*/ Matches 'Y' with any number of following 'o's (i.e. 'Y', 'Yo', 'Yoooooooo')
/Y.*/ Matches 'Y' with any number of anything except newlines.
/Y.*?Yo/ Same as above, but the .*? stops matching as soon as it reaches the first 'Yo'.
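The difference between greedy and non-greedy matching is easiest to see on a string that contains 'Yo' more than once. A small sketch, with a made-up sample string:

```perl
use strict;
use warnings;

my $text = 'Yada Yo and Yo again';

# Greedy: .* grabs as much as it can, so the match runs to the LAST 'Yo'.
my ($greedy) = $text =~ /(Y.*Yo)/;

# Non-greedy: .*? grabs as little as it can, so the match ends at the FIRST 'Yo'.
my ($lazy) = $text =~ /(Y.*?Yo)/;

print "$greedy\n";   # Yada Yo and Yo
print "$lazy\n";     # Yada Yo
```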

+

This matches one or more of the preceding character or group. This is the same as {1,}. NOTE: When + is followed by ?, the matching is done in 'non-greedy' mode, taking as few characters as possible. See example.

/Yo+/ Matches 'Yo' with any number of following 'o's.
/Y.+/ Matches 'Y' followed by one or more of anything except newlines.
/Y.+?Yo/ Same as above, but the .+? stops matching as soon as it reaches a 'Yo'.

?

This matches zero or one of the preceding character or group. This is the same as {0,1}.

/Yo?/ Matches 'Y' or 'Yo'.

^

If placed at the start of an expression, it matches at the beginning of the string. If placed right after an open square bracket, [, it means 'not the contents of the square brackets'. Anywhere else outside square brackets it is still an anchor, so escape it with a backslash to match a literal '^'.

/^abcd/ only matches if the string begins with 'abcd'.
/[^a]/ matches any single character that is not 'a'.
/la\^la/ matches 'la^la'.

$

If placed at the end of an expression, it matches at the end of the string or just before the newline at the end of the string. Elsewhere, a $ followed by a variable name is interpolated as a scalar variable (unless it is escaped with a backslash, '\$'). Compare to \Z.

/wxyz$/ only matches if the string ends with 'wxyz' or if 'wxyz' comes immediately before a newline.
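The anchoring behaviour described above can be checked with a few one-line tests. This is a sketch with invented sample strings; each variable ends up as 1 (matched) or 0 (did not match).

```perl
use strict;
use warnings;

# ^ anchors the match to the start of the string.
my $starts_with    = 'abcdef' =~ /^abcd/ ? 1 : 0;   # 1
my $not_at_start   = 'xabcd'  =~ /^abcd/ ? 1 : 0;   # 0

# $ matches at the very end, or just before a trailing newline.
my $ends_with      = 'wxyz'   =~ /wxyz$/ ? 1 : 0;   # 1
my $before_newline = "wxyz\n" =~ /wxyz$/ ? 1 : 0;   # 1

# Inside square brackets, a leading ^ negates the class.
my ($first_non_a) = 'aaab' =~ /([^a])/;

# An escaped \$ is a literal dollar sign, not an anchor or a variable.
my $literal_dollar = 'cost: $5' =~ /\$5$/ ? 1 : 0;  # 1

print "$starts_with $not_at_start $ends_with $before_newline\n";  # 1 0 1 1
print "$first_non_a $literal_dollar\n";                           # b 1
```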

Class Codes

Code

Matches

Equivalent Class

Example

\d

A digit

[0-9]

/\d\d/ Matches two digits (e.g. 87).

\D

A non-digit

[^0-9]

/\D\D/ Matches two non-digits (e.g. M#).

\w

A word character (alphanumeric or underscore)

[a-zA-Z_0-9]

/\w\w/ Matches two word characters (e.g. a9).

\W

A non-word character (neither alphanumeric nor underscore)

[^a-zA-Z_0-9]

/\W\W/ Matches two non-word characters (e.g. &!).

\s

A whitespace character

[ \t\n\r\f]

/\s\w+/ Matches a single white space before one or more word characters.

\S

A non-whitespace character

[^ \t\n\r\f]

/\S\s+\S/ Matches a single non-whitespace character followed by one or more whitespace characters followed by a single non-whitespace character.
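The class codes are most useful in combination. The sketch below pulls a name and a date out of a record and strips non-digits from a phone number; the record layout and sample values are assumptions made up for this example.

```perl
use strict;
use warnings;

# A hypothetical record: a name, some whitespace, then a date.
my $record = "item_42\t 2024-01-15";

# \w+ covers letters, digits and the underscore; \s+ covers the tab and the space.
my ($name, $date) = $record =~ /^(\w+)\s+(\d{4}-\d\d-\d\d)$/;

# \D matches anything that is not a digit, so deleting it leaves only digits.
(my $digits_only = 'Tel: 555-0199') =~ s/\D//g;

print "$name\n";          # item_42
print "$date\n";          # 2024-01-15
print "$digits_only\n";   # 5550199
```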

Anchor Codes

Code

Meaning

Example

^

If placed at the start of an expression, it matches at the beginning of the string. If placed right after an open square bracket, [, it means 'not the contents of the square brackets'. Anywhere else outside square brackets it is still an anchor, so escape it with a backslash to match a literal '^'.

/^abcd/ only matches if the string begins with 'abcd'.
/[^a]/ matches any single character that is not 'a'.
/la\^la/ matches 'la^la'.

$

If placed at the end of an expression, it matches at the end of the string or just before the newline at the end of the string. Elsewhere, a $ followed by a variable name is interpolated as a scalar variable (unless it is escaped with a backslash, '\$'). Compare to \Z.

/wxyz$/ only matches if the string ends with 'wxyz' or if 'wxyz' comes immediately before a newline.

\b

Matches at a word boundary.

/Yo\b/ Matches the 'Yo' in 'Yo' or 'Yo-yo' (a hyphen is not a word character) but not in 'Yoyo'.

\B

Matches at any position that is not a word boundary.

/reg\B/ Matches 'reggie' or 'regexp' but not 'reg'.
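A word boundary is the position between a word character (\w) and a non-word character or the edge of the string. The following sketch, using an invented word list and sentence, shows \b and \B picking out different strings.

```perl
use strict;
use warnings;

my @words = ('Yo', 'Yoyo', 'reg', 'regexp');

# \b: 'Yo' must be followed by a non-word character or the end of the string.
my @yo_at_boundary = grep { /Yo\b/ } @words;

# \B: 'reg' must be followed by another word character.
my @reg_inside_word = grep { /reg\B/ } @words;

# \b on both sides counts only the stand-alone word 'Yo'.
my $whole_word_count = () = 'Yo, Yoyo said Yo' =~ /\bYo\b/g;

print "@yo_at_boundary\n";    # Yo
print "@reg_inside_word\n";   # regexp
print "$whole_word_count\n";  # 2
```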

\A

Matches at the start of a string.

/\Aabcd/ only matches if the string begins with 'abcd'.

\Z

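Matches at the end of the string, or just before a newline at the end of the string. Unlike $, it is not affected by the m modifier. Compare to $.

/wxyz\Z/ only matches if the string ends with 'wxyz' or if 'wxyz' comes immediately before a final newline.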
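The difference between ^ and $ on one hand and \A and \Z on the other only shows up with the m modifier (see the Modifiers table). A sketch on a made-up three-line string, where each variable is 1 for a match and 0 otherwise:

```perl
use strict;
use warnings;

my $text = "first line\nabcd\nwxyz";

# With /m, ^ and $ also match next to the newlines inside the string...
my $caret_m  = $text =~ /^abcd/m  ? 1 : 0;   # 1
my $dollar_m = $text =~ /abcd$/m  ? 1 : 0;   # 1

# ...but \A and \Z still refer to the start and end of the whole string.
my $start_m  = $text =~ /\Aabcd/m ? 1 : 0;   # 0
my $end_m    = $text =~ /abcd\Z/m ? 1 : 0;   # 0

# 'wxyz' really is at the end of the string, so \Z matches there.
my $end_real = $text =~ /wxyz\Z/  ? 1 : 0;   # 1

print "$caret_m $dollar_m $start_m $end_m $end_real\n";   # 1 1 0 0 1
```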

(?=...)

This is a "look ahead" assertion and usually comes after some other part of the regular expression. It means: match the preceding part only if it is followed by "...". The text matched by "..." is not itself included in the match.

(?!...)

This is a negative "look ahead" assertion and usually comes after some other part of the regular expression. It means: match the preceding part only if it is not followed by "...".
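The two look-ahead entries have no example in the table, so here is a small sketch. The word list and the 'price100' string are invented for illustration.

```perl
use strict;
use warnings;

my @names = ('regexp', 'reggie', 'regular');

# (?=ex): 'reg' counts only when 'ex' comes right after it.
my @followed_by_ex = grep { /reg(?=ex)/ } @names;

# (?!ex): 'reg' counts only when 'ex' does NOT come right after it.
my @not_followed_by_ex = grep { /reg(?!ex)/ } @names;

# The look-ahead only checks; the digit stays out of the replaced text.
(my $renamed = 'price100') =~ s/price(?=\d)/cost/;

print "@followed_by_ex\n";       # regexp
print "@not_followed_by_ex\n";   # reggie regular
print "$renamed\n";              # cost100
```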

Modifiers

Code

Meaning

Example

g

Match Globally. Find all of the occurrences in the string (not just the first one).

//g

i

Do case insensitive pattern matching.

//i

m

Treat string as multiple lines: ^ and $ also match at the start and end of each line inside the string.

//m

o

Only compile the pattern once.

//o

s

Treat string as a single line: . also matches a newline.

//s

x

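Use extended regular expressions: whitespace in the pattern is ignored and # starts a comment, so the pattern can be laid out readably.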

//x

e

Evaluate the right side of a substitution as an expression.

//e
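The modifier table only shows where the letters go, so the sketch below exercises most of them on small invented strings. The `my $n = () = ...` line is a common Perl idiom for counting the matches of a /g pattern.

```perl
use strict;
use warnings;

my $text = "Yo\nyo\nYO";

# g + i: count every occurrence, ignoring case.
my $count_any_case = () = $text =~ /yo/gi;

# m: ^ and $ match at each line, so every line is a whole-line 'yo'.
my $count_lines = () = $text =~ /^yo$/gim;

# s: the dot is allowed to match the newline.
my $dot_plain = "a\nb" =~ /a.b/  ? 1 : 0;
my $dot_s     = "a\nb" =~ /a.b/s ? 1 : 0;

# x: whitespace and comments inside the pattern are ignored.
my $is_date = '2024-01-15' =~ /
    ^ \d{4}     # four-digit year
    - \d\d      # month
    - \d\d $    # day, then end of string
/x ? 1 : 0;

# e: the replacement is Perl code; here every digit is doubled.
(my $doubled = 'a1b2') =~ s/(\d)/$1 * 2/ge;

print "$count_any_case $count_lines\n";   # 3 3
print "$dot_plain $dot_s $is_date\n";     # 0 1 1
print "$doubled\n";                       # a2b4
```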

Random Stuff

Code

Meaning

Example

m

Start your regular expression syntax with m and you can set the delimiters.

m#Yo-yo# Matches 'Yo-yo'.
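Choosing your own delimiters pays off when the pattern itself contains slashes. A sketch, assuming a made-up path string:

```perl
use strict;
use warnings;

my $path = '/usr/local/bin';

# With the default delimiters every slash in the pattern needs a backslash.
my ($with_slashes) = $path =~ /^\/usr\/(\w+)\//;

# With m and another delimiter the slashes can be written as they are.
my ($with_hashes) = $path =~ m#^/usr/(\w+)/#;
my ($with_braces) = $path =~ m{^/usr/(\w+)/};

print "$with_slashes $with_hashes $with_braces\n";   # local local local
```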

\x

Followed by one or two hexadecimal digits, \x will match the character having that hexadecimal number. See the ascii set.

/\xd/ Matches a carriage return character.
/\x2a/ Matches an asterisk character.

\0

Followed by one, two or three octal digits, \0 will match the character having that octal number. See the ascii set.

/\012/ Matches a line feed character.
/\044/ Matches a dollar sign character.

\c

Followed by a single character, \c will match the corresponding control character.

/\cD/ Matches a Control-D character.
/\cG/ Matches a Control-G character.
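The hexadecimal, octal and control-character escapes above can be checked against characters built another way, for example with chr() or a double-quoted string escape. A sketch in which each variable is 1 if the pattern matched:

```perl
use strict;
use warnings;

# Hexadecimal: 0x0d is carriage return, 0x2a is '*'.
my $carriage_return = "\r" =~ /\xd/  ? 1 : 0;
my $asterisk        = '*'  =~ /\x2a/ ? 1 : 0;

# Octal: 012 is line feed, 044 is '$'.
my $line_feed   = "\n" =~ /\012/ ? 1 : 0;
my $dollar_sign = '$'  =~ /\044/ ? 1 : 0;

# Control characters: Control-D is character 4, Control-G is character 7.
my $control_d = chr(4) =~ /\cD/ ? 1 : 0;
my $control_g = chr(7) =~ /\cG/ ? 1 : 0;

print "$carriage_return $asterisk $line_feed $dollar_sign $control_d $control_g\n";
# 1 1 1 1 1 1
```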

\number

If "number" is a single digit, or it does not start with '0' (zero), then it matches whatever the corresponding pair of parentheses matched.

/(Hello) \1/ Matches "Hello Hello".
/(Yo*?) \1/ Matches two equal strings separated by a space, each consisting of 'Y' followed by any number of 'o's (e.g. 'Yoo Yoo').
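A typical use of a back-reference inside the pattern is spotting a word that was typed twice. The sentence below is invented for this sketch.

```perl
use strict;
use warnings;

# \1 must match exactly the same text that the first group matched.
my $hello_twice = 'Hello Hello' =~ /(Hello) \1/ ? 1 : 0;
my $hello_mixed = 'Hello Hola'  =~ /(Hello) \1/ ? 1 : 0;

# Find a repeated word: any word, a space, then that same word again.
my ($repeated) = 'this is is a test' =~ /\b(\w+) \1\b/;

print "$hello_twice $hello_mixed\n";   # 1 0
print "$repeated\n";                   # is
```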

Originally written by Josh Starmer.