Essential Perl Regular Expression Syntax
Metacharacters
Code |
Meaning |
Example |
\ |
The 'escape' character. Put it before a metacharacter to match that character literally, or before certain ordinary characters to give them a special meaning (see the following tables). |
/\\/ matches the backslash character, '\'. |
| (vertical bar) |
The 'alternation' character. Put it between alternatives to match any one of them. |
/Yo|Hello|Hola/ matches 'Yo', 'Hello' or 'Hola'. |
( and ) |
For grouping and memory (capturing). Put things between parentheses if they should be treated as a unit. Whatever matches the part inside the parentheses is remembered. You can recall it with \1, \2 ... on the matching side of a regular expression, or with the variables $1, $2 ... on the substitution side (see the \number entry below). To 'turn off' the memory, put '?:' right after the opening parenthesis: (?:...). |
/(Yo|Hello|Hola) Gary/ matches 'Yo Gary', 'Hello Gary' or 'Hola Gary'. Note: if a match is found, $1 then contains 'Yo', 'Hello' or 'Hola'. |
[ and ] |
For sets or classes of characters. You can think of the characters inside the square brackets as having an alternation operator (pipe) between them, so the class matches exactly one of them. Ranges are allowed, e.g. [a-z] or [0-9]. |
/[abc]/ matches 'a', 'b' or 'c'. |
{ and } |
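For specifying a number or range of repetitions of the preceding character or group: {n} means exactly n times, {n,} at least n times, and {n,m} between n and m times. |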
/Yo{3}/ matches 'Yooo'. |
. |
Matches any single character except newline. |
/./ matches 'a', '7' or a space, but not a newline (unless the s modifier is used). |
* |
This matches zero or more of the preceding character or group. This is the same as {0,}. NOTE: When * is followed by ?, the matching is done in 'non-greedy' mode, matching as few characters as possible. |
/Yo*/ matches 'Y' followed by any number of 'o's (e.g. 'Y', 'Yo', 'Yoooooooo'). |
+ |
This matches one or more of the preceding character or group. This is the same as {1,}. NOTE: When + is followed by ?, the matching is done in 'non-greedy' mode, matching as few characters as possible. |
/Yo+/ matches 'Y' followed by one or more 'o's (e.g. 'Yo', 'Yooo'), but not 'Y' alone. |
? |
This matches zero or one of the preceding character or group. This is the same as {0,1}. |
/Yo?/ Matches 'Y' or 'Yo'. |
^ |
If placed at the start of an expression, it matches at the beginning of the string. If placed right after an opening square bracket, [, it means 'not the contents of the square brackets'. Elsewhere inside square brackets it matches a literal '^'. Outside them, escape it as \^ to match a literal '^'. |
/^abcd/ only matches if the string begins with 'abcd'. |
$ |
If placed at the end of an expression, it matches at the end of the string or just before a newline at the end of the string. When followed by a name, as in $var, it is instead interpolated as a variable. Escape it as \$ to match a literal dollar sign. Compare to \Z. |
/wxyz$/ only matches if the string ends with 'wxyz' or if 'wxyz' comes immediately before a newline. |
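The metacharacters above can be tried together in a short Perl script. This is a minimal sketch assuming a reasonably recent Perl 5. The sample strings and variable names are made up for illustration. Each trailing comment gives the value that line produces.

```perl
use strict;
use warnings;

# All sample strings below are hypothetical; any text would do.

# Alternation inside a capturing group: $1 holds whichever greeting matched.
my $who = ("Hola Gary" =~ /(Yo|Hello|Hola) Gary/) ? $1 : 'none';   # "Hola"

# (?:...) groups without capturing, so the name ends up in $1.
my ($name) = "Hello Gary" =~ /(?:Yo|Hello|Hola) ([A-Za-z]+)/;      # "Gary"

# \1 on the matching side, $1 and $2 on the substitution side.
my $repeated = "Yo Yo" =~ /^(Yo) \1$/ ? 1 : 0;                     # 1
(my $swapped = "Gary Hola") =~ s/([A-Za-z]+) ([A-Za-z]+)/$2 $1/;   # "Hola Gary"

# Greedy * grabs as much as possible; non-greedy *? as little as possible.
my ($greedy)     = "<b>bold</b>" =~ /<(.*)>/;                      # "b>bold</b"
my ($non_greedy) = "<b>bold</b>" =~ /<(.*?)>/;                     # "b"

# {3}: exactly three 'o's, with ^ and $ anchoring both ends of the string.
my @exact = grep { /^Yo{3}$/ } qw(Yo Yooo Yoooo);                  # ("Yooo")

# [^...] negates a class; \. is an escaped (literal) dot.
my $no_vowels  = "rhythm" =~ /^[^aeiou]+$/ ? 1 : 0;                # 1
my $is_version = "5.36" =~ /^[0-9]+\.[0-9]+$/ ? 1 : 0;             # 1

print "$who | $name | $swapped | $greedy | $non_greedy | @exact\n";
```

The greedy/non-greedy pair is the case that most often surprises people. `.*` runs to the last `>` in the string, while `.*?` stops at the first one.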
Class Codes
Code |
Matches |
Equivalent Class |
Example |
\d |
A digit |
[0-9] |
/\d\d/ matches two digits (e.g. '87'). |
\D |
A non-digit |
[^0-9] |
/\D\D/ matches two non-digits (e.g. 'M#'). |
\w |
A word character (letter, digit or underscore) |
[a-zA-Z_0-9] |
/\w\w/ matches two word characters (e.g. 'a9'). |
\W |
A non-word character (anything but a letter, digit or underscore) |
[^a-zA-Z_0-9] |
/\W\W/ matches two non-word characters (e.g. '&!'). |
\s |
A whitespace character |
[ \t\n\r\f] |
/\s\w+/ matches a single whitespace character followed by one or more word characters. |
\S |
A non-whitespace character |
[^ \t\n\r\f] |
/\S\s+\S/ matches a single non-whitespace character, followed by one or more whitespace characters, followed by a single non-whitespace character. |
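The class codes can be checked against small strings, and the "Equivalent Class" column can be tested mechanically over the 128 ASCII characters. This is a sketch assuming a reasonably recent Perl 5. The sample strings are hypothetical. Only \d and \w are compared, because these two equivalences hold exactly for ASCII input.

```perl
use strict;
use warnings;

# Each class code from the table, applied to a made-up sample string.
my ($two_digits)    = "Room 87, floor 3" =~ /(\d\d)/;   # "87"
my ($non_digits)    = "M#42"             =~ /(\D\D)/;   # "M#"
my ($word_pair)     = "!!a9??"           =~ /(\w\w)/;   # "a9"
my ($non_word_pair) = "ab&!cd"           =~ /(\W\W)/;   # "&!"
my ($space_word)    = "Hello\tworld"     =~ /(\s\w+)/;  # "\tworld"
my $gap             = "a   b" =~ /\S\s+\S/ ? 1 : 0;     # 1

# Compare each class code with its bracketed equivalent over ASCII:
# xor is true when exactly one of the two patterns matches a character.
my @ascii = map { chr($_) } 0 .. 127;
my @digit_mismatch = grep { /\d/ xor /[0-9]/ } @ascii;          # ()
my @word_mismatch  = grep { /\w/ xor /[a-zA-Z_0-9]/ } @ascii;   # ()

printf "digit mismatches: %d, word mismatches: %d\n",
    scalar @digit_mismatch, scalar @word_mismatch;
```

The checks are limited to ASCII because on Unicode strings \d and \w also match letters and digits from other scripts.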
Anchor Codes
Code |
Meaning |
Example |
^ |
If placed at the start of an expression, it matches at the beginning of the string. If placed right after an opening square bracket, [, it means 'not the contents of the square brackets'. Elsewhere inside square brackets it matches a literal '^'. Outside them, escape it as \^ to match a literal '^'. |
/^abcd/ only matches if the string begins with 'abcd'. |
$ |
If placed at the end of an expression, it matches at the end of the string or just before a newline at the end of the string. When followed by a name, as in $var, it is instead interpolated as a variable. Escape it as \$ to match a literal dollar sign. Compare to \Z. |
/wxyz$/ only matches if the string ends with 'wxyz' or if 'wxyz' comes immediately before a newline. |
\b |
Matches at a word boundary. |
/Yo\b/ matches the 'Yo' in 'Yo Gary' or in 'Yo-yo' (the hyphen is not a word character), but not in 'Young'. |
\B |
Matches at any position that is not a word boundary. |
/reg\B/ Matches 'reggie' or 'regexp' but not 'reg'. |
\A |
Matches only at the start of the string, even when the m modifier is used (compare to ^). |
/\Aabcd/ only matches if the string begins with 'abcd'. |
\Z |
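Matches at the end of the string or just before a newline at the end of the string, like $. Unlike $, it is not affected by the m modifier. To match only at the very end, use \z. Compare to $. |
/wxyz\Z/ matches if the string ends with 'wxyz', optionally followed by a single final newline. |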
(?=...) |
A positive look-ahead assertion. It means: match the part before me only if it is followed by "...". The "..." part is checked but does not become part of the match. |
/Yo(?=-yo)/ matches the 'Yo' in 'Yo-yo' but not the one in 'Yo Gary'. |
(?!...) |
A negative look-ahead assertion. It means: match the part before me only if it is not followed by "...". |
/Yo(?!-yo)/ matches the 'Yo' in 'Yo Gary' but not the one in 'Yo-yo'. |
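The anchors are easiest to tell apart by running them on the same inputs. This Perl sketch assumes a reasonably recent Perl 5 and uses hypothetical sample strings. It also uses \z (end of string only) to contrast with \Z.

```perl
use strict;
use warnings;

# Hypothetical sample strings throughout.

# \b: /Yo\b/ needs a non-word character (or the end) right after "Yo".
my @boundary     = grep { /Yo\b/ }  ("Yo Gary", "Yo-yo", "Young");   # ("Yo Gary", "Yo-yo")
# \B: /reg\B/ needs a word character right after "reg".
my @non_boundary = grep { /reg\B/ } ("reggie", "regexp", "reg");     # ("reggie", "regexp")

# $ and \Z both accept one trailing newline; \z accepts none.
my $line = "abcd wxyz\n";
my $dollar  = $line =~ /wxyz$/  ? 1 : 0;   # 1
my $big_z   = $line =~ /wxyz\Z/ ? 1 : 0;   # 1
my $small_z = $line =~ /wxyz\z/ ? 1 : 0;   # 0

# Under /m, ^ matches at the start of every line; \A still matches only once.
my $text = "first\nsecond\n";
my $caret_count = () = $text =~ /^\w+/mg;    # 2
my $a_count     = () = $text =~ /\A\w+/mg;   # 1

# Look-ahead: the (?=...) part is checked but not included in the capture.
my ($matched) = "Yo-yo" =~ /(Yo(?=-yo))/;                       # "Yo"
my @not_followed = grep { /Yo(?!-yo)/ } ("Yo-yo", "Yo Gary");   # ("Yo Gary")

print "boundary: @boundary; non-boundary: @non_boundary\n";
```

The `() = ...` idiom forces list context and counts the matches. It is a compact way to see how many positions an anchor accepts.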
Modifiers
Code |
Meaning |
Example |
g |
Match Globally. Find all of the occurrences in the string (not just the first one). |
//g |
i |
Do case-insensitive pattern matching. |
//i |
m |
Treat the string as multiple lines: ^ and $ also match at the start and end of each line, not just of the whole string. |
//m |
o |
Compile the pattern only once, even if it contains variables whose values change later. |
//o |
s |
Treat the string as a single line: . also matches a newline. |
//s |
x |
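Use extended regular expressions: whitespace in the pattern is ignored and # starts a comment, so long patterns can be spread over several lines. |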
//x |
e |
In a substitution, evaluate the right side (the replacement) as a Perl expression. |
s///e |
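Most of the modifiers can be demonstrated on one small multi-line string. This is a sketch assuming a reasonably recent Perl 5. The sample text is hypothetical. The o modifier is left out because its effect (skipping recompilation) is not visible in the results.

```perl
use strict;
use warnings;

# Hypothetical sample text with three lines.
my $text = "Yo Gary\nyo again\nHOLA";

# g: in list context, return every match instead of just the first.
my @words = $text =~ /(\w+)/g;               # ("Yo", "Gary", "yo", "again", "HOLA")

# i: case-insensitive, so "Yo" and "yo" are both counted.
my $yo_count = () = $text =~ /\byo\b/gi;     # 2

# m: ^ matches at the start of each line, not only the start of the string.
my @line_starts = $text =~ /^(\w+)/mg;       # ("Yo", "yo", "HOLA")

# s: . also matches a newline, so .+ can run across the line break.
my $with_s    = $text =~ /Gary.+again/s ? 1 : 0;   # 1
my $without_s = $text =~ /Gary.+again/  ? 1 : 0;   # 0

# x: whitespace and #-comments inside the pattern are ignored.
my ($year, $month) = "2024-05" =~ /
    (\d{4})   # four-digit year
    -
    (\d{2})   # two-digit month
/x;

# e: the replacement is evaluated as Perl code, here doubling each number.
(my $doubled = "3 apples and 4 pears") =~ s/(\d+)/$1 * 2/ge;   # "6 apples and 8 pears"

print "$yo_count greetings; $year/$month; $doubled\n";
```

Modifiers combine freely, as in `/mg` and `/ge` above. Their order after the closing delimiter does not matter.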
Random Stuff
Code |
Meaning |
Example |
m |
Start your regular expression with m and you can choose the delimiters. This saves escaping slashes inside the pattern. |
m#Yo-yo# Matches 'Yo-yo'. |
\x |
Followed by one or two hexadecimal digits, \x matches the character with that hexadecimal code. See an ASCII table. |
/\xd/ Matches a carriage return character. |
\0 |
Followed by one or two more octal digits, \0 matches the character with that octal code. See an ASCII table. |
/\012/ Matches a line feed character. |
\c |
Followed by a single character, \c will match the corresponding control character. |
/\cD/ Matches a Control-D character. |
\number |
If "number" is a single digit or does not start with '0' (zero), it matches whatever the corresponding pair of parentheses matched. |
/(Hello) \1/ Matches "Hello Hello". |
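The escapes in this last table can be exercised in a few lines of Perl. This sketch assumes a reasonably recent Perl 5 and uses hypothetical sample strings. It also uses \z (end of string only), which is not in the table above.

```perl
use strict;
use warnings;

# Hypothetical sample strings throughout.

# m with # as the delimiter: the slashes in the path need no escaping.
my $in_usr_bin = "/usr/bin/perl" =~ m#^/usr/bin/# ? 1 : 0;   # 1

# \x0d is a carriage return and \012 (octal) is a line feed: a CRLF ending.
my $has_crlf = "line\r\n" =~ /\x0d\012\z/ ? 1 : 0;           # 1

# \cD is Control-D, the character with code 4.
my $is_ctrl_d = chr(4) =~ /^\cD\z/ ? 1 : 0;                  # 1

# \1 repeats whatever group 1 matched: find doubled words.
my @doubled = "this is is a test test" =~ /\b(\w+) \1\b/g;   # ("is", "test")

print "doubled words: @doubled\n";
```

The doubled-word search is a common practical use of \1. It catches typing slips like "is is" that a spell checker would miss.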
Originally written by Josh Starmer.