These are the main regular expression characters that you should know.
Character classes | |
| Character classes specify a set of characters. The set can be listed explicitly inside square brackets, []. It can also be defined by what it must not contain, by beginning the set with a caret, "^". There are a number of predefined sets (e.g., \d, \s). A hyphen, "-", indicates a range of character values. Although a character class matches only one character, a quantifier following it can be used to match multiple characters. | |
| [abc] | a, b, or c (simple class) |
| [^abc] | Any character except a, b, or c (negation) |
| [a-zA-Z] | a through z or A through Z, inclusive (range) |
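
The three forms above (simple class, negation, range) can be tried out with a short sketch. It assumes Java's `java.util.regex` package, whose notation this table appears to follow. The class name `CharClassDemo` and the helper `findAll` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not part of any library.
final class CharClassDemo {

    // Returns every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "cab-42Zed";
        System.out.println(findAll("[abc]", input));     // simple class: [c, a, b]
        System.out.println(findAll("[^abc]", input));    // negation: [-, 4, 2, Z, e, d]
        System.out.println(findAll("[a-zA-Z]+", input)); // range plus quantifier: [cab, Zed]
    }
}
```

Note that the class on its own matches one character at a time. Only the last pattern, which adds the `+` quantifier, returns multi-character matches.
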
| Predefined character classes | |
|---|---|
| . | Any character (may or may not match line terminators) |
| \d | A digit: [0-9] |
| \D | A non-digit: [^0-9] |
| \s | A whitespace character: [ \t\n\x0B\f\r] |
| \S | A non-whitespace character: [^\s] |
| \w | A word character: [a-zA-Z_0-9] |
| \W | A non-word character: [^\w] |
Quantifiers (repeating the previous element) | |
| Greedy quantifiers - Expand as much as possible | |
| X? | X, once or not at all |
| X* | X, zero or more times |
| X+ | X, one or more times |
| X{n} | X, exactly n times |
| X{n,} | X, at least n times |
| X{n,m} | X, at least n but not more than m times |
| Reluctant quantifiers - Expand only if forced by later failure to match | |
| X?? | X, once or not at all |
| X*? | X, zero or more times |
| X+? | X, one or more times |
| X{n}? | X, exactly n times |
| X{n,}? | X, at least n times |
| X{n,m}? | X, at least n but not more than m times |
Boundary matchers - Zero-width matches. | |
| ^ | The beginning of a line. Very useful. |
| $ | The end of a line. Very useful. ^$ matches all empty lines. |
| \b | A word boundary |
| \B | A non-word boundary |
| \A | The beginning of the input |
| \G | The end of the previous match |
| \Z | The end of the input but for the final terminator, if any |
| \z | The end of the input |
Other | |
| Logical operators | |
| XY | X followed by Y |
| X\|Y | Either X or Y |
| Grouping - Parentheses both group and create a numbered element that can be used later. | |
| (X) | X. This capturing group is remembered so it can be referenced later. Numbered starting at 1. |
| Quotation | |
| \ | Nothing, but quotes the following character. |
| Characters | |
| x | The character x |
| \\ | The backslash character |
| \t | The tab character ('\u0009') |
| \n | The newline (line feed) character ('\u000A') |
| \r | The carriage-return character ('\u000D') |
| \f | The form-feed character ('\u000C') |
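
A few entries from the table can be seen in action in the sketch below: greedy versus reluctant quantifiers, numbered capturing groups, and `^$` for empty lines. It assumes Java's `java.util.regex`. The class `QuantifierDemo` and its method names are hypothetical, chosen only for this example. In that engine, `^` and `$` match at line boundaries only when `Pattern.MULTILINE` is set. Without the flag they match at the start and end of the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not part of any library.
final class QuantifierDemo {

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Swaps the two sides of each key=value pair using groups 1 and 2.
    static String swapPair(String input) {
        return input.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    // Counts empty lines; MULTILINE lets ^ and $ match at line boundaries.
    static int countEmptyLines(String text) {
        Matcher m = Pattern.compile("^$", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));   // greedy: <b>bold</b>
        System.out.println(firstMatch("<.+?>", html));  // reluctant: <b>
        System.out.println(swapPair("key=value"));      // value=key
        System.out.println(countEmptyLines("one\n\ntwo\n\nthree")); // 2
    }
}
```

The greedy `.+` first consumes as much as it can and backs off only until the final `>` fits, so it spans the whole string. The reluctant `.+?` expands one character at a time and stops at the first `>`.
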