RegExp in JavaScript
Regular expressions are patterns used to match character combinations in strings.
Constructing a RegExp
var re = /ab+c/i;
var re = new RegExp("ab+c", "i");
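The two forms above describe the same pattern. As a minimal sketch of where they differ in practice: the constructor takes a string, so backslashes must be doubled, and the pattern can be assembled at run time. The helper name escapeRegExp and the sample inputs are hypothetical, chosen only for illustration.

```javascript
// Literal and constructor forms produce equivalent patterns.
const literal = /ab+c/i;
const constructed = new RegExp("ab+c", "i");

// Inside a string, a backslash must itself be escaped: "\\d+" is the regex \d+.
const digits = new RegExp("\\d+");

// Hypothetical helper: escape regex metacharacters so that text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1";
const dynamic = new RegExp(escapeRegExp(userInput));
const unescaped = new RegExp(userInput); // here "+" acts as a quantifier

console.log(literal.source === constructed.source); // true
console.log(digits.exec("abc123")[0]);              // "123"
console.log(dynamic.test("1+1=2"));                 // true
console.log(unescaped.test("1+1=2"));               // false
```

The literal form is usually easier to read when the pattern is fixed; the constructor is the option when part of the pattern is only known at run time.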
Lookahead
x(?=y)
(Positive Lookahead) Matches 'x' only if 'x' is followed by 'y'. For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.
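The positive lookahead examples can be checked directly. This is a small sketch; the variable names and test strings are illustrative.

```javascript
const single = /Jack(?=Sprat)/;
const either = /Jack(?=Sprat|Frost)/;

// 'Sprat' is required after 'Jack' but is not part of the match.
console.log(single.exec("JackSprat")[0]); // "Jack"
console.log(single.exec("JackFrost"));    // null
console.log(either.exec("JackFrost")[0]); // "Jack"
console.log(either.exec("JackBlack"));    // null
```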
x(?!y)
(Negative Lookahead) Matches 'x' only if 'x' is not followed by 'y'. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. The regular expression /\d+(?!\.)/.exec("3.141") matches '141' but not '3.141'.
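A short sketch of the negative lookahead example, together with one behaviour that is easy to miss: because \d+ can backtrack, a number followed by a dot may still yield a shorter match. The stricter variant below, which also forbids a following digit, is an illustration and not part of the original text.

```javascript
const notBeforeDot = /\d+(?!\.)/;

const piMatch = notBeforeDot.exec("3.141");
console.log(piMatch[0]);    // "141"
console.log(piMatch.index); // 2

// "42" is followed by ".", so the engine backtracks to "4", which is followed by "2".
const backtracked = notBeforeDot.exec("42.");
console.log(backtracked[0]); // "4"

// Illustrative stricter variant: the number may be followed by neither a digit nor a dot.
const strict = /\d+(?![\d.])/.exec("42.");
console.log(strict); // null
```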
Lookahead does not consume characters in the string. It only asserts whether a match is possible or not.
As soon as the lookaround condition is satisfied, the regex engine gives up the text it matched inside the lookaround and continues from the position where the lookaround started.
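One way to observe that a lookahead consumes nothing is to compare lastIndex after a global match with and without the assertion. This is a minimal sketch with illustrative variable names.

```javascript
const text = "JackSprat";

// With a lookahead, the match ends right after "Jack".
const asserting = /Jack(?=Sprat)/g;
const assertedMatch = asserting.exec(text);
console.log(assertedMatch[0]);    // "Jack"
console.log(asserting.lastIndex); // 4

// Without it, "Sprat" is consumed and becomes part of the match.
const consuming = /JackSprat/g;
const consumedMatch = consuming.exec(text);
console.log(consumedMatch[0]);    // "JackSprat"
console.log(consuming.lastIndex); // 9
```

Because the position does not advance past 'Sprat', a later part of the pattern (or the next global match) can still match those characters.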
// Intersection
// Match a password of 6+ characters with at least:
// one digit, one letter, and one symbol
var re = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i;
// Subtraction
// Any number that's NOT divisible by 5
// (\d*[05]\b checks the last digit of the whole number, including single digits)
var re = /\b(?!\d*[05]\b)\d+\b/;
// Negation
// Anything that doesn't contain 'foo'
var re = /^(?!.*foo).+$/;
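The three idioms can be exercised with a few inputs. In this sketch the subtraction pattern uses \d*[05]\b inside the lookahead so that the check applies to the last digit of the whole number; the variable names and sample strings are illustrative.

```javascript
const password = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i;
const notMultipleOf5 = /\b(?!\d*[05]\b)\d+\b/;
const noFoo = /^(?!.*foo).+$/;

// Intersection: every lookahead must hold at the start of the string.
console.log(password.test("abc123!")); // true
console.log(password.test("abcdef1")); // false (no symbol)
console.log(password.test("a1!"));     // false (too short)

// Subtraction: digits, minus those ending in 0 or 5.
console.log(notMultipleOf5.test("1053")); // true
console.log(notMultipleOf5.test("105"));  // false
console.log(notMultipleOf5.test("5"));    // false

// Negation: reject any line that contains "foo".
console.log(noFoo.test("bar baz")); // true
console.log(noFoo.test("a foo b")); // false
```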
Lookbehind
Lookbehind works backwards. It tells the regex engine to temporarily check backwards in the string.
(?<=y)x
(Positive Lookbehind) Matches 'x' only if 'x' is preceded by 'y'.
(?<!y)x
(Negative Lookbehind) Matches 'x' only if 'x' is not preceded by 'y'. For example, (?<!\\)(\\\\)*\\$ matches an odd number of consecutive backslashes (\) at the end of the string.
Note: Lookbehind was added to JavaScript in ES2018. Older engines that do not implement it throw a SyntaxError when they parse such a pattern.
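In engines that implement ES2018 lookbehind, the patterns above can be tried directly. The sketch below builds them with the RegExp constructor inside try/catch, so an engine without support reports it at run time instead of failing when the script is parsed. The names supportsLookbehind and lookbehindResults and the sample strings are hypothetical.

```javascript
// Probe for support; a regex literal with lookbehind would fail at parse time instead.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    return false;
  }
}

let lookbehindResults = null;
if (supportsLookbehind()) {
  // String.raw keeps the backslashes exactly as written in the pattern.
  const afterDollar = new RegExp(String.raw`(?<=\$)\d+`);
  const notAfterDollar = new RegExp(String.raw`(?<!\$)\b\d+`);
  const oddTrailingBackslashes = new RegExp(String.raw`(?<!\\)(\\\\)*\\$`);

  lookbehindResults = {
    positive: afterDollar.exec("cost: $42")[0],     // "42"
    negative: notAfterDollar.exec("$42 and 17")[0], // "17"
    oddRun: oddTrailingBackslashes.test("a" + "\\".repeat(3)),  // true
    evenRun: oddTrailingBackslashes.test("a" + "\\".repeat(2)), // false
  };
}
console.log(lookbehindResults);
```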
Backreferences
(x)
Matches 'x' and remembers the match. The parentheses are called capturing parentheses.
(?:x)
Matches 'x' but does not remember the match. The parentheses are called non-capturing parentheses. Consider the expression /(?:foo){1,2}/. Without the parentheses, as in /foo{1,2}/, the {1,2} would apply only to the last 'o' in 'foo'. With the non-capturing parentheses, the {1,2} applies to the entire word 'foo'.
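The difference between the group types shows up in the result array of exec. This is a minimal sketch with illustrative variable names.

```javascript
const capturing = /(foo){1,2}/.exec("foofoo");
const nonCapturing = /(?:foo){1,2}/.exec("foofoo");
const noGroup = /foo{1,2}/.exec("foofoo");

// Capturing group: the last repetition is stored at index 1.
console.log(capturing[0], capturing[1], capturing.length); // "foofoo" "foo" 2

// Non-capturing group: same overall match, nothing remembered.
console.log(nonCapturing[0], nonCapturing.length); // "foofoo" 1

// No group: {1,2} applies only to the final 'o'.
console.log(noGroup[0]);                    // "foo"
console.log(/foo{1,2}/.exec("fooo")[0]);    // "fooo"
```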
Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered.
// matches the quoted string "hello"
/('|").+?('|")/g.exec('"hello"');
// but false positive: also matches mismatched quotes such as "hello' or 'hello"
/('|").+?('|")/g.exec('"hello\'');
// better: \1 requires the same closing quote, so "hello 'world'" matches as a whole
/('|").+?\1/g.exec('"hello \'world\'"');
// but it stops at the first inner quote of the same kind: matches only "hello "
/('|").+?\1/g.exec('"hello "world""');
// best: skips backslash-escaped characters, so it matches "hello \"world\""
/('|")(\\?.)*?\1/g.exec('"hello \\"world\\""');
Furthermore, in the String#replace method, the replacement string can use $1 and $2 to refer to the first and second parenthesized substring matches.
"John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");
// => "Smith, John"
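The same substitution syntax works with more groups and with the g flag. This is a short sketch; the sample strings and variable names are illustrative.

```javascript
// Reorder three captured groups: year-month-day to day/month/year.
const isoDate = "2024-01-15";
const reordered = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(reordered); // "15/01/2024"

// With the g flag, $1 and $2 are filled in separately for every match.
const names = "John Smith; Jane Doe";
const swapped = names.replace(/(\w+)\s(\w+)/g, "$2, $1");
console.log(swapped); // "Smith, John; Doe, Jane"
```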
Special Characters
?
If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy.
For example, applying /\d+/ to "123abc" matches "123".
But applying /\d+?/ to that same string matches only the "1".
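Greedy and non-greedy quantifiers can be compared side by side. This is a minimal sketch; the HTML-like sample string is illustrative.

```javascript
// The example from the text.
console.log("123abc".match(/\d+/)[0]);      // "123"
console.log("123abc".match(/\d+?/)[0]);     // "1"

// {2,3}? takes the minimum of two digits.
console.log("123abc".match(/\d{2,3}?/)[0]); // "12"

// A common case: matching up to the first closing bracket rather than the last.
const html = "<b>bold</b>";
console.log(html.match(/<.+>/)[0]);  // "<b>bold</b>"
console.log(html.match(/<.+?>/)[0]); // "<b>"
```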
\b
Matches a word boundary: a position where a word character is not followed or preceded by another word character. Note that a matched word boundary is not included in the match; it has zero length.
Examples: /\bm/ matches the 'm' in "moon"; /oo\b/ does not match the 'oo' in "moon", because 'oo' is followed by 'n', which is a word character; /oon\b/ matches the 'oon' in "moon", because 'oon' is at the end of the string and thus not followed by a word character; /\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character.
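The word boundary examples can be verified directly, along with a typical use: replacing a whole word without touching longer words that contain it. This is a small sketch; the replacement example and variable name are illustrative.

```javascript
console.log(/\bm/.exec("moon")[0]);      // "m"
console.log(/oo\b/.exec("moon"));        // null
console.log(/oon\b/.exec("moon")[0]);    // "oon"
console.log(/\w\b\w/.test("moon mood")); // false

// Whole-word replacement: "concat" is left alone.
const replaced = "cat concat cat".replace(/\bcat\b/g, "dog");
console.log(replaced); // "dog concat dog"
```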