Lookaround operators have two components: a match pattern and a test pattern. If you call the match pattern p1 and the test pattern p2, then the simplest form of lookaround operator (a lookahead) looks like this:

    p1(?=p2)
The match pattern p1 is just like any other element in an expression. For example, it can be '\<[A-Za-z]+\>' to make regexp find any word.
The test pattern p2 places a condition on this match. There can be a match for p1 only if there is also a match for p2, and the p2 match must immediately precede (for lookbehind operators) or follow (for lookahead operators) the match for p1. The negative operators reverse this condition: p1 matches only where p2 does not.
In the following expression, the match pattern is '\<[A-Za-z]+\>' and the test pattern is '\S'. The entire expression can be read as "Find those words that are followed by a non-white-space character":

    \<[A-Za-z]+\>(?=\S)
When used on the following string, this lookahead expression matches the letters of the words Raven and Nevermore, but not the comma or quotation mark that follows each of them:
    str = 'Quoth the Raven, "Nevermore"';
    regexp(str, '\<[A-Za-z]+\>(?=\S)', 'match')

    ans =
        'Raven'    'Nevermore'
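For contrast, here is a small sketch (not one of the original examples; the variable name withTest is illustrative) that replaces the lookahead with a plain \S. The test character is then consumed like any other element, so it becomes part of each returned match.

```matlab
% Same string as above, but \S is now an ordinary, consuming element
str = 'Quoth the Raven, "Nevermore"';
withTest = regexp(str, '\<[A-Za-z]+\>\S', 'match')
% withTest = {'Raven,', 'Nevermore"'}: the comma and the quote are included
```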
One important characteristic of lookaround operators is how they affect the parsing of the input string. The parser can be said to "consume" pieces of the string as it looks for matching phrases. With lookaround operators, only the match pattern p1 consumes characters and advances the current parsing location. Finding a match for the test pattern p2 does not move the parser location, so the characters that p2 examines remain available to the rest of the expression and to later matches.
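A minimal sketch of this difference, using a hypothetical string 'aaaa' chosen only for illustration: a consuming pattern cannot reuse characters, so its matches never overlap, while a lookahead leaves the tested character in place for the next match.

```matlab
str = 'aaaa';
% 'aa' consumes both characters of each match, so matches cannot overlap
consumed = regexp(str, 'aa', 'start')          % [1 3]
% 'a(?=a)' consumes only the first 'a'; the lookahead re-examines the next one
lookedAhead = regexp(str, 'a(?=a)', 'start')   % [1 2 3]
```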
Note: You can also use lookaround operators to perform a logical AND of two elements. See Using Lookaround as a Logical Operator.
The sections below show the four lookaround expressions: lookahead, negative lookahead, lookbehind, and negative lookbehind.
Lookahead -- expr1(?=expr2)
Use p1(?=p2) to find all words of this string that are immediately followed by a comma:
    poestr = ['While I nodded, nearly napping, ' ...
              'suddenly there came a tapping,'];
    [mat, idx] = regexp(poestr, '\w*(?=,)', 'match', 'start')

    mat =
        'nodded'    'napping'    'tapping'
    idx =
         9    24    55
Negative Lookahead -- expr1(?!expr2)
Use p1(?!p2) to find all words that do not precede a comma:
    [mat, idx] = regexp(poestr, '\w+(?!\w*,)', 'match', 'start')

    mat =
        'While'    'I'    'nearly'    'suddenly'    'there'    'came'    'a'
    idx =
         1     7    17    33    42    48    53
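Why is the test pattern \w*, rather than just a comma? A sketch with the shorter test (the variable name naive is illustrative): the lookahead then checks only the single next character, so \w+ backtracks by one letter and a truncated word slips through.

```matlab
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];
% (?!,) only rejects a match whose NEXT character is a comma,
% so 'nodded' is shortened to 'nodde', which is followed by 'd'
naive = regexp(poestr, '\w+(?!,)', 'match')
% naive = {'While','I','nodde','nearly','nappin','suddenly','there','came','a','tappin'}
```

Adding \w* to the test makes the lookahead fail at every position inside a word that ends in a comma, which is what the example above relies on.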
Lookbehind -- (?<=expr2)expr1
Use (?<=p2)p1 to find all words that follow a comma and zero or more spaces.
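A minimal sketch of the lookbehind form on the same poestr string. It uses \w+ rather than \w* as p1 so that no empty match can be reported at the position right after a comma; that choice is an assumption of this sketch. It relies on MATLAB accepting a variable-length test pattern (,\s*) inside a lookbehind, as the negative lookbehind example below also does.

```matlab
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];
% ,\s* must end exactly where the word begins; it is tested, not returned
[mat, idx] = regexp(poestr, '(?<=,\s*)\w+', 'match', 'start')
% mat = {'nearly', 'suddenly'}, idx = [17 33]
```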
Negative Lookbehind -- (?<!expr2)expr1
Use (?<!p2)p1 to find all words that do not follow a comma and zero or more spaces:
    [mat, idx] = regexp(poestr, '(?<!,\s*\w*)\w*', 'match', 'start')

    mat =
        'While'    'I'    'nodded'    'napping'    'there'    'came'    'a'    'tapping'
    idx =
         1     7     9    24    42    48    53    55
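The \w* inside the lookbehind plays the same role as in the negative lookahead example. A sketch without it (the variable name naive is illustrative): the test ,\s* rejects only the first letter of a word that follows a comma, so the rest of that word is still returned.

```matlab
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];
% At 'e' in 'nearly' the text just before is 'n', not ',\s*',
% so the negative lookbehind succeeds there and 'early' is matched
naive = regexp(poestr, '(?<!,\s*)\w+', 'match')
% naive = {'While','I','nodded','early','napping','uddenly','there','came','a','tapping'}
```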
Using Lookaround as a Logical Operator
You can use lookaround operators to perform a logical AND, as shown in this example. The expression used here finds all words that contain a sequence of two letters under the condition that the two letters are identical and are in the range a through m. (The expression '(?=[a-m])' is a lookahead test for the range a through m, and the expression '(.)\1' tests for identical characters using a token.) Because the lookahead does not consume anything, both tests apply at the same position, so both conditions must hold for the same pair of characters:
    [mat, idx] = regexp(poestr, '\<\w*(?=[a-m])(.)\1\w*\>', ...
                        'match', 'start')

    mat =
        'nodded'    'suddenly'
    idx =
         9    33
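The same idea extends to more than two conditions by stacking lookaheads at one position. A sketch with a hypothetical string, finding words that contain both a digit and a lowercase letter; the string and variable name are illustrative only. Note that regexp is case-sensitive by default, so 'X9' does not qualify.

```matlab
str = 'abc 123 a1b2 X9';
% Both lookaheads are tested at the start of the word; \w+ then consumes it
both = regexp(str, '\<(?=\w*\d)(?=\w*[a-z])\w+\>', 'match')
% both = {'a1b2'}
```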
Note that when using a lookahead operator to perform an AND, you need to place the match expression expr1 after the test expression expr2, because the lookahead tests the text that begins at the position where expr1 starts:

    (?=expr2)expr1
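To see why the order matters, here is a sketch that moves the test from the logical AND example after the repeated pair (a deliberately misordered variant, not from the original examples). The lookahead now checks the character that follows the pair, so words whose doubled letter lies outside a through m are also returned.

```matlab
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];
% (?=[a-m]) now tests the letter AFTER the pair: 'i' in napping/tapping passes
[mat, idx] = regexp(poestr, '\<\w*(.)\1(?=[a-m])\w*\>', 'match', 'start')
% mat = {'nodded','napping','suddenly','tapping'}, idx = [9 24 33 55]
```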
© 1994-2005 The MathWorks, Inc.