Lookaround operators have two components: a match pattern and a test pattern. If you call the match pattern p1 and the test pattern p2, then the simplest form of lookaround operator looks like this:

    p1(?=p2)
The match pattern p1 is just like any other element in an expression. For example, it can be '\<[A-Za-z]+\>', which makes regexp find any word.
The test pattern p2 places a condition on this match. For the positive operators, there can be a match for p1 only if there is also a match for p2; for the negative operators, there can be a match for p1 only if there is no match for p2. In either case, p2 is tested immediately before (for lookbehind operators) or immediately after (for lookahead operators) the match for p1.
In the following expression, the match pattern is '\<[A-Za-z]+\>' and the test pattern is '\S'. The entire expression can be read as "Find those words that are followed by a non-white-space character":

    '\<[A-Za-z]+\>(?=\S)'
When used on the following string, this lookahead expression matches the letters of the words Raven and Nevermore:
    str = 'Quoth the Raven, "Nevermore"';
    regexp(str, '\<[A-Za-z]+\>(?=\S)', 'match')
    ans =
        'Raven'    'Nevermore'
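For comparison, the sketch below runs the match pattern on its own and then with the lookahead test attached; without the test, every word in the string matches. The variable names allWords and someWords are illustrative.

```matlab
% Sketch: p1 alone versus p1(?=p2) on the same string.
% Variable names are illustrative only.
str = 'Quoth the Raven, "Nevermore"';

% Match pattern p1 alone: every word matches
allWords = regexp(str, '\<[A-Za-z]+\>', 'match');
% allWords is {'Quoth', 'the', 'Raven', 'Nevermore'}

% p1 plus the lookahead test (?=\S): only words followed by a non-white-space character
someWords = regexp(str, '\<[A-Za-z]+\>(?=\S)', 'match');
% someWords is {'Raven', 'Nevermore'}
```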
One important characteristic of lookaround operators is how they affect the parsing of the input string. The parser can be said to "consume" pieces of the string as it looks for matching phrases. With lookaround operators, only the match pattern p1 affects the current parsing location. Finding a match for the test pattern p2 does not move the parser location.
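One way to see that the test pattern does not move the parser is to look for overlapping matches. In this sketch (the string 'aaaa' and the variable names are illustrative), the pattern 'aa' consumes two characters per match, while 'a(?=a)' consumes only one and merely tests the next, so a new match can start at every position:

```matlab
% Sketch: a lookahead test does not consume characters.
% The input string and variable names are illustrative only.
s = 'aaaa';

% 'aa' consumes both characters, so matches cannot overlap
consumedStarts = regexp(s, 'aa', 'start');      % [1 3]

% 'a(?=a)' consumes one 'a' and only tests the next one
lookaheadStarts = regexp(s, 'a(?=a)', 'start'); % [1 2 3]
```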
Note: You can also use lookaround operators to perform a logical AND of two elements. See Using Lookaround as a Logical Operator.
There are four lookaround expressions: lookahead, negative lookahead, lookbehind, and negative lookbehind. Each one is described below.
Lookahead -- expr1(?=expr2)
Use p1(?=p2) to find all words of this string that precede a comma:
    poestr = ['While I nodded, nearly napping, ' ...
              'suddenly there came a tapping,'];
    [mat, idx] = regexp(poestr, '\w*(?=,)', 'match', 'start')
    mat =
        'nodded'    'napping'    'tapping'
    idx =
         9    24    55
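Because the comma belongs to the test pattern, it is not part of the returned text. A sketch that contrasts this with a pattern that consumes the comma (variable names are illustrative):

```matlab
% Sketch: a comma tested by lookahead versus an ordinary, consumed comma.
% Variable names are illustrative only.
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];

% The comma is consumed and included in each match
withComma = regexp(poestr, '\w*,', 'match');
% withComma is {'nodded,', 'napping,', 'tapping,'}

% The comma is only tested, so it is left out of each match
beforeComma = regexp(poestr, '\w*(?=,)', 'match');
% beforeComma is {'nodded', 'napping', 'tapping'}
```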
Negative Lookahead -- expr1(?!expr2)
Use p1(?!p2) to find all words that do not precede a comma:
    [mat, idx] = regexp(poestr, '\w+(?!\w*,)', 'match', 'start')
    mat =
        'While'    'I'    'nearly'    'suddenly'    'there'    'came'    'a'
    idx =
         1     7    17    33    42    48    53
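The test pattern above is '\w*,' rather than just ','. The sketch below, with illustrative variable names, suggests why: with the shorter test '(?!,)', backtracking lets \w+ stop one letter before the comma, so truncated words such as 'nodde' are returned.

```matlab
% Sketch: a negative lookahead test that is too short.
% Variable names are illustrative only.
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];

% \w+ backs off by one letter so that the next character is not a comma
[mat, idx] = regexp(poestr, '\w+(?!,)', 'match', 'start');
% mat is {'While', 'I', 'nodde', 'nearly', 'nappin', 'suddenly', ...
%         'there', 'came', 'a', 'tappin'}
% idx is [1 7 9 17 24 33 42 48 53 55]
```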
Lookbehind -- (?<=expr2)expr1
Use (?<=p2)p1 to find all words that follow a comma and zero or more spaces:
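A minimal sketch of this lookbehind on the same poestr string, assuming MATLAB's support for a variable-length test pattern such as ',\s*'. It uses \w+ as the match pattern so that no zero-length matches are involved; the variable names are illustrative:

```matlab
% Sketch: lookbehind (?<=p2)p1 with p2 = ',\s*' and p1 = '\w+'.
% Variable names are illustrative only.
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];

% Words that immediately follow a comma and zero or more spaces
[mat, idx] = regexp(poestr, '(?<=,\s*)\w+', 'match', 'start');
% mat is {'nearly', 'suddenly'}
% idx is [17 33]
```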
Negative Lookbehind -- (?<!expr2)expr1
Use (?<!p2)p1 to find all words that do not follow a comma and zero or more spaces:
    [mat, idx] = regexp(poestr, '(?<!,\s*\w*)\w*', 'match', 'start')
    mat =
        'While'    'I'    'nodded'    'napping'    'there'    'came'    'a'    'tapping'
    idx =
         1     7     9    24    42    48    53    55
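As with negative lookahead, the \w* inside the test pattern matters. In this sketch (illustrative variable names, same MATLAB support for variable-length lookbehind assumed), the shorter test '(?<!,\s*)' only rejects a match that starts right after the comma and spaces, so the parser moves one letter into the word and returns the partial words 'early' and 'uddenly':

```matlab
% Sketch: a negative lookbehind test that is too short.
% Variable names are illustrative only.
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];

% Only the first letter after ", " is blocked; the rest of the word still matches
[mat, idx] = regexp(poestr, '(?<!,\s*)\w+', 'match', 'start');
% mat is {'While', 'I', 'nodded', 'early', 'napping', 'uddenly', ...
%         'there', 'came', 'a', 'tapping'}
% idx is [1 7 9 18 24 34 42 48 53 55]
```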
Using Lookaround as a Logical Operator
You can use lookaround operators to perform a logical AND, as shown in this example. The expression used here finds all words that contain a pair of identical letters in the range a through m. The lookahead '(?=[a-m])' tests that the next character is in the range a through m, and '(.)\1' uses a token to test for two identical characters:
    [mat, idx] = regexp(poestr, '\<\w*(?=[a-m])(.)\1\w*\>', ...
        'match', 'start')
    mat =
        'nodded'    'suddenly'
    idx =
         9    33
Note that when you use a lookahead operator to perform an AND, you need to place the match pattern p1 after the test pattern p2:

    (?=p2)p1
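Following that rule, several lookahead tests can be stacked in front of one match pattern, and all of them must succeed at the same position. A sketch that finds the words of poestr containing both an 'a' and a 'p' (the chosen letters and the variable names are illustrative):

```matlab
% Sketch: logical AND of two lookahead tests placed before the match pattern.
% Variable names are illustrative only.
poestr = ['While I nodded, nearly napping, ' ...
          'suddenly there came a tapping,'];

% At the start of a word (\<), require an 'a' and a 'p' somewhere in the word,
% then match the whole word with \w+
[mat, idx] = regexp(poestr, '\<(?=\w*a)(?=\w*p)\w+', 'match', 'start');
% mat is {'napping', 'tapping'}
% idx is [24 55]
```

Because \w* cannot cross spaces or commas, each test stays inside the current word.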
© 1994-2005 The MathWorks, Inc.