
Lookaround Operators

Lookaround operators have two components: a match pattern and a test pattern. If you call the match pattern p1 and the test pattern p2, then the simplest form of lookaround operator, a lookahead, looks like this:

p1(?=p2)

The match pattern p1 is just like any other element in an expression. For example, it can be '\<[A-Za-z]+\>' to make regexp find any word.

The test pattern p2 places a condition on this match. There can be a match for p1 only if there is also a match for p2, and the p2 match must immediately precede (for lookbehind operators) or follow (for lookahead operators) the match for p1.

In the following expression, the match pattern is '\<[A-Za-z]+\>' and the test pattern is '\S'. The entire expression can be read as "Find those words that are immediately followed by a non-white-space character":

'\<[A-Za-z]+\>(?=\S)'

When used on the following string, this lookahead expression matches the letters of the words Raven and Nevermore:
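A minimal sketch of this lookahead in use. The sample string is an assumption, chosen so that Raven and Nevermore are the only words directly followed by a non-white-space character; the variable names are illustrative.

```matlab
% Assumed sample string: Raven is followed by a comma and
% Nevermore by a double quote; every other word is followed by a space.
str = 'Quoth the Raven, "Nevermore"';

% Match pattern: \<[A-Za-z]+\>    Test pattern: \S
words = regexp(str, '\<[A-Za-z]+\>(?=\S)', 'match');
% words is {'Raven', 'Nevermore'}
```

The punctuation that satisfies the test pattern is not part of either match.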

One important characteristic of lookaround operators is how they affect the parsing of the input string. The parser can be said to "consume" pieces of the string as it looks for matching phrases. With lookaround operators, only the match pattern p1 affects the current parsing location. Finding a match for the test pattern p2 does not move the parser location.
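The difference between consuming a character and only testing for it can be sketched as follows. The string and variable names are assumptions made for illustration.

```matlab
% Assumed sample string for illustration.
str = 'Raven, Nevermore';

% Without lookaround, the comma is part of the match (it is consumed).
withComma = regexp(str, '[A-Za-z]+,', 'match');       % {'Raven,'}

% With a lookahead, the comma is only tested; the match ends at the 'n'.
[startIdx, endIdx] = regexp(str, '[A-Za-z]+(?=,)');   % startIdx = 1, endIdx = 5

% Because the parser location did not move past the comma, a later
% element of the same expression can still match it.
reused = regexp(str, '[A-Za-z]+(?=,),', 'match');     % {'Raven,'}
```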

This table shows the four lookaround expressions: lookahead, negative lookahead, lookbehind, and negative lookbehind.

Operator            Usage
expr1(?=expr2)      Match expression expr1 if followed by expression expr2.
expr1(?!expr2)      Match expression expr1 if not followed by expression expr2.
(?<=expr1)expr2     Match expression expr2 if preceded by expression expr1.
(?<!expr1)expr2     Match expression expr2 if not preceded by expression expr1.

Lookahead -- expr1(?=expr2)

Use p1(?=p2) to find all words of this string that precede a comma:
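A minimal sketch, assuming a sample string in which some words are directly followed by a comma. The string and variable names are illustrative.

```matlab
% Assumed sample string for illustration.
str = 'While I nodded, nearly napping, suddenly there came a tapping,';

% p1 = \w+ (a run of word characters), p2 = , (a comma)
beforeComma = regexp(str, '\w+(?=,)', 'match');
% beforeComma is {'nodded', 'napping', 'tapping'}
```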

Negative Lookahead -- expr1(?!expr2)

Use p1(?!p2) to find all words that do not precede a comma:
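A minimal sketch, assuming the sample string below. The match pattern is anchored with \< and \> because an unanchored '\w+(?!,)' can backtrack and return part of a word: for nodded it would return nodde, since the final d is not a comma.

```matlab
% Assumed sample string for illustration.
str = 'While I nodded, nearly napping, suddenly there came a tapping,';

% p1 = \<\w+\> (a whole word), p2 = , (a comma)
notBeforeComma = regexp(str, '\<\w+\>(?!,)', 'match');
% notBeforeComma is {'While', 'I', 'nearly', 'suddenly', 'there', 'came', 'a'}
```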

Lookbehind -- (?<=expr1)expr2

Use (?<=p1)p2 to find all words that follow a comma and zero or more spaces:
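A minimal sketch, assuming the sample string below. Here p1 is ',\s*' (a comma and zero or more white-space characters) and p2 is '\w+'; both the string and the variable names are illustrative.

```matlab
% Assumed sample string for illustration.
str = 'While I nodded, nearly napping, suddenly there came a tapping,';

% The lookbehind tests for the comma and spaces without consuming them.
afterComma = regexp(str, '(?<=,\s*)\w+', 'match');
% afterComma is {'nearly', 'suddenly'}
```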

Negative Lookbehind -- (?<!expr1)expr2

Use (?<!p1)p2 to find all words that do not follow a comma and zero or more spaces:
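A minimal sketch, assuming the sample string below. The \< anchor is an addition made here so that a match can start only at the beginning of a word; without it, the expression could match the tail of a word such as nearly, because the position after its first letter is not preceded by a comma and spaces.

```matlab
% Assumed sample string for illustration.
str = 'While I nodded, nearly napping, suddenly there came a tapping,';

% p1 = ,\s* (a comma and zero or more spaces), p2 = \<\w+ (a word)
notAfterComma = regexp(str, '(?<!,\s*)\<\w+', 'match');
% notAfterComma is {'While', 'I', 'nodded', 'napping', 'there', 'came', 'a', 'tapping'}
```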

Using Lookaround as a Logical Operator

You can use lookaround operators to perform a logical AND, as shown in this example. The expression used here finds all words that contain a sequence of two letters under the condition that the two letters are identical and are in the range a through m. (The expression '(?=[a-m])' is a lookahead test for the range a through m, and the expression '(.)\1' tests for identical characters using a token):
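One way to write such an expression is sketched below. The sample string, the surrounding '\<\w*' and '\w*\>' (which extend the match to the whole word), and the variable names are assumptions made for illustration.

```matlab
% Assumed sample string for illustration.
str = 'I need coffee, butter, and eggs from the little cellar';

% (?=[a-m])  test: the next character is in the range a through m
% (.)\1      match: any character followed by an identical character
doubled = regexp(str, '\<\w*(?=[a-m])(.)\1\w*\>', 'match');
% doubled is {'need', 'coffee', 'eggs', 'cellar'}
```

The words butter and little are not returned: each contains a doubled letter (tt), but t lies outside the range a through m, so only one of the two conditions holds.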

Note that when using a lookahead operator to perform an AND, you need to place the match expression expr1 after the test expression expr2, so that both are applied starting at the same position in the string:

(?=expr2)expr1
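The effect of the ordering can be sketched with a short, assumed sample string. With the test placed first, both conditions apply to the same characters; with the test placed last, it applies to whatever follows the match instead.

```matlab
% Assumed sample string for illustration.
str = 'tool';

% Test first: the doubled letter itself must be in a through m.
% The letter o is outside that range, so nothing matches.
andMatch = regexp(str, '(?=[a-m])(.)\1', 'match');      % empty cell array

% Test last: the character after the doubled letter is tested.
% The l that follows oo is in a through m, so oo matches.
followMatch = regexp(str, '(.)\1(?=[a-m])', 'match');   % {'oo'}
```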



© 1994-2005 The MathWorks, Inc.