Parentheses used in a regular expression not only group elements of that expression together, but also designate any matches found for that group as tokens. You can use tokens to match other parts of the same string. One advantage of using tokens is that they remember what they matched, so you can recall and reuse matched text in the process of searching or replacing.
Introduction to Using Tokens
You can turn any pattern being matched into a token by enclosing the pattern in parentheses within the expression. For example, to create a token for a dollar amount, you could use '(\$\d+)'. Each token in the expression is assigned a number from 1 to 255, going from left to right. To refer to a token later in the expression, use a backslash followed by the token number. For example, to reference the token generated by the third set of parentheses in the expression, use \3.
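As a minimal sketch of token numbering, the snippet below captures a dollar amount and then uses \3 to refer back to the third token. The input strings and variable names are made up for illustration, and the 'tokens', 'match', and 'once' options assume a MATLAB release that supports them.

```matlab
% Capture a dollar amount as token 1 (hypothetical input string).
amount = regexp('Total due: $25', '(\$\d+)', 'tokens', 'once');
% With 'once', amount is a cell array holding the token text: {'$25'}

% Three tokens, numbered left to right; \3 must repeat the third one.
expr = '(\w+)-(\w+)-(\w+)-\3';
hit  = regexp('ab-cd-ef-ef', expr, 'match', 'once');   % 'ab-cd-ef-ef'
miss = regexp('ab-cd-ef-gh', expr, 'match', 'once');   % empty: 'gh' is not 'ef'
```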
As a simple example, if you wanted to search for identical sequential letters in a string, you could capture the first letter as a token and then search for a matching character immediately afterwards. In the expression shown below, the (\S) phrase creates a token whenever regexp matches any non-white-space character in the string. The second part of the expression, '\1', looks for a second instance of the same character immediately following the first:
The tokens returned in cell array
Starting and ending indices for each token in the input string
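The repeated-letter search described above can be sketched as follows. The sample string is an assumption chosen for illustration (it is not the text used in the original example), and the 'match' and 'tokenExtents' options assume a release of regexp that accepts them.

```matlab
str = 'hello bookkeeper';          % hypothetical sample text

% Default outputs: start and end index of each doubled letter.
[startIdx, endIdx] = regexp(str, '(\S)\1');
% startIdx is [3 8 10 12], endIdx is [4 9 11 13]

% Matched text plus the start/end indices of the token inside each match.
[mat, ext] = regexp(str, '(\S)\1', 'match', 'tokenExtents');
% mat is {'ll', 'oo', 'kk', 'ee'}
% ext{1} is [3 3]: the token covers only the first 'l'
```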
Using the tokens Parameter
You can have regexp or regexpi return the actual text of the tokens rather than token indices by specifying the optional 'tokens' parameter in the command. The following example is the same as the one above, except that it returns the text of the tokens found by the pattern:
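A minimal sketch of returning token text, assuming the option is spelled 'tokens' as in current MATLAB documentation. The sample string is hypothetical.

```matlab
str = 'hello bookkeeper';          % hypothetical sample text

% One cell per match; each cell is itself a cell array with one
% entry per token in the expression.
tok = regexp(str, '(\S)\1', 'tokens');
% tok{1}{1} is 'l'

% Flatten to a plain cell array of strings (one token per match here).
letters = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
% letters is {'l', 'o', 'k', 'e'}
```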
Operators Used with Tokens
Here are the operators you can use with tokens in MATLAB.
(expr)||Capture in a token all characters matched by the expression within the parentheses.
$N||Insert the match for the Nth token in the replacement string (used with regexprep).
(?<name>expr)||Capture in a token all characters matched by the expression within the parentheses. Assign a name to the token.
\k<name>||Match the token referred to by name.
Using Tokens -- Example 1
Here is an example of how tokens are assigned values. Suppose that you are going to search the following text:
You choose to search the above text with the following search pattern:
This pattern has three parenthetical expressions that generate tokens. When you finally perform the search, the following tokens are generated for each match.
Only the highest-level parentheses are used. For example, if the search pattern and(y|rew) finds the text andrew, token 1 is assigned the value rew. However, if the search pattern (and(y|rew)) is used, token 1 is assigned the value andrew.
Using Tokens -- Example 2
This example uses \N to capture pairs of matching HTML tags (e.g., <a> and </a>) and the text between them. The expression used for this example is '<(\w+).*?>.*?</\1>'.
The first part of the expression, '<(\w+)', matches an opening bracket (<) followed by one or more alphabetic, numeric, or underscore characters. The enclosing parentheses capture as a token the characters following the opening bracket.
The second part of the expression, '.*?>.*?', matches the remainder of this HTML tag (characters up to the >), and any characters that may precede the next opening bracket.
The last part, '</\1>', matches all characters in the ending HTML tag. This tag is composed of the sequence </tag>, where tag is whatever characters were captured as a token.
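The tag-matching idea can be sketched as below. The expression is assembled from the three parts described above, and the HTML fragment is a made-up input for illustration.

```matlab
html = '<b>bold</b> and <i>italic</i>';   % hypothetical input
expr = '<(\w+).*?>.*?</\1>';

% Outputs are returned in the order the options are listed.
[tok, mat] = regexp(html, expr, 'tokens', 'match');
% mat is {'<b>bold</b>', '<i>italic</i>'}
% tok{1}{1} is 'b', tok{2}{1} is 'i'

% A closing tag that does not repeat the captured name is not matched.
bad = regexp('<b>bold</i>', expr, 'match');   % empty cell array
```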
Using Tokens in a Replacement String
When using tokens in a replacement string, reference them using $1, $2, etc. instead of \1, \2, etc. This example captures two tokens and reverses their order. The first, $1, is 'Norma Jean' and the second, $2, is 'Baker'. Note that regexprep returns the modified string, not a vector of starting indices, by default:
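A minimal sketch of the token swap follows. The search pattern and the comma in the replacement string are assumptions chosen so that the two tokens are 'Norma Jean' and 'Baker'; the original command is not reproduced here.

```matlab
% Token 1: two words separated by white space. Token 2: the last word.
newStr = regexprep('Norma Jean Baker', '(\w+\s\w+)\s(\w+)', '$2, $1');
% newStr is 'Baker, Norma Jean'
```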
Named Capture -- (?<name>expr)
If you use a lot of tokens in your expressions, it may be helpful to assign them names rather than having to keep track of which token number is assigned to which token. Use the operator (?<name>expr) to assign name to the token matching expression expr.
When referencing a named token within the expression, use the syntax \k<name> instead of the numeric \1, \2, etc.
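Named capture might be used as in the sketch below. The date string, the token names (year, month, day, ch), and the use of the 'names' option are illustrative assumptions.

```matlab
% Name each token instead of counting parentheses (hypothetical input).
d = regexp('2005-10-03', '(?<year>\d+)-(?<month>\d+)-(?<day>\d+)', 'names');
% d.year is '2005', d.month is '10', d.day is '03'

% Refer back to a named token inside the expression with \k<name>.
dbl = regexp('hello', '(?<ch>\S)\k<ch>', 'match', 'once');
% dbl is 'll'
```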
Conditional Expressions -- (?(token)expr1|expr2)
With conditional regular expressions, you can select which pattern to match, depending on whether a token elsewhere in the string is found. The expression appears as (?(token)expr1|expr2).
This expression can be translated as an if-then-else statement: if token is found, then match expr1; otherwise, match expr2.
The next example uses the conditional expression expr to match the string regardless of the gender used. The expression creates a token if Mr is followed by the letter s. It later matches either her or his, depending on whether this token was found. The phrase (?(1)her|his) means that if token 1 is found, then match her, else match his.
In the second part of the example, the token s is found and MATLAB matches the word her.
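The conditional match described above can be sketched as follows. The exact expression and the sample sentences are assumptions, not the original example; in particular, the token is written as (s)? so that it exists only when the letter s is actually present.

```matlab
% Token 1 exists only when 'Mr' is followed by 's' (assumed expression).
expr = 'Mr(s)?\..*?(?(1)her|his) son';

% First part: no token, so the expression must match 'his'.
m1 = regexp('Mr. Clark went to see his son', expr, 'match', 'once');

% Second part: token 's' is found, so the expression must match 'her'.
m2 = regexp('Mrs. Clark went to see her son', expr, 'match', 'once');

% Mismatched pronoun: no token, but 'his' is absent, so nothing matches.
m3 = regexp('Mr. Clark went to see her son', expr, 'match', 'once');
```

Here m1 and m2 each contain the full input sentence, while m3 is empty.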
© 1994-2005 The MathWorks, Inc.