You can use quantifiers to specify how many instances of an element are to be matched. The first six rows of this table show the basic quantifiers. When used alone, they match as much of the string as possible. Thus these are sometimes called greedy quantifiers.
When one of these quantifiers is followed by a plus sign (e.g., '\w*+'), it is known as a possessive quantifier. Possessive quantifiers also match as much of the string as possible, but they do not back up and rescan any portion of the string if the rest of the expression then fails to match.
When you follow a quantifier with a question mark (e.g., '\w*?'), it is known as a lazy quantifier. Lazy quantifiers match as little of the string as possible.
See the examples for each quantifier and quantifier type following the table.
Zero or One -- expr?
Use ? to make the HTML <code> and </code> tags optional in the string. The first string, hstr1, contains one occurrence of each tag. Since the expression uses ()? around the tags, one occurrence is a match:
hstr1 = '<td><a name="18854"></a><code>%%</code><br></td>';
expr = '</a>(<code>)?..(</code>)?<br>';
regexp(hstr1, expr, 'match')
ans =
    '</a><code>%%</code><br>'
The second string, hstr2, does not contain the code tags at all. Just the same, the expression matches because ()? allows for zero occurrences of the tags:
hstr2 = '<td><a name="18854"></a>%%<br></td>';
expr = '</a>(<code>)?..(</code>)?<br>';
regexp(hstr2, expr, 'match')
ans =
    '</a>%%<br>'
Zero or More -- expr*
Use * to match strings having any number of line breaks, including no line breaks at all.
hstr1 = '<p>This string has <br><br>line breaks</p>';
expr = '<p>.*(<br>)*.*</p>';
regexp(hstr1, expr, 'match')
ans =
    '<p>This string has <br><br>line breaks</p>'
hstr2 = '<p>This string has no line breaks</p>';
regexp(hstr2, expr, 'match')
ans =
    '<p>This string has no line breaks</p>'
One or More -- expr+
Use + to verify that the HTML image source is not empty. This looks for one or more word characters in the gif filename. The period before gif is escaped so that it matches only a literal period:
hstr = '<a href="s12.html"><img src="b_prev.gif" border=0>';
expr = '<img src="\w+\.gif';
regexp(hstr, expr, 'match')
ans =
    '<img src="b_prev.gif'
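To see the "not empty" check fail, here is a minimal sketch that assumes a hypothetical string, hempty, whose image source has no characters before the extension. Because \w+ requires at least one word character, no match is expected:

```matlab
% Pattern from the example above, with the period escaped
expr = '<img src="\w+\.gif';

% hempty is a hypothetical string, not part of the original example:
% the file name before .gif is empty
hempty = '<img src=".gif" border=0>';

% \w+ needs at least one word character after the opening quote,
% so regexp should find nothing to return
matches = regexp(hempty, expr, 'match');
isempty(matches)
```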
Exact, Minimum, and Maximum Quantities -- {min,max}
Use {m}, {m,}, and {m,n} to verify the href syntax used in HTML. This statement requires the href to have at least one word character (a letter, digit, or underscore), followed by exactly one occurrence of .html, optionally followed by # and five to eight digits:
hstr = '<a name="18749"></a><a href="s13.html#18760">';
expr = '<a href="\w{1,}(\.html){1}(\#\d{5,8}){0,1}"';
regexp(hstr, expr, 'match')
ans =
    '<a href="s13.html#18760"'
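As a counter-example, here is a minimal sketch that assumes a hypothetical string, hshort, whose anchor has only three digits. The optional group cannot match fewer than five digits, and the closing quote cannot match the #, so no match is expected:

```matlab
% Same expression as in the example above
expr = '<a href="\w{1,}(\.html){1}(\#\d{5,8}){0,1}"';

% hshort is a hypothetical string, not part of the original example:
% the anchor after # has three digits instead of five to eight
hshort = '<a href="s13.html#123">';

% (\#\d{5,8}){0,1} matches zero times here, and the closing quote
% in the expression then fails against the # character
matches = regexp(hshort, expr, 'match');
isempty(matches)
```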
Greedy Quantifiers -- expr*
Use * to match as many characters as possible between any < and > signs in the string. Because of the .* in the expression, regexp reads all characters in the string up to the end. Finding no closing > at the end, regexp then backs up to the </a> and ends the phrase there:
hstr = '<tr valign=top><td><a name="19184"></a>xyz';
regexp(hstr, '<.*>', 'match')
ans =
    '<tr valign=top><td><a name="19184"></a>'
Possessive Quantifiers -- expr*+
Except for the possessive *+ quantifier, this expression is the same as that used in the last example. Unlike greedy quantifiers, possessive quantifiers do not reevaluate parts of the string that have already been evaluated. This command scans the entire string because of the .*+ quantifier, but then cannot back up to locate the </a> sequence that would satisfy the expression. As a result, no match is found and regexp returns an empty cell array:
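A minimal sketch of that command, assuming the same hstr as in the greedy example with only the quantifier changed to .*+:

```matlab
% Same string as in the greedy example (assumed)
hstr = '<tr valign=top><td><a name="19184"></a>xyz';

% Possessive .*+ consumes the rest of the string and never gives
% characters back, so the closing > in the pattern cannot match
matches = regexp(hstr, '<.*+>', 'match');

% The text above says the result is an empty cell array
isempty(matches)
```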
Lazy Quantifiers -- expr*?
This example shows the difference between lazy and greedy quantifiers. The first expression uses lazy .*? to match the minimum number of characters between <tr, <td, or </td tags:
hstr = '<tr valign=top><td><a name="19184"></a><br></td>';
regexp(hstr, '</?t.*?>', 'match')
ans =
    '<tr valign=top>'    '<td>'    '</td>'
The second expression uses greedy .* to match all characters from the opening <tr to the ending </td:
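A minimal sketch of the greedy form, assuming the same hstr as in the lazy example and the pattern '</?t.*>' (the lazy pattern with the ? after .* removed):

```matlab
% Same string as in the lazy example (assumed)
hstr = '<tr valign=top><td><a name="19184"></a><br></td>';

% Greedy .* runs to the end of the string, then backs up only as far
% as the last > character, which here is the final character
matches = regexp(hstr, '</?t.*>', 'match');

% Under these assumptions there is a single match spanning the
% whole string, from the opening <tr to the closing </td>
isequal(matches, {hstr})
```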
© 1994-2005 The MathWorks, Inc.