Quantifiers :: Basic Program Components (Programming)

Quantifiers

You can use quantifiers to specify how many instances of an element are to be matched. The first six rows of this table show the basic quantifiers. When used alone, they match as much of the string as possible. Thus these are sometimes called greedy quantifiers.

When one of these quantifiers is followed by a plus sign (e.g., '\w*+'), it is known as a possessive quantifier. Possessive quantifiers also match as much of the string as possible, but they do not backtrack: once they have consumed part of the string, they do not give any of it back, even if the rest of the expression then fails to match.

When you follow a quantifier with a question mark (e.g., '\w*?'), it is known as a lazy quantifier. Lazy quantifiers match as little of the string as possible.

See the examples for each quantifier and quantifier type following the table.

Operator	Usage
`expr?`	Match the preceding element 0 times or 1 time. Equivalent to `{0,1}`.
`expr*`	Match the preceding element 0 or more times. Equivalent to `{0,}`.
`expr+`	Match the preceding element 1 or more times. Equivalent to `{1,}`.
`expr{n}`	Match the preceding element exactly `n` times. Equivalent to `{n,n}`.
`expr{n,}`	Match the preceding element at least `n` times.
`expr{n,m}`	Match the preceding element at least `n` times but no more than `m` times.
`qu_expr?`	Match the quantified expression according to the guidelines stated above for lazy quantifiers, where `qu_expr` represents any one of the expressions shown in the top six rows of this table.
`qu_expr+`	Match the quantified expression according to the guidelines stated above for possessive quantifiers, where `qu_expr` represents any one of the expressions shown in the top six rows of this table.
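
The equivalences listed in the table can be checked directly. The following is a minimal sketch; the sample string and all variable names are assumptions chosen for illustration and do not come from the original examples.

```matlab
% Illustrative sample text (an assumption, not from the original page).
str = 'a ab abb abbb';

% Zero or one 'b': shorthand and brace form.
short1 = regexp(str, 'ab?', 'match');
brace1 = regexp(str, 'ab{0,1}', 'match');

% Zero or more 'b'.
short2 = regexp(str, 'ab*', 'match');
brace2 = regexp(str, 'ab{0,}', 'match');

% One or more 'b'.
short3 = regexp(str, 'ab+', 'match');
brace3 = regexp(str, 'ab{1,}', 'match');

% True when every shorthand form returns the same matches as its brace form.
allSame = isequal(short1, brace1) && ...
          isequal(short2, brace2) && ...
          isequal(short3, brace3);
```

Here allSame should be true, since each pair of expressions describes the same repetition counts.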

Zero or One -- expr?

Use ? to make the HTML <code> and </code> tags optional in the string. The first string, hstr1, contains one occurrence of each tag. Since the expression uses ()? around the tags, one occurrence is a match:

hstr1 = '<td><a name="18854"></a><code>%%</code><br></td>';
expr = '</a>(<code>)?..(</code>)?<br>';

regexp(hstr1, expr, 'match')
ans =
    '</a><code>%%</code><br>'

The second string, hstr2, does not contain the code tags at all. Just the same, the expression matches because ()? allows for zero occurrences of the tags:

hstr2 = '<td><a name="18854"></a>%%<br></td>';
expr = '</a>(<code>)?..(</code>)?<br>';

regexp(hstr2, expr, 'match')
ans =
    '</a>%%<br>'

Zero or More -- expr*

Use * to match strings having any number of line breaks, including no line breaks at all.

hstr1 = '<p>This string has <br><br>line breaks</p>';
expr = '<p>.*(<br>)*.*</p>';

regexp(hstr1, expr, 'match')
ans =
    '<p>This string has <br><br>line breaks</p>'

hstr2 = '<p>This string has no line breaks</p>';
regexp(hstr2, expr, 'match')
ans =
    '<p>This string has no line breaks</p>'
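
In the expression above, the surrounding .* can absorb the <br> tags on its own, so the (<br>)* group may end up matching zero times. A stricter variant is sketched below, assuming the paragraph text itself contains no < character; the negated character class [^<] and the name strictExpr are additions for illustration only.

```matlab
hstr1 = '<p>This string has <br><br>line breaks</p>';
hstr2 = '<p>This string has no line breaks</p>';

% [^<]* cannot run past a '<', so any <br> tags must be consumed
% by the (<br>)* group, zero or more times.
strictExpr = '<p>[^<]*(<br>)*[^<]*</p>';

m1 = regexp(hstr1, strictExpr, 'match', 'once');
m2 = regexp(hstr2, strictExpr, 'match', 'once');

% Both strings should be matched in full.
bothMatch = strcmp(m1, hstr1) && strcmp(m2, hstr2);
```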

One or More -- expr+

Use + to verify that the HTML image source is not empty. This looks for one or more word characters in the GIF file name. The period is escaped (\.) so that it matches a literal period rather than any character:

hstr = '<a href="s12.html"><img src="b_prev.gif" border=0>';
expr = '<img src="\w+\.gif';

regexp(hstr, expr, 'match')
ans =
    '<img src="b_prev.gif'
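
To see what + rules out, the same kind of check can be run against a tag whose file name is missing. This is a minimal sketch; the second input string and the variable names are hypothetical, and the period is escaped so that it matches a literal period.

```matlab
expr = '<img src="\w+\.gif';

good = '<img src="b_prev.gif" border=0>';
bad  = '<img src=".gif" border=0>';    % hypothetical tag with no file name

% With 'once', regexp returns an empty character vector when nothing matches.
hasName = ~isempty(regexp(good, expr, 'match', 'once'));
noName  = ~isempty(regexp(bad,  expr, 'match', 'once'));
```

hasName should be true and noName false, because \w+ requires at least one word character before .gif.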

Exact, Minimum, and Maximum Quantities -- {min,max}

Use {n}, {n,}, and {n,m} to verify the href syntax used in HTML. This statement requires the href to have at least one word character (\w matches letters, digits, and underscores), followed by exactly one occurrence of .html, optionally followed by # and five to eight digits:

hstr = '<a name="18749"></a><a href="s13.html#18760">';
expr = '<a href="\w{1,}(\.html){1}(\#\d{5,8}){0,1}"';

regexp(hstr, expr, 'match')
ans =
    '<a href="s13.html#18760"'
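
The bounds in {5,8} can be probed with a few variations of the input. The sketch below reuses the expression from this example; the four sample strings and the variable names are assumptions made up for illustration.

```matlab
expr = '<a href="\w{1,}(\.html){1}(\#\d{5,8}){0,1}"';

samples = { ...
    '<a href="s13.html#1876">', ...        % 4 digits: below the minimum
    '<a href="s13.html#18760">', ...       % 5 digits: within range
    '<a href="s13.html">', ...             % no anchor: allowed by {0,1}
    '<a href="s13.html#187601234">'};      % 9 digits: above the maximum

% For a cell array input, regexp returns one result per string.
found   = regexp(samples, expr, 'match', 'once');
matched = ~cellfun(@isempty, found);       % [false true true false]
```

Only the second and third strings should match: the closing double quotation mark in the expression forces the digit count to fall within the stated bounds.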

Greedy Quantifiers -- expr*

Use * to match as many characters as possible between any < and > signs in the string. Because of the .* in the expression, regexp first reads all characters up to the end of the string. Finding no closing > there, regexp then backs up to the > that closes </a> and ends the match there:

hstr = '<tr valign=top><td><a name="19184"></a>xyz';

regexp(hstr, '<.*>', 'match')
ans =
    '<tr valign=top><td><a name="19184"></a>'

Possessive Quantifiers -- expr*+

Except for the possessive *+ quantifier, this expression is the same as that used in the last example. Unlike greedy quantifiers, possessive quantifiers do not reevaluate parts of the string that have already been evaluated. This command scans the entire string because of the .*+ quantifier, but then cannot back up to locate the > of the </a> sequence that would satisfy the expression. As a result, no match is found and regexp returns an empty cell array:

regexp(hstr, '<.*+>', 'match')
ans =
     {}
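
A possessive quantifier can still produce matches when the element it repeats cannot consume the closing delimiter. The following is a minimal sketch using the same string; the negated character class [^>] and the variable name tags are additions for illustration and are not part of the original example.

```matlab
hstr = '<tr valign=top><td><a name="19184"></a>xyz';

% [^>]*+ takes every character up to, but not including, the next '>'.
% It never needs to give characters back, so the trailing '>' can match.
tags = regexp(hstr, '<[^>]*+>', 'match');

nTags = numel(tags);    % one entry per tag found
```

Under these assumptions, tags should hold the four individual tags in the string, since each [^>]*+ stops just before a > without any backtracking.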

Lazy Quantifiers -- expr*?

This example shows the difference between lazy and greedy quantifiers. The first expression uses lazy .*? to match the minimum number of characters between <tr, <td, or </td tags:

hstr = '<tr valign=top><td><a name="19184"></a><br></td>';

regexp(hstr, '</?t.*?>', 'match')
ans =
    '<tr valign=top>'    '<td>'    '</td>'

The second expression uses greedy .* to match all characters from the opening <tr through the final > of the closing </td> tag:

regexp(hstr, '</?t.*>', 'match')
ans =
    '<tr valign=top><td><a name="19184"></a><br></td>'
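
For this particular string, a greedy quantifier on a negated character class gives the same result as the lazy expression. This is a hedged sketch rather than a general rule; the [^>] class and the variable names are assumptions introduced for the comparison.

```matlab
hstr = '<tr valign=top><td><a name="19184"></a><br></td>';

% Lazy: .*? stops at the first '>' that lets the match succeed.
lazyTags  = regexp(hstr, '</?t.*?>', 'match');

% Greedy on a restricted class: [^>]* cannot run past a '>'.
classTags = regexp(hstr, '</?t[^>]*>', 'match');

sameResult = isequal(lazyTags, classTags);
```

sameResult should be true here: both expressions return the <tr valign=top>, <td>, and </td> tags.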
