
Handling Multiple Strings

You can use any of the MATLAB regular expression functions with cell arrays of strings as well as with single strings. Any or all of the input parameters (the string, the expression, or the replacement string) can be a cell array of strings. The regexp function requires that the string and expression arrays have the same number of elements if both contain more than one element. The regexprep function requires that the expression and replacement arrays have the same number of elements if the replacement array contains more than one element. (The cell arrays do not have to have the same shape.)
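The following minimal sketch shows these input combinations with regexp. The inputs (words and the expressions) are hypothetical and chosen only so that the results are easy to check by hand.

```matlab
% Hypothetical 1-by-3 cell array of strings.
words = {'cat', 'dog', 'bird'};

% Cell array of strings with a single expression:
% the expression is applied to every string.
starts = regexp(words, 'd');             % first cell empty, then 1 and 4

% Single string with a cell array of expressions:
% every expression is tried against the one string.
hits = regexp('abcabc', {'a', 'c'});     % {[1 4], [3 6]}

% Both inputs are cell arrays with the same number of elements:
% string k is searched with expression k.
paired = regexp(words, {'a', 'o', 'r'}); % {2, 2, 3}
```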

Whenever the first input parameter to a regular expression function is a cell array, all output values are cell arrays of the same size.
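As a small illustration, assuming a hypothetical 2-by-2 cell array c, every output requested from regexp comes back as a 2-by-2 cell array, one cell per input string:

```matlab
% Hypothetical 2-by-2 cell array of strings.
c = {'alpha', 'beta'; 'gamma', 'delta'};

% Ask for two outputs: match start indices and the matched text.
[s, m] = regexp(c, 'a', 'start', 'match');

disp(size(s))   % 2 2
disp(size(m))   % 2 2
% s{1,1} = [1 5] (both a's in 'alpha'); m{2,1} = {'a', 'a'} (from 'gamma')
```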

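This section covers finding a single pattern in multiple strings, finding multiple patterns in multiple strings, and replacing multiple strings.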

Finding a Single Pattern in Multiple Strings

The example shown here uses the regexp function on a cell array of strings, cstr. It searches each string of the cell array for consecutive matching letters (e.g., 'oo'). The function returns a cell array of the same size as the input array. Each cell of the returned array contains the starting indices of the matches found in the corresponding string of the input cell array.

Here is the input cell array:

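Find consecutive matching letters by capturing a single character as a token with (.) and then matching that same character again with the token reference \1. Because . matches any character, doubled spaces or digits also match, not only letters: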
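A minimal sketch of this search, using a hypothetical 4-by-1 cell array in place of the example's input:

```matlab
% Hypothetical 4-by-1 input.
cstr = {'bookkeeper'; 'mississippi'; 'abc'; 'aardvark'};

% (.) captures any single character as token 1; \1 then requires
% that same character to appear again immediately.
idx = regexp(cstr, '(.)\1');

% idx is 4-by-1; each cell holds the start index of every match:
% idx{1} = [2 4 6]   ('oo', 'kk', 'ee')
% idx{2} = [3 6 9]   ('ss', 'ss', 'pp')
% idx{3} is empty
% idx{4} = 1         ('aa')
```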

To return substrings instead of indices, use the 'match' parameter:
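With the same hypothetical input, the 'match' option returns, for each string, a cell array of the matched substrings:

```matlab
% Hypothetical 4-by-1 input.
cstr = {'bookkeeper'; 'mississippi'; 'abc'; 'aardvark'};

% With 'match', each cell holds a cell array of the matched text.
pairs = regexp(cstr, '(.)\1', 'match');

% pairs{1} = {'oo', 'kk', 'ee'}
% pairs{2} = {'ss', 'ss', 'pp'}
% pairs{3} is empty
% pairs{4} = {'aa'}
```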

Finding Multiple Patterns in Multiple Strings

This example uses cell arrays of strings for both the input string and the expression. The two cell arrays have different shapes: cstr is 4-by-1 while expr is 1-by-4. The command is valid as long as both have the same number of cells.

Find uppercase or lowercase 'i' followed by a whitespace character in cstr{1}, the sequence 'hou' in cstr{2}, two consecutive matching letters in cstr{3}, and words beginning with 'w' followed by a vowel in cstr{4}.
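Here is a sketch of the same four searches on hypothetical strings. For simplicity both arrays are 4-by-1; string k is searched with expression k. The word-start anchor \< is used for the last pattern so that the 'wa' inside 'swam' is not matched.

```matlab
% Hypothetical strings, one per pattern (4-by-1).
cstr = {'Hi I am'; 'house and shoulder'; 'coffee'; 'we swam with water'};

% Patterns, in order: i or I followed by whitespace; the sequence 'hou';
% two consecutive matching characters; a word-initial w followed by a vowel.
expr = {'[Ii]\s'; 'hou'; '(.)\1'; '\<w[aeiou]'};

% idx has the shape of cstr.
idx = regexp(cstr, expr);

% idx{1} = [2 4]      ('i ' in 'Hi', 'I ' before 'am')
% idx{2} = [1 12]     ('house', 'shoulder')
% idx{3} = [3 5]      ('ff', 'ee')
% idx{4} = [1 9 14]   ('we', 'with', 'water'; 'swam' is skipped)
```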

Note that the returned cell array has the dimensions of the input string, cstr. The dimensions of the return value are always derived from the input string, whenever the input string is a cell array. If the input string is not a cell array, then it is the dimensions of the expression that determine the shape of the return array.
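To illustrate the second case, this sketch searches a single character string with a hypothetical 2-by-2 cell array of expressions; the result takes its shape from the expression array:

```matlab
% A single (non-cell) string and a hypothetical 2-by-2 cell array of expressions.
expr = {'a', 'b'; 'c', 'z'};
idx = regexp('abcabc', expr);

disp(size(idx))   % 2 2, taken from expr
% idx{1,1} = [1 4], idx{1,2} = [2 5], idx{2,1} = [3 6], idx{2,2} is empty
```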

Replacing Multiple Strings

When replacing text in multiple strings with regexprep, use a single replacement string if the expression consists of a single string. This example uses a common replacement value ('--') for all matches found in the cell array cstr. The function returns a cell array of strings having the same dimensions as the input cell array:
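A minimal sketch with a hypothetical 4-by-1 input, replacing every doubled character with '--':

```matlab
% Hypothetical 4-by-1 input.
cstr = {'bookkeeper'; 'mississippi'; 'abc'; 'aardvark'};

% One expression, one replacement, applied to every string.
out = regexprep(cstr, '(.)\1', '--');

% out = {'b------per'; 'mi--i--i--i'; 'abc'; '--rdvark'}
```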

You can use multiple replacement strings if the expression consists of multiple strings. In this example, the input string and replacement string are both 4-by-1 cell arrays, and the expression is a 1-by-4 cell array. As long as the expression and replacement arrays contain the same number of elements, the statement is valid. The dimensions of the return value match the dimensions of the input string:
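A sketch with hypothetical inputs follows; here all three arrays are 4-by-1, and replacement k goes with expression k. Unlike regexp, regexprep does not pair string k with expression k: each expression/replacement pair is applied in turn to every string. The patterns below are chosen so that each one occurs in only one string, which keeps the result easy to follow.

```matlab
% Hypothetical inputs; all three arrays are 4-by-1.
cstr = {'cat'; 'dog'; 'bird'; 'fish'};
expr = {'c'; 'o'; 'r'; 'sh'};   % each pattern occurs in only one string
rep  = {'C'; '0'; 'R'; 'SH'};   % replacement k goes with expression k

out = regexprep(cstr, expr, rep);

% out = {'Cat'; 'd0g'; 'biRd'; 'fiSH'}, with the same 4-by-1 shape as cstr
```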



© 1994-2005 The MathWorks, Inc.