
Handling Multiple Strings

You can use any of the MATLAB regular expression functions with cell arrays of strings as well as with single strings. Any or all of the input parameters (the string, expression, or replacement string) can be a cell array of strings. The `regexp` function requires that the string and expression arrays have the same number of elements if both are vectorized (that is, if both are cell arrays containing more than one element). The `regexprep` function requires that the expression and replacement arrays have the same number of elements if the replacement array is vectorized. The cell arrays do not have to have the same shape.

Whenever the first input parameter to a regular expression function is a cell array, all output values are cell arrays of the same size.
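The element-count rule can be illustrated with a minimal sketch. The variable names and sample strings below (`strs`, `exprs`, `'alpha'`, and so on) are hypothetical and chosen only for illustration; the last call is expected to fail because the two cell arrays contain different numbers of elements, so it is wrapped in `try`/`catch`.

```matlab
% Hypothetical data for illustration only.
strs  = {'alpha'; 'beta'; 'gamma'};   % 3-by-1 cell array of strings
exprs = {'a', 'e', 'm+'};             % 1-by-3: same element count, different shape

% One expression per string. The output is a cell array with the
% size of the first input, here 3-by-1.
starts = regexp(strs, exprs);
disp(size(starts))

% A single expression is applied to every string in the cell array.
hits = regexp(strs, 'a');

% Two expressions against three strings: the element counts differ,
% so the call is expected to raise an error.
try
    bad = regexp(strs, {'a', 'e'});
catch err
    fprintf('regexp rejected the call: %s\n', err.message);
end
```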

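This section covers the following topics: finding a single pattern in multiple strings, finding multiple patterns in multiple strings, and replacing multiple strings.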

Finding a Single Pattern in Multiple Strings

The example shown here uses the `regexp` function on a cell array of strings `cstr`. It searches each string of the cell array for consecutive matching letters (e.g., `'oo'`). The function returns a cell array of the same size as the input array. Each cell of the returned array contains the starting indices of the matches found in the corresponding string of the input cell array.

Here is the input cell array:

```
cstr = {                                  ...
'Whose woods these are I think I know.' ; ...
'His house is in the village though;'   ; ...
'He will not see me stopping here'      ; ...
'To watch his woods fill up with snow.'};
```

Find consecutive matching letters by capturing a letter as a token `(.)` and then repeating that letter as a token reference, `\1`:

```
idx = regexp(cstr, '(.)\1');

whos idx
Name      Size                   Bytes  Class

idx       4x1                      296  cell array

idx{:}
ans =                 % 'Whose woods these are I think I know.'
8                 %         |8

ans =                 % 'His house is in the village though;'
23                 %                        |23

ans =                 % 'He will not see me stopping here'
6    14    23     %       |6      |14      |23

ans =                 % 'To watch his woods fill up with snow.'
15    22           %                |15    |22
```

To return substrings instead of indices, use the `'match'` parameter:

```
mat = regexp(cstr, '(.)\1', 'match');
mat{3}
ans =
'll'    'ee'    'pp'
```
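Because every output is a cell array with one cell per input string, the results can be summarized with `cellfun`. The following is a minimal sketch, not part of the original example: it requests both the start indices and the matched substrings in one call and prints a per-string summary. The variable `counts` and the use of `strjoin` (available in newer MATLAB releases) are assumptions made for illustration.

```matlab
cstr = { ...
    'Whose woods these are I think I know.'; ...
    'His house is in the village though;'; ...
    'He will not see me stopping here'; ...
    'To watch his woods fill up with snow.'};

% Request start indices and matched substrings together; the outputs
% are returned in the order in which the options are listed.
[idx, mat] = regexp(cstr, '(.)\1', 'start', 'match');

% Number of matches found in each string (one count per cell).
counts = cellfun(@numel, idx);

% Print a short summary line for each input string.
for k = 1:numel(cstr)
    fprintf('%d match(es): %s\n', counts(k), strjoin(mat{k}, ', '));
end
```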

Finding Multiple Patterns in Multiple Strings

This example uses a cell array of strings in both the input string and the expression. The two cell arrays are of different shapes: `cstr` is 4-by-1 while `expr` is 1-by-4. The command is valid as long as they both have the same number of cells.

Find uppercase or lowercase `'i'` followed by a white-space character in `cstr{1}`, the sequence `'hou'` in `cstr{2}`, two consecutive matching letters in `cstr{3}`, and words beginning with `'w'` followed by a vowel in `cstr{4}`.

```
expr = {'i\s', 'hou', '(.)\1', '\<w[aeiou]'};
idx = regexpi(cstr, expr);

idx{:}
ans =                 % 'Whose woods these are I think I know.'
23    31           %                        |23     |31

ans =                 % 'His house is in the village though;'
5    30           %      |5                       |30

ans =                 % 'He will not see me stopping here'
6    14    23     %       |6      |14      |23

ans =                 % 'To watch his woods fill up with snow.'
4    14    28     %     |4        |14           |28
```

Note that the returned cell array has the dimensions of the input string, `cstr`. The dimensions of the return value are always derived from the input string, whenever the input string is a cell array. If the input string is not a cell array, then it is the dimensions of the expression that determine the shape of the return array.
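The second case, a single string searched with several expressions, can be sketched as follows. The expressions in `expr` are hypothetical and were chosen only to illustrate the shape rule described above: because the input string is not a cell array, the result is expected to take the 3-by-1 shape of `expr`.

```matlab
str  = 'Whose woods these are I think I know.';   % single string, not a cell array
expr = {'oo'; 'th'; 'kn'};                        % hypothetical 3-by-1 cell array

% Each cell of idx holds the start indices for the matching cell of expr.
idx = regexp(str, expr);

% The shape of the result follows the expression array.
disp(size(idx))
```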

Replacing Multiple Strings

When replacing multiple strings with `regexprep`, use a single replacement string if the expression consists of a single string. This example uses a common replacement value (`'--'`) for all matches found in the multiple string input `cstr`. The function returns a cell array of strings having the same dimensions as the input cell array:

```
s = regexprep(cstr, '(.)\1', '--', 'ignorecase')
s =
'Whose w--ds these are I think I know.'
'His house is in the vi--age though;'
'He wi-- not s-- me sto--ing here'
'To watch his w--ds fi-- up with snow.'
```

You can use multiple replacement strings if the expression consists of multiple strings. In this example, the input string and replacement string are both 4-by-1 cell arrays, and the expression is a 1-by-4 cell array. As long as the expression and replacement arrays contain the same number of elements, the statement is valid. The dimensions of the return value match the dimensions of the input string:

```
expr = {'i\s', 'hou', '(.)\1', '\<w[aeiou]'};
repl = {'-1-'; '-2-'; '-3-'; '-4-'};

s = regexprep(cstr, expr, repl, 'ignorecase')
s =
'Whose w-3-ds these are -1-think -1-know.'
'His -2-se is in the vi-3-age t-2-gh;'
'He -4--3- not s-3- me sto-3-ing here'
'To -4-tch his w-3-ds fi-3- up -4-th snow.'
```
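The output above is consistent with `regexprep` applying the expression and replacement pairs one after another, each pair operating on the result of the previous one. The following sketch assumes that ordering and reproduces it with an explicit loop; the names `s_loop` and `s_vec` are introduced here for illustration only.

```matlab
cstr = { ...
    'Whose woods these are I think I know.'; ...
    'His house is in the village though;'; ...
    'He will not see me stopping here'; ...
    'To watch his woods fill up with snow.'};
expr = {'i\s', 'hou', '(.)\1', '\<w[aeiou]'};
repl = {'-1-'; '-2-'; '-3-'; '-4-'};

% Apply one expression/replacement pair at a time to all strings,
% feeding each result into the next pair.
s_loop = cstr;
for k = 1:numel(expr)
    s_loop = regexprep(s_loop, expr{k}, repl{k}, 'ignorecase');
end

% The single vectorized call, for comparison with the loop.
s_vec = regexprep(cstr, expr, repl, 'ignorecase');
disp(isequal(s_loop, s_vec))
```

Under this assumption, the order of the pairs matters. In the third string, `'ll'` is replaced by `'-3-'` before the fourth expression runs, so `'will'` ends up as `'-4--3-'`.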
