You can use any of the MATLAB regular expression functions with cell arrays of strings as well as with single strings. Any or all of the input parameters (the string, expression, or replacement string) can be a cell array of strings. The regexp function requires that the string and expression arrays have the same number of elements if both are vectorized (i.e., if both have dimensions greater than 1-by-1). The regexprep function requires that the expression and replacement arrays have the same number of elements if the replacement array is vectorized. (The cell arrays do not have to have the same shape.)
Whenever the first input parameter to a regular expression function is a cell array, all output values are cell arrays of the same size.
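As a minimal sketch of this size rule, the following uses a hypothetical 2-by-2 cell array, strs, that is not part of the examples in this section. It checks that regexp returns a cell array with the same dimensions as the input, holding one vector of indices per input string.

```matlab
% Hypothetical 2-by-2 cell array of strings, used only for illustration.
strs = {'aa',  'bcd'; ...
        'eff', 'ghi'};

% Search every cell for two consecutive matching characters.
idx = regexp(strs, '(.)\1');

% The output is a cell array with the same dimensions as the input.
outSize = size(idx);

% Cells whose string contains no match hold an empty array.
noMatch = cellfun(@isempty, idx);
```

Here idx{2,1} holds the index of the 'ff' in 'eff', while the cells in the second column are empty because 'bcd' and 'ghi' contain no repeated letter.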
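This section covers the following topics: finding a single pattern in multiple strings, finding multiple patterns in multiple strings, and replacing multiple strings.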
Finding a Single Pattern in Multiple Strings
The example shown here uses the regexp function on a cell array of strings, cstr. It searches each string of the cell array for consecutive matching letters (e.g., 'oo'). The function returns a cell array of the same size as the input array. Each cell of the return array contains the indices at which a match was found in the corresponding input string.
    cstr = { ...
        'Whose woods these are I think I know.' ; ...
        'His house is in the village though;'   ; ...
        'He will not see me stopping here'      ; ...
        'To watch his woods fill up with snow.'};
Find consecutive matching letters by capturing a letter as a token, (.), and then repeating that letter as a token reference, \1:
    idx = regexp(cstr, '(.)\1');

    whos idx
      Name      Size      Bytes  Class
      idx       4x1         296  cell array

    idx{:}
    ans =               % 'Whose woods these are I think I know.'
         8              %         |8
    ans =               % 'His house is in the village though;'
        23              %                        |23
    ans =               % 'He will not see me stopping here'
         6    14    23  %       |6      |14      |23
    ans =               % 'To watch his woods fill up with snow.'
        15    22        %                |15    |22
To return substrings instead of indices, use the 'match' parameter.
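The following is a sketch of such a call, reusing the cstr array and the '(.)\1' expression from the example above. The variable names mat and third are chosen here for illustration only.

```matlab
cstr = { ...
    'Whose woods these are I think I know.' ; ...
    'His house is in the village though;'   ; ...
    'He will not see me stopping here'      ; ...
    'To watch his woods fill up with snow.'};

% Return the matched substrings rather than their starting indices.
mat = regexp(cstr, '(.)\1', 'match');

% mat is 4-by-1, like cstr. Each cell holds a cell array containing the
% matches for one string.
third = mat{3};    % matches found in 'He will not see me stopping here'
```

With this input, third is a 1-by-3 cell array containing 'll', 'ee', and 'pp', which correspond to the indices 6, 14, and 23 returned for the same string above.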
Finding Multiple Patterns in Multiple Strings
This example uses a cell array of strings in both the input string and the expression. The two cell arrays are of different shapes: cstr is 4-by-1 while expr is 1-by-4. The command is valid as long as they both have the same number of cells.
Find uppercase or lowercase 'i' followed by a white-space character in cstr{1}, the sequence 'hou' in cstr{2}, two consecutive matching letters in cstr{3}, and words beginning with 'w' followed by a vowel in cstr{4}.
    expr = {'i\s', 'hou', '(.)\1', '\<w[aeiou]'};
    idx = regexpi(cstr, expr);

    idx{:}
    ans =               % 'Whose woods these are I think I know.'
        23    31        %                        |23     |31
    ans =               % 'His house is in the village though;'
         5    30        %      |5                       |30
    ans =               % 'He will not see me stopping here'
         6    14    23  %       |6      |14      |23
    ans =               % 'To watch his woods fill up with snow.'
         4    14    28  %     |4        |14           |28
Note that the returned cell array has the dimensions of the input string, cstr. The dimensions of the return value are always derived from the input string, whenever the input string is a cell array. If the input string is not a cell array, then it is the dimensions of the expression that determine the shape of the return array.
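To illustrate the second case, the sketch below searches a single string using a cell array of expressions. The 2-by-2 arrangement of expr is a hypothetical layout chosen for this illustration. It reuses expressions from the example above, but with regexp, so matching is case sensitive.

```matlab
% A single input string (not a cell array).
str = 'To watch his woods fill up with snow.';

% Hypothetical 2-by-2 cell array of expressions.
expr = {'(.)\1',      'hou'; ...
        '\<w[aeiou]', 'i\s'};

idx = regexp(str, expr);

% Because str is not a cell array, the output takes the shape of expr.
outSize = size(idx);
```

Here idx{1,1} holds the indices of the repeated letters and idx{2,1} the indices of the words that begin with 'w' followed by a vowel. The cells in the second column are empty because neither 'hou' nor 'i\s' matches this string.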
Replacing Multiple Strings
When replacing multiple strings with regexprep, use a single replacement string if the expression consists of a single string. This example uses a common replacement value ('--') for all matches found in the multiple string input cstr. The function returns a cell array of strings having the same dimensions as the input cell array:
    s = regexprep(cstr, '(.)\1', '--', 'ignorecase')
    s =
        'Whose w--ds these are I think I know.'
        'His house is in the vi--age though;'
        'He wi-- not s-- me sto--ing here'
        'To watch his w--ds fi-- up with snow.'
You can use multiple replacement strings if the expression consists of multiple strings. In this example, the input string and replacement string are both 4-by-1 cell arrays, and the expression is a 1-by-4 cell array. As long as the expression and replacement arrays contain the same number of elements, the statement is valid. The dimensions of the return value match the dimensions of the input string:
    expr = {'i\s', 'hou', '(.)\1', '\<w[aeiou]'};
    repl = {'-1-'; '-2-'; '-3-'; '-4-'};

    s = regexprep(cstr, expr, repl, 'ignorecase')
    s =
        'Whose w-3-ds these are -1-think -1-know.'
        'His -2-se is in the vi-3-age t-2-gh;'
        'He -4--3- not s-3- me sto-3-ing here'
        'To -4-tch his w-3-ds fi-3- up -4-th snow.'
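One way to read this result is that each expression and replacement pair is applied in turn to every string, with each pair operating on the output of the previous one. This explains why 'woods' becomes 'w-3-ds' and is not then matched by the fourth expression. The loop below is a sketch of that interpretation, not a description of how regexprep is implemented. The variable names sLoop, sAll, and same are introduced here for illustration.

```matlab
cstr = { ...
    'Whose woods these are I think I know.' ; ...
    'His house is in the village though;'   ; ...
    'He will not see me stopping here'      ; ...
    'To watch his woods fill up with snow.'};

expr = {'i\s', 'hou', '(.)\1', '\<w[aeiou]'};
repl = {'-1-'; '-2-'; '-3-'; '-4-'};

% Apply one expression/replacement pair at a time to all four strings,
% feeding each result into the next pass.
sLoop = cstr;
for k = 1:numel(expr)
    sLoop = regexprep(sLoop, expr{k}, repl{k}, 'ignorecase');
end

% The single vectorized call is expected to give the same result.
sAll = regexprep(cstr, expr, repl, 'ignorecase');
same = isequal(sLoop, sAll);
```

Reordering the elements of expr and repl can therefore change the output, since an earlier replacement may remove or create text that a later expression would match.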
© 1994-2005 The MathWorks, Inc.