Memory Allocation for Arrays

The topics below provide information on how MATLAB allocates memory when working with arrays and variables. The purpose is to help you use memory more efficiently when writing code. Most of the time, however, you should not need to be concerned with these internal operations as MATLAB handles data storage for you automatically.

Note: Any information on how data is handled internally by MATLAB is subject to change in future releases.

Creating and Modifying Arrays

When you assign any type of data (a numeric, string, or structure array, for example) to a variable, MATLAB allocates a contiguous block of memory and stores the array data in that block. It also stores information about the array data, such as its data type and dimensions, in a separate, small block of memory called a header. The variable that you assign this data to is actually a pointer to the data; it does not contain the data.

If you add new elements to an existing array, MATLAB expands the existing array in memory in a way that keeps its storage contiguous. This might require finding a new block of memory large enough to hold the expanded array, and then copying the contents of the array from its original location to the new block in memory, adding the new elements to the array in this block, and freeing up the original array location in memory.

If you remove elements from an existing array, MATLAB keeps the memory storage contiguous by removing the deleted elements, and then compacting its storage in the original memory location.
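As a small illustration of deleting elements (the vector and the indices here are chosen only for the example), assigning the empty array to a set of indices removes those elements and leaves a shorter array:

```matlab
A = 1:10;            % ten-element row vector
A([2 5 9]) = [];     % delete the 2nd, 5th, and 9th elements

numel(A)             % 7 elements remain
A                    % 1 3 4 6 7 8 10
```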

Working with Large Data Sets. If you are working with large data sets, you need to be careful when increasing the size of an array to avoid getting errors caused by insufficient memory. If you expand the array beyond the available contiguous memory of its original location, MATLAB has to make a copy of the array in a new location in memory, as explained above, and then set this array to its new value. During this operation, there are two copies of the original array in memory, thus temporarily doubling the amount of memory required for the array and increasing the risk of your program running out of memory during execution. It is better to preallocate sufficient memory for the array at the start. See Preallocating Arrays.
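The following is a minimal sketch of the preallocation advice above. The variable names and the size n are illustrative and do not come from the original text. It contrasts growing an array one element at a time with filling an array whose full size was reserved in advance:

```matlab
n = 10000;               % illustrative size

% Growing: each assignment past the current end enlarges the array, which
% may force MATLAB to find a larger block and copy the existing contents.
grown = [];
for k = 1:n
    grown(k) = k^2;
end

% Preallocating: zeros reserves one block of the final size up front,
% so the loop only overwrites existing elements.
pre = zeros(1, n);
for k = 1:n
    pre(k) = k^2;
end

isequal(grown, pre)      % both approaches produce the same values
```

Both loops compute the same result; the difference lies only in how often the array may have to be reallocated and copied along the way.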

Copying Arrays

Internally, multiple variables can point to the same block of data, thus sharing that array's value. When you copy a variable to another variable (e.g., B = A), MATLAB makes a copy of the pointer, not the array. For example, the following code creates a single 500-by-500 matrix and two pointers to it, A and B:

```
A = magic(500);
B = A;
```

As long as the contents of the array are not modified, there is no need to store two copies of it. If you modify the array, then MATLAB does create a separate array to hold the new values.

If you modify the array shown above by referencing it with variable A (e.g., A(400,:) = 0), then MATLAB creates a copy of the array, modifies it accordingly, and stores a pointer to the new array in A. Variable B continues to point to the original array. If you modify the array by referencing it with variable B (e.g., B(400,:) = 0), the same thing happens except that it is B that points to the new array.
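The effect described above can be checked from the values alone. This is only a sketch of the observable behavior, not a view into how MATLAB manages the underlying memory:

```matlab
A = magic(500);
B = A;                      % B refers to the same data as A
A(400,:) = 0;               % modifying through A gives A its own array

all(A(400,:) == 0)          % true: A reflects the change
isequal(B, magic(500))      % true: B still holds the original values
```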

Array Headers

When you assign an array to a variable, MATLAB also stores information about the array (such as data type and dimensions) in a separate piece of memory called a header. For most arrays, the memory required to store the header is insignificant. There is, however, a small advantage to storing large data sets in a small number of large arrays rather than a large number of small arrays, because the former configuration requires fewer array headers.

Structure and Cell Arrays. For structures and cell arrays, MATLAB creates a header not only for each array, but also for each field of the structure and for each cell of a cell array. Because of this, the amount of memory required to store a structure or cell array depends not only on how much data it holds, but also how it is constructed.

For example, a scalar structure array S1 having fields R, G, and B, each field of size 100-by-50, requires one array header to describe the overall structure, and one header to describe each of the three field arrays, making a total of 4 array headers for the entire data structure:

```
S1.R(1:100,1:50)
S1.G(1:100,1:50)
S1.B(1:100,1:50)
```

On the other hand, a 100-by-50 structure array S2 in which each element has scalar fields R, G, and B requires one array header to describe the overall structure, and one array header per field for each of the 5,000 elements of the structure, making a total of 15,001 array headers for the entire data structure:

```
S2(1:100,1:50).R
S2(1:100,1:50).G
S2(1:100,1:50).B
```

Thus, even though S1 and S2 contain the same amount of data, S1 uses significantly less space in memory. Not only is less memory required, but there is a corresponding speed benefit to using the S1 format as well.
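One possible way to build the two layouts and reproduce the header counts is sketched below. The zero-filled data and the helper variable names are assumptions made for illustration, and the header totals simply apply the counting rule stated above rather than measuring anything inside MATLAB:

```matlab
% S1: scalar structure whose three fields are each 100-by-50 arrays.
S1.R = zeros(100, 50);
S1.G = zeros(100, 50);
S1.B = zeros(100, 50);

% S2: 100-by-50 structure array whose fields are all scalars.
S2 = repmat(struct('R', 0, 'G', 0, 'B', 0), 100, 50);

nFields   = numel(fieldnames(S1));       % 3 fields: R, G, B
headersS1 = 1 + nFields * numel(S1);     % 1 + 3*1    = 4
headersS2 = 1 + nFields * numel(S2);     % 1 + 3*5000 = 15001

% Both layouts hold the same number of data values.
dataS1 = numel(S1.R) + numel(S1.G) + numel(S1.B);   % 15000
dataS2 = nFields * numel(S2);                       % 15000
```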

Memory Usage Reported By the whos Function. The whos function displays the amount of memory consumed by any variable. For reasons of simplicity, whos reports only the memory used to store the actual data. It does not report storage for the variable itself or the array header.
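For a plain numeric array, the figure reported by whos can be checked against the size of the data itself. In this sketch the variable names are illustrative; calling whos with an output argument returns the information as a structure instead of printing it:

```matlab
A = zeros(100, 50);      % 5,000 doubles at 8 bytes each
info = whos('A');        % structure describing variable A
info.bytes               % 40000, the size of the data alone
```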

Function Arguments

MATLAB handles arguments passed in function calls in a similar way. When you pass a variable to a function, you are actually passing a pointer to the data that the variable represents. As long as the input data is not modified by the function being called, the variable in the calling function and the variable in the called function point to the same location in memory. If the called function modifies the value of the input data, then MATLAB makes a copy of the original array in a new location in memory, updates that copy with the modified value, and points the input variable in the called function to this new array.

In the example below, function myfun modifies the value of the array passed into it. MATLAB makes a copy in memory of the array pointed to by A, sets variable X as a pointer to this new array, and then sets one row of X to zero. The array referenced by A remains unchanged:

```
A = magic(500);
myfun(A);          % A is unchanged after this call

function myfun(X)
X(400,:) = 0;      % modifies only the copy referenced by X
end
```

If the calling function needs the modified value of the array it passed to myfun, you will need to return the updated array as an output of the called function, as shown here for variable A:

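```
A = magic(500);
A = myfun(A);      % reassign A to receive the modified array
disp('The new value of A(399:401,1:10) is')
disp(A(399:401,1:10))

function Y = myfun(X)
X(400,:) = 0;
Y = X;             % return the modified copy to the caller
end
```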

Working with Large Data Sets. Again, when working with large data sets, you should be aware that MATLAB makes a temporary copy of A if the called function modifies its value. This temporarily doubles the memory required to store the array, which causes MATLAB to generate an error if sufficient memory is not available.

One way to avoid running out of memory in this situation is to use nested functions. A nested function shares the workspace of all outer functions, giving the nested function access to data outside of its usual scope. In the example shown here, nested function setrowval has direct access to the workspace of the outer function myfun, making it unnecessary to pass a copy of the variable in the function call. When setrowval modifies the value of A, it modifies it in the workspace of the calling function. There is no need to use additional memory to hold a separate array for the function being called, and there also is no need to return the modified value of A:

```
function myfun
A = magic(500);

    function setrowval(row, value)
        A(row,:) = value;
    end

setrowval(400, 0);
disp('The new value of A(399:401,1:10) is')
A(399:401,1:10)
end
```
