
Working with Hierarchical Data Format (HDF5) Files

Hierarchical Data Format, Version 5 (HDF5), is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). For more information about the HDF5 file format, read the HDF5 documentation available at the NCSA Web site (hdf.ncsa.uiuc.edu).

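This section describes how to import data or metadata from an HDF5 file and how to export data to an HDF5 file. Topics covered include determining the contents of an HDF5 file, importing data from an HDF5 file, exporting data to HDF5 files, and mapping HDF5 data types to MATLAB data types.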

Determining the Contents of an HDF5 File

HDF5 files can contain data and metadata. HDF5 files organize the data and metadata, called attributes, in a hierarchical structure, similar to the hierarchical structure of a file system.

In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, data sets, attributes, links, and data types. A data set is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a data set. A link is similar to a UNIX file system symbolic link. Links are a way to reference data without having to make a copy of the data.

A data type describes the data in a data set or attribute and tells how to interpret that data. For example, a file might contain a data type called "Reading" that is composed of three elements: a longitude value, a latitude value, and a temperature value.

To find the names of all the data sets and attributes contained in an HDF5 file, use the hdf5info function. For example, to find out what the sample HDF5 file, example.h5, contains, use this syntax:
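As a minimal sketch of the call described above, assuming the sample file example.h5 is on the MATLAB path (the variable name fileinfo is chosen here for illustration):

```matlab
% Collect information about the contents of the sample file.
fileinfo = hdf5info('example.h5');
```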

hdf5info returns a structure that contains various information about the HDF5 file, including the name of the file and the version of the HDF5 library that MATLAB is using:
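The top-level fields can be inspected along these lines. This is a sketch that assumes example.h5 is on the MATLAB path; the field names Filename and LibVersion are the ones hdf5info is understood to use for the file name and library version, and the values displayed depend on your installation.

```matlab
fileinfo = hdf5info('example.h5');

disp(fileinfo.Filename)     % name of the file that was inspected
disp(fileinfo.LibVersion)   % version of the HDF5 library in use
```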

Exploring the Contents of an HDF5 File

To explore the hierarchical organization of the file, examine the GroupHierarchy field in the structure returned by hdf5info. The GroupHierarchy field is a structure that describes the top-level group in the file, called the root group. HDF5 uses the UNIX convention and names this top-level group / (forward slash).

The following example shows that the GroupHierarchy structure for the sample HDF5 file contains two groups and two attributes. The root group does not contain any data sets, data types, or links.
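A short sketch of that inspection, assuming example.h5 is on the MATLAB path (the variable name toplevel is illustrative):

```matlab
fileinfo = hdf5info('example.h5');

% The root group; omitting the semicolon displays its fields.
toplevel = fileinfo.GroupHierarchy

numel(toplevel.Groups)       % number of groups in the root group
numel(toplevel.Attributes)   % number of attributes on the root group
```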

The following figure illustrates the organization of the root group.

Organization of the Root Group of the Sample HDF5 File

To explore the contents of the sample HDF5 file further, examine one of the two structures in the Groups field of the GroupHierarchy structure. Each structure in this field represents a group contained in the root group:
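One way to pick out a group is sketched below. It assumes example.h5 is on the MATLAB path and selects the group by its Name field rather than by position, so it does not depend on the order in which the groups are listed.

```matlab
fileinfo = hdf5info('example.h5');
groups = fileinfo.GroupHierarchy.Groups;

% Select the structure that describes the group /g2 and display it.
level2 = groups(strcmp({groups.Name}, '/g2'))
```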

In the sample file, the group named /g2 contains two data sets. The following figure illustrates this part of the sample HDF5 file organization.

Organization of the Group /g2 in the Sample HDF5 File

To get information about a data set, look at either of the structures returned in the Datasets field. These structures provide information about the data set, such as its name, dimensions, and data type.
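The following sketch looks up one of the data set structures, again assuming example.h5 is on the MATLAB path. The field names Name, Dims, and Datatype are the ones hdf5info is understood to use for the name, dimensions, and data type.

```matlab
fileinfo = hdf5info('example.h5');
groups = fileinfo.GroupHierarchy.Groups;
level2 = groups(strcmp({groups.Name}, '/g2'));

% Select the data set /g2/dset2.1 by name and display its description.
datasets = level2.Datasets;
dataset1 = datasets(strcmp({datasets.Name}, '/g2/dset2.1'))

dataset1.Dims       % dimensions of the data set
dataset1.Datatype   % structure describing the data type
```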

By examining the structures at each level of the hierarchy, you can traverse the entire file. The following figure describes the hierarchical organization of the sample file example.h5.
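The traversal can be automated. The loop below is an illustrative sketch, not part of hdf5info itself: it assumes example.h5 is on the MATLAB path, visits every group starting at the root, and collects the names of all data sets in a cell array called names (a hypothetical variable).

```matlab
fileinfo = hdf5info('example.h5');

pending = fileinfo.GroupHierarchy;   % groups still to visit, starting at /
names = {};                          % data set names found so far

while ~isempty(pending)
    grp = pending(1);                % take the next group
    pending(1) = [];
    fprintf('group    %s\n', grp.Name);

    for k = 1:numel(grp.Datasets)
        fprintf('data set %s\n', grp.Datasets(k).Name);
        names{end+1} = grp.Datasets(k).Name;
    end

    % Queue the subgroups (an empty Groups field adds nothing).
    pending = [pending, grp.Groups(:).'];
end
```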

Hierarchical Structure of example.h5 HDF5 File

Importing Data from an HDF5 File

To read data or metadata from an HDF5 file, use the hdf5read function. As arguments, you must specify the name of the HDF5 file and the name of the data set. For information about finding the name of a data set, see Determining the Contents of an HDF5 File.

For example, to read the data set /g2/dset2.1 from the HDF5 file example.h5, use this syntax:
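A minimal sketch of the call, assuming example.h5 is on the MATLAB path (the variable name data is illustrative):

```matlab
% Read the data set /g2/dset2.1 into the workspace.
data = hdf5read('example.h5', '/g2/dset2.1');
```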

The return value contains the values in the data set, in this case a 1-by-10 vector of single-precision values:
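To check what was returned, you can query the class and the number of elements. This sketch assumes example.h5 is on the MATLAB path.

```matlab
data = hdf5read('example.h5', '/g2/dset2.1');

class(data)   % MATLAB data type chosen for the values
numel(data)   % number of values in the data set
```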

The hdf5read function maps HDF5 data types to appropriate MATLAB data types whenever possible. If the HDF5 file contains data types that cannot be represented in MATLAB, hdf5read uses one of the predefined MATLAB HDF5 data type objects to represent the data.

For example, if an HDF5 data set contains four array elements, hdf5read can return the data as a 1-by-4 array of hdf5.h5array objects.

For more information about the MATLAB HDF5 data type objects, see Mapping HDF5 Data Types to MATLAB Data Types.

Exporting Data to HDF5 Files

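To write data or metadata from the MATLAB workspace to an HDF5 file, use the hdf5write function. As arguments, specify the name of the HDF5 file, the name of the data set or attribute to create in the file, and the data to write.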

This example creates a 5-by-5 array of uint8 values and then writes the array to an HDF5 file. By default, hdf5write overwrites the file if it already exists, so the example specifies the hdf5write mode option to append the data to an existing file instead.

  1. Create a MATLAB variable in the workspace. This example creates a 5-by-5 array of uint8 values.
  2. Add the data to an existing HDF5 file. To add data to an existing file, you must specify 'append' mode. The file must already exist and it cannot already contain a data set with the same name.
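The two steps above can be sketched as follows. The file name myfile.h5 and the data set names /dataset1 and /dataset2 are hypothetical; the first hdf5write call is included only so that a file exists to append to.

```matlab
% Step 1: create a 5-by-5 array of uint8 values.
dset = uint8(magic(5));

% Default mode: creates myfile.h5, or overwrites it if it already exists.
hdf5write('myfile.h5', '/dataset1', dset);

% Step 2: append mode. The file must exist and must not already
% contain a data set named /dataset2.
hdf5write('myfile.h5', '/dataset2', dset, 'WriteMode', 'append');
```

Running the append call a second time without recreating the file would fail, because /dataset2 would already exist.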

Mapping HDF5 Data Types to MATLAB Data Types

When the hdf5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, depending on whether the data in the data set is stored in an atomic data type or a nonatomic (composite) data type.

Atomic data types describe commonly used binary formats for numbers (integers and floating point) and characters (ASCII). Because different computing architectures and programming languages support different number and character representations, the HDF5 library provides platform-independent data types, which it then maps to an appropriate data type for each platform. For example, a computer may support 8-, 16-, 32-, and 64-bit signed integers, stored in memory in little endian byte order.

A composite data type is an aggregation of one or more atomic data types. Composite data types include structures, multidimensional arrays, and variable-length data types (one-dimensional arrays).

Mapping Atomic Data Types.   If the data in the data set is stored in one of the HDF5 atomic data types, hdf5read uses the equivalent MATLAB data type to represent the data. Each data set contains a Datatype field that names the data type. For example, the data set /g2/dset2.2 in the sample HDF5 file includes atomic data and data type information.
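As a sketch, the data type information for /g2/dset2.2 can be retrieved as shown below. This assumes example.h5 is on the MATLAB path and that the data type structure exposes its class name in a field called Class.

```matlab
fileinfo = hdf5info('example.h5');
groups = fileinfo.GroupHierarchy.Groups;
level2 = groups(strcmp({groups.Name}, '/g2'));

% Select the data set /g2/dset2.2 by name.
datasets = level2.Datasets;
dataset2 = datasets(strcmp({datasets.Name}, '/g2/dset2.2'));

dtype = dataset2.Datatype   % display the data type description
dtype.Class                 % class name of the atomic data type
```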

The H5T_IEEE_F32BE class name indicates the data is a 4-byte, big endian, IEEE floating-point data type. (See the HDF5 specification for more information about atomic data types.)

HDF5 Nonatomic Data Types.   If the data in the data set is stored in one of the HDF5 nonatomic data types, hdf5read represents the data set in MATLAB as an object. MATLAB supports the following objects to represent HDF5 nonatomic data types: hdf5.h5array, hdf5.h5compound, hdf5.h5enum, hdf5.h5string, and hdf5.h5vlen.

To access the data in the data set in the MATLAB workspace, you must access the Data field in the object. This example converts a simple MATLAB vector into an h5array object and then displays the fields in the object:
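A minimal sketch of that conversion (the variable names vec and arr are illustrative):

```matlab
vec = [1 2 3];

% Wrap the vector in an h5array object; omitting the semicolon
% displays the object.
arr = hdf5.h5array(vec)

% The wrapped values are stored in the Data field.
arr.Data
```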

Using HDF5 Data Type Objects.   If you are writing simple data sets, such as scalars, strings, or a simple compound data set, you can just pass the data directly to hdf5write. The hdf5write function can automatically map the MATLAB data types to appropriate HDF5 data types.

However, if your data is a more complicated data set, you must pass it to the hdf5write function as one of the predefined MATLAB HDF5 objects. The HDF5 objects are designed for situations where the mapping between MATLAB and HDF5 types is ambiguous.

For example, when passed a cell array of strings, the hdf5write function writes a data set made up of strings, not a data set of arrays containing strings. If that is not the mapping you intend, use HDF5 objects to specify the correct mapping.

In addition, note that HDF5 makes a distinction between the size of a data set and the size of a data type. In MATLAB, data types are always scalar. In HDF5, data types can have a size; that is, types can be either scalar (like MATLAB) or m-by-n.

In HDF5, a 5-by-5 data set containing a single uint8 value in each element is distinct from a 1-by-1 data set containing a 5-by-5 array of uint8 values. In the first case, the data set contains 25 observations of a single value; in the second case, the data set contains a single observation with 25 values.
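The distinction can be sketched as follows. The file name sizes.h5 and the data set names are hypothetical, and the second call assumes that hdf5write accepts an hdf5.h5array object wrapping the whole matrix as a single element.

```matlab
vals = uint8(magic(5));

% A 5-by-5 data set whose elements are scalar uint8 values
% (25 observations of a single value).
hdf5write('sizes.h5', '/observations', vals);

% One h5array object holding the entire matrix: intended as a single
% observation whose data type is a 5-by-5 array of uint8 values.
wrapped = hdf5.h5array(vals);
hdf5write('sizes.h5', '/single_observation', wrapped, 'WriteMode', 'append');
```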

This example uses an HDF5 enumeration object for enumerated data:

  1. Create an HDF5 enumerated object.
  2. Define the enumerated values and their corresponding names. enum_obj now contains the definition of the enumeration that associates the names RED, GREEN, and BLUE with the numbers 1, 2, and 3.
  3. Add enumerated data to the object. In the HDF5 file, these numeric values map to the enumerated values GREEN, RED, BLUE, BLUE, GREEN, etc.
  4. Write the enumerated data to a data set named objects in an HDF5 file.
  5. Read the enumerated data set from the file.
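The steps above can be sketched as follows. The file name myfile3.h5, the group path /g1, and the particular data values are assumptions made for illustration; the values are chosen so that they begin GREEN, RED, BLUE, BLUE, GREEN as described.

```matlab
% Create an HDF5 enumerated object.
enum_obj = hdf5.h5enum;

% Associate the names RED, GREEN, and BLUE with the numbers 1, 2, and 3.
enum_obj.defineEnum({'RED' 'GREEN' 'BLUE'}, uint8([1 2 3]));

% Add enumerated data: GREEN, RED, BLUE, BLUE, GREEN, ...
enum_obj.setData(uint8([2 1 3 3 2 3 2 1]));

% Write the data to a data set named objects, then read it back.
hdf5write('myfile3.h5', '/g1/objects', enum_obj);
ddata = hdf5read('myfile3.h5', '/g1/objects');
```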


© 1994-2005 The MathWorks, Inc.