Working with Hierarchical Data Format (HDF5) Files
Hierarchical Data Format, Version 5 (HDF5), is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). For more information about the HDF5 file format, read the HDF5 documentation available at the NCSA Web site (hdf.ncsa.uiuc.edu).
Note: For information about importing HDF4 data, which is a completely separate, incompatible format, see Importing HDF4 and HDF-EOS Data.
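This section describes how to determine the contents of an HDF5 file, import data or metadata from an HDF5 file, and export MATLAB data to an HDF5 file. It also describes how HDF5 data types map to MATLAB data types.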
Determining the Contents of an HDF5 File
HDF5 files can contain data and metadata, called attributes. HDF5 files organize the data and attributes in a hierarchical structure, similar to the hierarchical structure of a file system.
In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, data sets, attributes, links, and data types. A data set is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a data set. A link is similar to a UNIX file system symbolic link. Links are a way to reference data without having to make a copy of the data.
A data type describes the data in a data set or attribute; it tells how to interpret that data. For example, a file might contain a data type called "Reading" that is composed of three elements: a longitude value, a latitude value, and a temperature value.
To find the names of all the data sets and attributes contained in an HDF5 file, use the hdf5info function. For example, to find out what the sample HDF5 file example.h5 contains, use this syntax:
fileinfo = hdf5info('example.h5')
hdf5info returns a structure that contains various information about the HDF5 file, including the name of the file and the version of the HDF5 library that MATLAB is using:
fileinfo =
Filename: 'example.h5'
LibVersion: '1.4.2'
Offset: 0
FileSize: 8172
GroupHierarchy: [1x1 struct]
Exploring the Contents of an HDF5 File
To explore the hierarchical organization of the file, examine the GroupHierarchy field in the structure returned by hdf5info. The GroupHierarchy field is a structure that describes the top-level group in the file, called the root group. HDF5 uses the UNIX convention and names this top-level group / (forward slash).
The following example shows that the GroupHierarchy structure for the sample HDF5 file contains two groups and two attributes. The root group does not contain any data sets, data types, or links.
toplevel = fileinfo.GroupHierarchy
toplevel =
Filename: 'C:\matlab\toolbox\matlab\demos\example.h5'
Name: '/'
Groups: [1x2 struct]
Datasets: []
Datatypes: []
Links: []
Attributes: [1x2 struct]
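Rather than reading the struct display by eye, you can loop over the struct arrays in the Groups and Attributes fields. This is a minimal sketch, assuming example.h5 is on the MATLAB path (as in the outputs above) and assuming that each attribute structure has a Name field, as the group and data set structures do:

```matlab
% Read the file information and get the root group structure.
fileinfo = hdf5info('example.h5');
toplevel = fileinfo.GroupHierarchy;

% Each element of toplevel.Groups describes one group inside the root group.
for k = 1:numel(toplevel.Groups)
    fprintf('Group:     %s\n', toplevel.Groups(k).Name);
end

% Each element of toplevel.Attributes describes one attribute of the root group
% (assumption: attribute structures expose a Name field).
for k = 1:numel(toplevel.Attributes)
    fprintf('Attribute: %s\n', toplevel.Attributes(k).Name);
end
```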
The following figure illustrates the organization of the root group.
Organization of the Root Group of the Sample HDF5 File
To explore the contents of the sample HDF5 file further, examine one of the two structures in the Groups field of the GroupHierarchy structure. Each structure in this field represents a group contained in the root group:
level2 = toplevel.Groups(2)
level2 =
Filename: 'C:\matlab\toolbox\matlab\demos\example.h5'
Name: '/g2'
Groups: []
Datasets: [1x2 struct]
Datatypes: []
Links: []
Attributes: []
In the sample file, the group named /g2 contains two data sets. The following figure illustrates this part of the sample HDF5 file organization.
Organization of the Group /g2 in the Sample HDF5 File
To get information about a data set, look at either of the structures returned in the Datasets field. These structures provide information about the data set, such as its name, dimensions, and data type.
dataset1 = level2.Datasets(1)
dataset1 =
Filename: 'L:\matlab\toolbox\matlab\demos\example.h5'
Name: '/g2/dset2.1'
Rank: 1
Datatype: [1x1 struct]
Dims: 10
MaxDims: 10
Layout: 'contiguous'
Attributes: []
Links: []
Chunksize: []
Fillvalue: []
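To summarize every data set in a group at once, you can loop over the Datasets struct array. This sketch assumes that the Datatype substructure stores the HDF5 type name in a field called Class (the field that holds names such as H5T_IEEE_F32BE); treat that field name as an assumption:

```matlab
% Get the /g2 group, the second group under the root group.
fileinfo = hdf5info('example.h5');
level2 = fileinfo.GroupHierarchy.Groups(2);

% Print the name, dimensions, and HDF5 type class of each data set in /g2.
% (Assumption: ds.Datatype.Class holds the HDF5 class name.)
for k = 1:numel(level2.Datasets)
    ds = level2.Datasets(k);
    fprintf('%s  dims=%s  class=%s\n', ds.Name, mat2str(ds.Dims), ds.Datatype.Class);
end
```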
By examining the structures at each level of the hierarchy, you can traverse the entire file. The following figure describes the hierarchical organization of the sample file example.h5.
Hierarchical Structure of example.h5 HDF5 File
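The traversal described above can be automated. This is a sketch, not a library routine: it walks the GroupHierarchy depth first and collects the full name of every data set. It uses an explicit stack of group structures instead of a recursive function so that it runs as a plain script; the variable names stack and names are illustrative only.

```matlab
fileinfo = hdf5info('example.h5');

% Depth-first traversal using a cell array as a stack of group structures.
stack = {fileinfo.GroupHierarchy};
names = {};
while ~isempty(stack)
    group = stack{end};   % take the most recently pushed group
    stack(end) = [];      % and pop it off the stack

    % Record the data sets stored directly in this group.
    for k = 1:numel(group.Datasets)
        names{end+1} = group.Datasets(k).Name; %#ok<AGROW>
    end

    % Push each subgroup so that its contents are visited later.
    for k = 1:numel(group.Groups)
        stack{end+1} = group.Groups(k); %#ok<AGROW>
    end
end

fprintf('%s\n', names{:});
```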
Importing Data from an HDF5 File
To read data or metadata from an HDF5 file, use the hdf5read function. As arguments, specify the name of the HDF5 file and the name of the data set or attribute you want to read. For information about finding the name of a data set, see Determining the Contents of an HDF5 File.
For example, to read the data set /g2/dset2.1 from the HDF5 file example.h5, use this syntax:
data = hdf5read('example.h5','/g2/dset2.1')
The return value contains the values in the data set, in this case a 1-by-10 vector of single-precision values:
data =
  Columns 1 through 8
    1.0000    1.1000    1.2000    1.3000    1.4000    1.5000    1.6000    1.7000
  Columns 9 through 10
    1.8000    1.9000
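The same function reads attributes. In this sketch the attribute name is taken from the hdf5info output rather than hard-coded, assuming that the Name field of an attribute structure holds the full path that hdf5read expects; the class of the returned attribute value is not assumed:

```matlab
% Read the data set shown above; the values come back as a single-precision array.
data = hdf5read('example.h5', '/g2/dset2.1');
disp(class(data))

% Read the first attribute of the root group by name.
% (Assumption: Attributes(1).Name is a full path such as '/...' usable by hdf5read.)
fileinfo = hdf5info('example.h5');
attrName = fileinfo.GroupHierarchy.Attributes(1).Name;
attrValue = hdf5read('example.h5', attrName);
disp(attrName)
```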
The hdf5read function maps HDF5 data types to appropriate MATLAB data types whenever possible. If the HDF5 file contains data types that cannot be represented directly in MATLAB, hdf5read uses one of the predefined MATLAB HDF5 data type objects to represent the data.
For example, if each element of an HDF5 data set is itself an array and the data set contains four such elements, hdf5read returns the data as a 1-by-4 array of hdf5.h5array objects.
For more information about the MATLAB HDF5 data type objects, see Mapping HDF5 Data Types to MATLAB Data Types.
Exporting Data to HDF5 Files
To write data or metadata from the MATLAB workspace to an HDF5 file, use the hdf5write function. As arguments, specify the name of the HDF5 file (an existing file or a new one), the name of the data set or attribute to write, and the data or metadata itself. hdf5write converts MATLAB data types to the appropriate HDF5 data types automatically. For nonatomic data types, you can also create HDF5 objects to represent the data.
For example, you can create a 5-by-5 array of uint8 values in the workspace and then write the array to an HDF5 file. By default, hdf5write overwrites the file if it already exists. To add a data set to an existing file instead, specify the 'append' write mode. In 'append' mode, the file must already exist and it cannot already contain a data set with the same name.
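A minimal sketch of these steps follows. The file name myfile.h5 and the data set names /dataset1 and /dataset2 are hypothetical; the 'WriteMode' parameter selects between the default overwrite behavior and 'append':

```matlab
% Create a 5-by-5 array of uint8 values in the workspace.
testdata = uint8(magic(5));

% Write the array to a new file. In the default mode, hdf5write
% overwrites myfile.h5 if it already exists.
hdf5write('myfile.h5', '/dataset1', testdata);

% Add a second data set to the same file. In 'append' mode the file
% must already exist and must not already contain /dataset2.
hdf5write('myfile.h5', '/dataset2', testdata, 'WriteMode', 'append');

% Read the appended data set back to check what was stored.
readback = hdf5read('myfile.h5', '/dataset2');
```

Calling hdf5write a second time without 'append' would replace myfile.h5 and lose /dataset1.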
Mapping HDF5 Data Types to MATLAB Data Types
When the hdf5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, depending on whether the data in the data set is stored in an atomic data type or a nonatomic (composite) data type.
Atomic data types describe commonly used binary formats for numbers (integers and floating point) and characters (ASCII). Because different computing architectures and programming languages support different number and character representations, the HDF5 library provides platform-independent data types, which it then maps to an appropriate data type for each platform. For example, a computer may support 8-, 16-, 32-, and 64-bit signed integers, stored in memory in little-endian byte order.
A composite data type is an aggregation of one or more atomic data types. Composite data types include structures, multidimensional arrays, and variable-length data types (one-dimensional arrays).
Mapping Atomic Data Types. If the data in the data set is stored in one of the HDF5 atomic data types, hdf5read uses the equivalent MATLAB data type to represent the data. Each data set structure returned by hdf5info contains a Datatype field that describes the data type. For example, the data set /g2/dset2.2 in the sample HDF5 file contains atomic data, and its Datatype field records the HDF5 class name of that data type.
The H5T_IEEE_F32BE class name indicates that the data is a 4-byte, big-endian IEEE floating-point data type. (See the HDF5 specification for more information about atomic data types.)
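To check the atomic type of /g2/dset2.2 yourself, inspect the Datatype field and compare it with the MATLAB class that hdf5read returns. This sketch assumes, as above, that the class name is stored in Datatype.Class, and that /g2/dset2.2 is the second element of the Datasets array of /g2:

```matlab
fileinfo = hdf5info('example.h5');
dset = fileinfo.GroupHierarchy.Groups(2).Datasets(2);   % expected to be /g2/dset2.2

% HDF5 class name of the stored data (assumption: field is named Class).
disp(dset.Name)
disp(dset.Datatype.Class)

% hdf5read maps the big-endian 4-byte float type to the native MATLAB type.
data = hdf5read('example.h5', dset.Name);
disp(class(data))
```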
Mapping Nonatomic Data Types. If the data in the data set is stored in one of the HDF5 nonatomic data types, hdf5read represents the data set in MATLAB as an object. MATLAB provides a set of predefined objects to represent HDF5 nonatomic data types, including hdf5.h5array for arrays and hdf5.h5enum for enumerations.
To access the data in the MATLAB workspace, use the Data field of the object. For example, if you convert a simple MATLAB vector into an hdf5.h5array object, the vector values are stored in the object's Data field.
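A short sketch of that conversion, assuming the hdf5.h5array constructor accepts the data as its argument. Leaving off the semicolon on the second line displays the object's fields:

```matlab
% Wrap a simple MATLAB vector in an HDF5 array object.
arr = hdf5.h5array([1 2 3 4 5]);

% Display the object, including its fields.
arr

% The numeric contents are stored in the Data field.
vec = arr.Data;
```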
Using HDF5 Data Type Objects. If you are writing simple data sets, such as scalars, strings, or a simple compound data set, you can pass the data directly to hdf5write. The hdf5write function can automatically map the MATLAB data types to appropriate HDF5 data types.
However, if your data is a more complex data set, you must pass one of the predefined MATLAB HDF5 objects to the hdf5write function. The HDF5 objects are designed for situations where the mapping between MATLAB and HDF5 types is ambiguous.
For example, when passed a cell array of strings, the hdf5write function writes a data set made up of strings, not a data set of arrays containing strings. If that is not the mapping you intend, use HDF5 objects to specify the correct mapping.
In addition, note that HDF5 makes a distinction between the size of a data set and the size of a data type. In MATLAB, data types are always scalar. In HDF5, data types can have a size; that is, types can be either scalar (like MATLAB) or m-by-n.
In HDF5, a 5-by-5 data set containing a single uint8 value in each element is distinct from a 1-by-1 data set containing a 5-by-5 array of uint8 values. In the first case, the data set contains 25 observations of a single value; in the second case, the data set contains a single observation with 25 values.
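The distinction can be seen by writing the same values both ways. This is a sketch under stated assumptions: the file name shapes.h5 and the data set names /grid and /single_obs are hypothetical, and it assumes hdf5write accepts an hdf5.h5array object and stores it as a single element whose type is a 5-by-5 array. Printing Dims from hdf5info lets you compare the two data sets:

```matlab
values = uint8(magic(5));

% 25 observations of a single uint8 value: a 5-by-5 data set.
hdf5write('shapes.h5', '/grid', values);

% One observation with 25 values: the h5array object is intended to make
% the data type (not the data set) 5-by-5.
hdf5write('shapes.h5', '/single_obs', hdf5.h5array(values), 'WriteMode', 'append');

% Compare the data set dimensions recorded in the file.
info = hdf5info('shapes.h5');
for k = 1:numel(info.GroupHierarchy.Datasets)
    ds = info.GroupHierarchy.Datasets(k);
    fprintf('%s: Dims = %s\n', ds.Name, mat2str(ds.Dims));
end
```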
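You can use an HDF5 enumeration object (hdf5.h5enum) to write enumerated data. First create the object, enum_obj, and define its enumeration; enum_obj then contains the definition of the enumeration that associates the names RED, GREEN, and BLUE with the numbers 1, 2, and 3. Next, set the object's data to values that correspond to the names GREEN, RED, BLUE, BLUE, GREEN, and so on. Finally, pass enum_obj to hdf5write to create a data set named objects in an HDF5 file.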
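These steps can be sketched as follows. The method names defineEnum and setData are assumptions about the hdf5.h5enum class interface, and the file name myenum.h5 is hypothetical; the data values 2, 1, 3, 3, 2 correspond to GREEN, RED, BLUE, BLUE, GREEN under the enumeration defined here:

```matlab
% Create an HDF5 enumeration object.
enum_obj = hdf5.h5enum;

% Associate the names RED, GREEN, and BLUE with the uint8 values 1, 2, and 3.
% (Assumption: defineEnum takes a cell array of names and a matching value array.)
enum_obj.defineEnum({'RED', 'GREEN', 'BLUE'}, uint8([1 2 3]));

% Set the data; these values stand for GREEN, RED, BLUE, BLUE, GREEN.
% (Assumption: setData stores the enumerated values in the object.)
enum_obj.setData(uint8([2 1 3 3 2]));

% Write the enumerated data to a data set named objects.
hdf5write('myenum.h5', '/objects', enum_obj);
```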
© 1994-2005 The MathWorks, Inc.