HDF5 nodes, links, and groups

In HDF5, data is stored as a tree of nodes connected via links.

[Figure: hdf5_basic_tree.svg]

The root of the tree is the root group, which is itself a group node. Currently two kinds of nodes are supported, groups (Group) and datasets (Dataset), both of which are subclasses of Node. The nodes are connected via links (Link). A common misconception about HDF5 is that nodes (datasets and groups) have names. It is not the node that has a name but the link connecting that node to its parent node. Several links with different names can point to the same node within a file, so the concept of a name for a node does not make much sense. What does make sense is the concept of a path used to access a particular node (see below).
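The distinction between nodes and links can be illustrated with a small toy model. This is only a sketch in plain Python: the class ToyNode is hypothetical and is not part of the pninexus API. It shows that a name is a property of the link stored in the parent, so the same node can be reached under two different names.

```python
# Toy model only: ToyNode is a hypothetical class, NOT the pninexus API.
class ToyNode:
    """A node has no name of its own; it only owns links to its children."""

    def __init__(self):
        self.links = {}  # link name -> child ToyNode


root = ToyNode()
data = ToyNode()

# Two links with different names refer to the very same node object.
root.links["data"] = data
root.links["alias"] = data

print(root.links["data"] is root.links["alias"])
print(sorted(root.links))
```

Asking the node in this model for "its" name has no well-defined answer, while each of the paths /data and /alias identifies it unambiguously.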

Every node can have attributes attached to it (pninexus.h5cpp.attribute.Attribute), which can be used to store meta-data about that node. The attributes of a node can be accessed via the Node.attributes property. We will discuss attributes later (see HDF5 attributes for more details). Aside from attributes, the most important property of Node is the link property, which provides information about the link via which the node was accessed.

In this section we will discuss groups and links.

Groups

Groups are containers for links that refer to the child nodes of a group. The Group class provides the interface for working with groups and is discussed in more detail in this section.

Creating groups

Groups are created by calling their constructor:

from pninexus import h5cpp
from pninexus.h5cpp.node import Group
from pninexus.h5cpp import Path

h5file = h5cpp.file.create(...)
root   = h5file.root()

run = Group(root,"run_0001")

There are two notable things taking place in this example.

  1. the entry point to the HDF5 node hierarchy is the root group, which can be obtained from the root() method of the file class. It returns an instance of Group, just like any other group. Unlike in many HDF5 wrappers and the C API itself, the file object does not have any group semantics.

  2. the constructor of Group takes two parameters

    • the first one is a reference to the parent group of the new group

    • and the second is the path to the new group relative to that parent group.

If the path to the new group comprises more than one element, all the intermediate groups must already exist. If you want these intermediate groups to be created as well, you have to provide a special link creation property list

from pninexus.h5cpp.property import LinkCreationList

lcpl = LinkCreationList()
lcpl.intermediate_group_creation = True
temperature = Group(root,"run_0001/sensors/temperature",lcpl=lcpl)
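The effect of intermediate group creation can be sketched without an HDF5 file. The following toy model uses nested dictionaries as stand-ins for groups; the function create_group and its create_intermediate parameter are hypothetical names chosen for illustration and do not belong to pninexus.

```python
# Toy model only: nested dicts stand in for groups; this is NOT the pninexus API.
def create_group(parent, path, create_intermediate=False):
    """Create an empty group at `path` below `parent` and return it."""
    *intermediate, name = path.split("/")
    current = parent
    for element in intermediate:
        if element not in current:
            if not create_intermediate:
                raise KeyError(f"intermediate group {element!r} does not exist")
            current[element] = {}  # create the missing intermediate group
        current = current[element]
    current[name] = {}
    return current[name]


root = {}
create_group(root, "run_0001")

# Without the flag, a missing intermediate group is an error.
try:
    create_group(root, "run_0002/sensors/temperature")
except KeyError as error:
    print("creation failed:", error)

# With the flag, "sensors" is created on the way to "temperature".
create_group(root, "run_0001/sensors/temperature", create_intermediate=True)
print(root)
```

In this sketch the failed call leaves the tree untouched, because the check happens before any intermediate group is created.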

There are three distinct things you can do with an instance of Group

  1. you can use it as a parent group for new nodes (datasets and groups)

  2. you can iterate over its child nodes

  3. you can iterate over the links attached to it.

Accessing child nodes

Access to the child nodes of a particular group is provided via its Group.nodes property which returns an instance of NodeView. To access a particular child one could use

group = ...

if group.nodes.exists("data"):
   data = group.nodes["data"]

The NodeView instance only provides access to the immediate children of a group. NodeView.__getitem__() returns either an instance of Group or Dataset.

In order to iterate over the child nodes of a group one has two choices. For iteration over the immediate child nodes only, the NodeView instance provides a Python iterator interface

group = ...

for node in group.nodes:
   print(node.link.path)

The path property of the Link class returns the HDF5 path of the link (and hence, in this case, of the node it dereferences). Alternatively, if you want to iterate recursively over all the nodes below a group, including those of its subgroups, the NodeView class provides a recursive node iterator via its recursive property

group = ...

for node in group.nodes.recursive:
   print(node.link.path)

Attention

Though this iterator approach is quite simple, it has a significant downside: it assumes that every link encountered during the iteration can be resolved. If the target of a link does not exist, so that the link cannot be resolved to an object, a RuntimeError exception is raised. For a file whose structure is largely unknown, it is therefore better to follow the approach in the next subsection and iterate over links rather than over nodes.
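Why link iteration is the more robust choice can be shown with a toy model. In this sketch a group is a plain dictionary mapping link names to targets, and a target of None stands for a link that cannot be resolved. The functions iter_nodes and iter_links are hypothetical and only mimic the behavior described above; they are not the pninexus iterators.

```python
# Toy model only: a dict maps link names to targets, None marks a dangling link.
def iter_nodes(group, prefix=""):
    """Yield the path of every node; fails on the first unresolvable link."""
    for name, target in group.items():
        path = f"{prefix}/{name}"
        if target is None:
            raise RuntimeError(f"cannot resolve link {path}")
        yield path
        if isinstance(target, dict):
            yield from iter_nodes(target, path)


def iter_links(group, prefix=""):
    """Yield (path, resolvable) for every link without dereferencing dangling ones."""
    for name, target in group.items():
        path = f"{prefix}/{name}"
        yield path, target is not None
        if isinstance(target, dict):
            yield from iter_links(target, path)


tree = {"run_0001": {"data": "dataset", "broken": None}}

try:
    for path in iter_nodes(tree):
        print("node:", path)
except RuntimeError as error:
    print("node iteration aborted:", error)

for path, resolvable in iter_links(tree):
    print("link:", path, "resolvable" if resolvable else "dangling")
```

The node iteration stops at the dangling link, while the link iteration visits every entry and lets the caller decide which links are safe to dereference.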

Utility functions

There are several utility functions available for working with groups and links. Two of them shall be presented here.

Retrieving nodes by path

The NodeView interface only allows access by name to the immediate child nodes of a group. The idea is that a group should have semantics similar to those of a Python dictionary. If you want to access nodes via a path instead, consider using the get_node() function

from pninexus.h5cpp.node import get_node
from pninexus.h5cpp import Path

temperature = get_node(root,Path("run_0001/sensors/temperature"))
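Conceptually, a path lookup is a sequence of name lookups, one per path element. The following sketch illustrates this on nested dictionaries; toy_get_node is a hypothetical helper written for this illustration and says nothing about how get_node() is actually implemented.

```python
# Toy model only: nested dicts stand in for groups; this is NOT the pninexus API.
def toy_get_node(base, path):
    """Resolve a slash-separated `path` relative to `base`, element by element."""
    node = base
    for element in path.strip("/").split("/"):
        node = node[element]  # KeyError if a group has no link of that name
    return node


root = {"run_0001": {"sensors": {"temperature": "dataset"}}}
sensors = root["run_0001"]["sensors"]

# The same node is reachable from different base groups via different paths.
print(toy_get_node(root, "run_0001/sensors/temperature"))
print(toy_get_node(sensors, "temperature"))
```

As in the example above, the path is interpreted relative to the group passed as the first argument.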