NeXus path’s: pni::nexus::Path

Until now we discussed only the basic properties of a NeXus path. As these paths are rather powerful tools this chapter is entirely dedicated to such paths.

The structure of a Nexus path

In comparison to a plain HDF5 path the NeXus path as provided by libpniio has to additional features

  • it includes the name of the file and thus could be used by command line programs to entirely determine the location of an object

  • it is able to address attributes (hdf5::Path can only address node objects within a file).

To describe the anatomy of a NeXus path in libpniio we consider the following example

/home/user/data/experiment.nxs://run_001:NXentry/:NXinstrument/:NXdetector/data@units

where we can identify all three sections comprising a libpniio NeXus path

  • file section - /home/user/data/experiment.nxs:/

  • node section - /run_001:NXentry/:NXinstrument/:NXdetector/data

  • attribute section - units

In more detail these three sections describe

Section

Description

file section

which references the NeXus-file on the file system. It must thus be a valid file system path on the operating system platform in use.

node section

describing the location of an node within the file

attribute section

referencing an attribute attached to the object pointed to by the residual path. The attribute is identified by its name.

The file- and the node-section are separated by :/ and the node- and the attribute-section are separated by a @ symbol. Each element in the node-section of the path consists of two path

  • the nodes link name

  • and an optional base class type

separated by a semicolon. The base class type part only makes sense if a group should be referenced as individual fields (or datasets in HDF5 terminology) are not associated with a base class type. There are three permitted forms how a node element could look like

node element

description

name:class

this would be the full description of a base class including the name of the link to the group as well as the base class type it belongs to.

name

only the name of the link. Something like this could be used for both: groups and fields.

:class

references a group by the base class type it belongs to.

A path is considered as absolute if its node section starts at the root group of the file. This is always the case if

  • the file section of the path is not empty (if we give a file the node section has to start at the root node and thus must be absolute)

  • or, if no file section is given, the object section starts with a leading /.

The latter condition is equivalent to the convention used for Unix file system paths while the former requires some explanation.

Equality and matching of paths

Equality

The equality of two NeXus paths is rather trivial: two paths can be considered equal if all of their elements are equal.

Matching

Two paths are considered as matching if they are not equal but capable of referencing the same object within a single file. To illustrate this situation consider the following three paths

  1. a = /entry/instrument/detector/data

  2. b = /entry:NXentry/instrument:NXinstrument/detector:NXdetector/data

  3. c = /:NXentry/:NXinstrument/:NXdetector/data

each being perfectly well defined NeXus paths referencing the data field in a detector group. It is obvious for path a and b that they reference the same object. The same is true for the paths b and c. Surprisingly, a and c do not match. As a does not provide any type information for each of its nodes we cannot be sure that it references the same object as c. Being more specific: none of the groups in a has to be a NeXus group at all. We get the rough idea that the property of two path of matching each other has something to do with the number of elements they have common.

In order to derive a reasonable set of rules determining whether or not two paths are matching we start with deriving rules to deciding under which conditions the node-elements of the node section of a path are matching.

The first rule covers the trivial case of equality

Note

Two node elements a and b are considered as matching if they are equal in the above case: a=b.

For instance, let a=entry:NXentry and b=entry:NXentry it is obvious that they are referencing the same node as they are equal in the above sense.

Furthermore, we can propose a second rule

Note

Two node elements a and b can be considered matching if their class component is equal and only one of them has the name attribute set.

This would be the case if a=:NXentry and b=entry:NXentry. This is somehow logical if we consider that a is just a more general version of b. However, it is crucial that only one of them has a non empty name attribute. Otherwise this rule would violate rule one.

Now as we have derived two rules for matching node elements we can generalize a single rule for paths

Note

Two paths a and b are considered matching if

  • they are of equal size

  • all of their node elements match

  • and, if available, they reference the same attribute.

The pni::io::nexus::Path type

In C++ a NeXus-path is represented by an instance of pni::io::nexus::Path. pni::io::nexus::Path is an iterable over the elements of the object section of a NeXus-path. The optional file- and attribute-section can be accessed via getter and setter methods like this

nexus::Path path = ...;
path.filename("/data/run/detector.nxs"); //set file section
std::cout<<path.filename()<<std::endl;   //retrieve file section

and analogously for the attribute section

nexus::Path path = ...;
path.attribute("units");              //set attribute section
std::cout<<path.units()<<std::endl;   //retrieve attribute section

The elements of the object section are stored as instances of nexus::Path::Element which is in fact a type alias for a std::pair where the first element of the pair stores the name of the element and the second its class (if available). Technically, nexus::Path is a thin wrapper around a list of such nexus::Path::Element (although not all the list functionality is exported). Consult the API documentation for a detailed description of nexus::Path’s interface.

Path construction

Though the nexus::Path type has a constructor one would typically construct a path from a string using the nexus::Path::from_string() static member method

nexus::Path path = nexus::Path::from_string("/:NXentry/:NXinstrument/pilatus");

nexus::Path::from_string() has also a static counterpart method nexus::Path::to_string() which converts a path instance to its string representation.

nexus::Path path = ....;
std::cout<<nexus::Path::to_string(path)<<std::endl;

Path iteration

nexus::Path provides an STL compliant iterator interface which allows easy iteration over all elements in the object section of the path. Consider the following example

nexus::Path p = nexus::Path::from_string("/:NXentry/:NXinstrument/pilatus/data");

for(auto e:p)
   std::cout<<"name: "<<e.first<<"\t type:"<<e.second<<std::endl;

which would yield the output

name: /       type: NXroot
name:         type: NXentry
name:         type: NXinstrument
name: pilatus type:
name: data    type:

As we can see from the above example: the first member of the nexus::Path::Element stored in the object section list is the name of an object while the second is its type. In the case of a field only the first (name) element will be set (a field does not have a particular type). The number of elements in the object section of nexus::Path can be obtained via the nexus::Path::size() member function (which is the same as for any other STL container).

Push and pop on object

Elements of the object section of the path can be added using the push_back() and push_front() member functions.

nexus::Path p = nexus::Path::from_string(":NXinstrument");
std::cout<<p<<std::endl; // output: :NXinstrument

p.push_back(object_element("","NXdetector"));
std::cout<<p<<std::endl; // output: :NXinstrument/:NXdetector

p.push_front(object_element("","NXentry"));
std::cout<<p<<std::endl; // output: :NXentry/:NXinstrument/:NXdetector

Like other STL containers nexus::Path also provides the front(), back(), pop_front(), and pop_back() member functions which have the standard STL behavior.

nexus::Path p = nexus::Path::from_string(":NXentry/:NXinstrument/:NXdetector");

//get front and back elements from the object section
nexus::Path::Element entry = p.front();
nexus::Path::Element detector = p.back();

std::cout<<p<<std::endl; // output: :NXentry/:NXinstrument/:NXdetector

//remove front and back objects from the object section
p.pop_front();
p.pop_back();

std::cout<<p<<std::endl; // output: :NXinstrument

pni::io::nexus::Path and hdf5::Path

In many cases we may want to construct an HDF5 path from a NeXus path an vica verse. Now, converting from an HDF5 path to a NeXus path is always easy as an HDF5 path is also a valid NeXus path (despite the fact that an HDF5 path cannot address attributes and contains no file information). For this purpose pni::io::nexus::Path has an implicit conversion constructor for an HDF5 path.

hdf5::Path hdf5_path = ...;
pni::io::nexus::Path nexus_path = hdf5_path;

The other direction is also possible but only under certain conditions. Unlike a NeXus path an HDF5 path contains only of link names. So conversion from a NeXus path to an HDF5 path is only possible under the following restrictions

  • the NeXus path has all the link names set in its node section

  • the NeXus path does not reference an attribute (an HDF5 path cannot do that)

  • the NeXus path has an empty file section - we cannot reference a file with an HDF5 path.

pni::io::nexus::Path has an implicit conversion operator to an hdf5::Path. Thus we could use for instance a NeXus path in situations where an HDF5 path is expected

nexus::Path path=nexus::Path::from_string("/entry:NXentry/instrument/detector:NXdetector");

hdf5::node::Group detector = hdf5::node::get_node(root_group,path);

Utility functions

Element utilities

There are a couple of utility functions available to work with the elements stored in the object section of the path. One important function is the object_element() function which creates a single element for the object section of a path. This is particularly useful in connection with the push_back() and push_front() member functions of nexus::Path. If for instance one wants to append a detector group to the object section we could use

nexus::Path p = ...;
p.push_back(object_element("detector","NXdector"));

object_element() takes two arguments: the first is the name of the object while the second its type (only relevant for groups). If both are empty strings and exception will be thrown.

Furthermore there are some functions for querying the basic properties of an element instance. Each of these functions returns a boolean value and takes an instance of nexus::Path::Element as its only argument.

utiltiy function

description

is_root_element()

returns true if the element references the root group with name / and type NXroot

is_complete()

return true if the element has a non-empty name and type

has_name()

return true if the element has a non-empty name

has_class()

return true if the element has a non-empty type

pni::io::nexus::Path utilities

Three inquiry functions exist for nexus::Path. Each of them returns a boolean and takes as their single argument a reference to an instance of nexus::Path

utility function

description

is_absolute()

returns true if the path is an absolute path

has_file_section()

returns true if the path has a non-empty file section

has_attribute_section()

returns true if the path has a non-empty attribute section

is_empty()

returns true if a path has neither a file section, an attribute section, and an object section. This situation would be equivalent to a default constructed path object.

The split_path() function divides an nexus::Path into two partial paths at a user defined position.

std::string s = "test.nxs://:NXentry/:NXinstrument/detector@NX_class";
nexus::Path p = nexus::Path::from_string(s);
nexus::Path instrument_path,detector_path;
split_path(p,3,instrument_path,detector_path);

// output: test.nxs://:NXentry/:NXinstrument
std::cout<<instrument_path<<std::endl;
// output: detector@NX_class
std::cout<<detector_path<<std::endl;

The second argument to split_path() is the position where to perform the split. It is the index of the first element for the second path. To chop of the file section from a path one could use the following code

std::string s = "test.nxs://:NXentry/:NXinstrument/detector@NX_class";
nexus::Path p = nexus::Path::from_string(s);
nexus::Path instrument_path,detector_path;
split_path(p,0,instrument_path,detector_path);

// output: test.nxs
std::cout<<instrument_path<<std::endl;
// output: /:NXentry/:NXinstrument/detector@NX_class
std::cout<<detector_path<<std::endl;

Two paths can be joined using the join() function.

nexus::Path a = nexus::Path::from_string("file.nxs://:NXentry/:NXinstrument");
nexus::Path b = nexus::Path::from_string("pilatus300k:NXdetector/data");
nexus::Path c = join(a,b);
std::cout<<c<<std::endl;

//would output
//file.nxs://:NXentry/:NXinstrument/pilatus300k:NXdetector/data"

There are several restrictions to the two path arguments a and b passed to the join() function

  • a must not have an attribute section

  • b must not have a file section

  • b must not be an absolute path.

If any of these restrictions are violated join() throws value_error. There are additional special conditions which should be taken into account and where the above rules do not apply

input state

result

a empty, b not

return b unchanged

b empty, a not

return a unchanged

a and b empty

return an empty path object

The grammar of a NeXus path

Lets first have a look on the grammar of a Nexus path in EBNFfootnote{EBNF=Extended Backus Naur Form}

file_path        ::=  {all characters allowed by the plattform to describe a path}
valid_char       ::=  "_" | "a-z" | "A-Z" | "0-9";
whitespace       ::=  " " | "\n" | "\r";
class_seperator  ::=  ":";
object_seperator ::=  "/";
nexus_id         ::=  valid_char,{valid_char};
nexus_name       ::=  nexus_id,(class_seperator|group_separtor|whitespace);
nexus_group      ::=  group_seperator,nexus_id,[group_seperator|whitespace];
object_id        ::=  nexus_name | nexus_name,nexus_group | nexus_group
object_path      ::=  ["/"],object_id,{"/",object_id};
nexus_path       ::=  [file_path,"://"],object_path,["@",nexus_attr];

The file_path is platform dependent which makes it difficult to determine which characters would be allowed in a path. Thus we leave this open to and separate the file path from everything else by a :// string terminal. nexus_id describes a repetition of a set of characters allowed in Nexus names (for groups, fields, attributes, and classes). It is much more restrictive as for the filename.