Output Files

The TDD outputs Hierarchical Data Format (HDF5) files, which have a .h5 extension. Each output file can contain multiple datasets, which store data in fixed-size arrays. For example, the jets dataset stores, for each dumped jet, a 1-d array of variables describing the jet (e.g. pt, eta, etc.). The tracks dataset, meanwhile, contains a 2-d array for each jet: the first index selects a track in the jet (up to 40 tracks are stored by default), and the second index selects a track variable (e.g. numberOfPixelHits, etc.).

h5 datasets must have a fixed shape, so jets with fewer than 40 tracks are padded with null tracks, whose variables are set to the default values listed under Default Values. The datasets and variables present in your output files depend on the configuration of the jobs, as discussed elsewhere.
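The layout described above can be sketched with h5py. This is a minimal illustration rather than the TDD's actual schema: it assumes the variables are stored as named fields of a compound dtype, and the file name, the field `d0` and all values are made up for the example. It writes a toy file with the same shape conventions and reads it back.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed layout: variables as named fields of a compound dtype.
# Field names other than pt, eta, numberOfPixelHits and valid are hypothetical.
jet_dtype = np.dtype([("pt", "f4"), ("eta", "f4")])
track_dtype = np.dtype([("numberOfPixelHits", "u1"), ("d0", "f4"), ("valid", "?")])

n_jets, max_tracks = 3, 40

jets = np.zeros(n_jets, dtype=jet_dtype)
jets["pt"] = [25e3, 40e3, 110e3]
jets["eta"] = [0.1, -1.2, 2.0]

# Start from all-padding tracks: integer fields 0, floats NaN, valid False
tracks = np.zeros((n_jets, max_tracks), dtype=track_dtype)
tracks["d0"] = np.nan

# Fill a different number of real tracks per jet; the rest stays padding
for i, n_real in enumerate([2, 0, 5]):
    tracks["valid"][i, :n_real] = True
    tracks["numberOfPixelHits"][i, :n_real] = 4
    tracks["d0"][i, :n_real] = 0.01

# Write the toy file, then read it back the way an analysis script might
path = os.path.join(tempfile.mkdtemp(), "toy_output.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("jets", data=jets)
    f.create_dataset("tracks", data=tracks)

with h5py.File(path, "r") as f:
    jets_shape = f["jets"].shape          # one entry per jet
    tracks_shape = f["tracks"].shape      # (jet, track slot)
    track_fields = f["tracks"].dtype.names
    first_jet_tracks = f["tracks"][0]     # all 40 track slots of the first jet

print(jets_shape, tracks_shape, track_fields)
```

Every jet occupies the same number of track slots regardless of how many tracks it really has, which is why the padding and the default values matter when reading the file.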

Useful Tools

There are a few commonly used tools for working with the output .h5 files.

| Tool | Description |
| --- | --- |
| Umami | The main Python framework used for further processing of output h5 files and algorithm training |
| Puma | The main FTAG plotting framework, also used by umami |
| h5ls | Lists the contents of an h5 file (as mentioned in the installation section) |
| h5diff | Highlights differences between two h5 files - useful for validating a new output against some reference |
| h5py | A Python package for working with h5 files, used by downstream packages like umami |
| tdd-scripts | A few Python functions for reading the tdd h5 outputs |
| h5-batched-read | Python functions for reading in batches from h5 files at full precision |
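For large files it is often impractical to load a whole dataset at once. The sketch below shows the general idea of batched reading using plain h5py slicing. It is not the API of h5-batched-read or tdd-scripts: the helper `iter_batches`, the file name and the toy data are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def iter_batches(path, name, batch_size):
    """Yield consecutive slices of dataset `name` along its first axis."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        for start in range(0, dset.shape[0], batch_size):
            # Slicing an h5py dataset reads only the requested range from disk
            yield dset[start:start + batch_size]


# Toy file so that the sketch runs on its own
path = os.path.join(tempfile.mkdtemp(), "toy_batches.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("jets", data=np.arange(10, dtype="f4"))

batch_sizes = [len(batch) for batch in iter_batches(path, "jets", batch_size=4)]
print(batch_sizes)
```

The last batch is shorter whenever the number of jets is not a multiple of the batch size, so downstream code should not assume a constant batch length.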

Default Values

Jet-level defaults are specified as arguments to the add_btag_fillers() calls in BTagJetWriterUtils.cxx.

| Variable type | Default Value |
| --- | --- |
| Char | -1 |
| Int | -1 |
| Float | NaN |
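When reading jets, defaulted entries can be identified by comparing against these values. The following NumPy sketch assumes jets are held in a structured array; the field names `some_float` and `some_int` are hypothetical stand-ins for real jet variables. Note that NaN never compares equal to itself, so float defaults have to be found with `np.isnan` rather than `==`.

```python
import numpy as np

# Hypothetical jet variables, filled by hand for illustration
jets = np.zeros(4, dtype=[("pt", "f4"), ("some_float", "f4"), ("some_int", "i4")])
jets["pt"] = [25e3, 40e3, 60e3, 110e3]
jets["some_float"] = [0.7, np.nan, 0.2, np.nan]  # NaN marks a defaulted float
jets["some_int"] = [3, -1, 0, -1]                # -1 marks a defaulted int

# NaN != NaN, so use np.isnan for float defaults
float_is_default = np.isnan(jets["some_float"])
int_is_default = jets["some_int"] == -1

# Keep only jets where both variables were actually filled
filled = jets[~float_is_default & ~int_is_default]
print(len(filled), "of", len(jets), "jets have both variables filled")
```

For integer variables where -1 can also be a genuine value, this comparison alone cannot tell a default from real data.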

Track-level defaults are specified as arguments to the add_track_fillers() calls in BTagTrackWriter.cxx. To select non-padded tracks, use the valid flag, which is True when the track is present and False for padding.

| Variable type | Default Value |
| --- | --- |
| Unsigned Char | 0 |
| Int | -1 |
| Float | NaN |
| Bool | False |
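Because unsigned char variables default to 0, a padded track cannot be told apart from a real track with a value of 0 using that variable alone, which is one reason to rely on the valid flag. A minimal NumPy sketch of that selection follows, assuming tracks are held in a structured array of shape (jets, 40); the field `d0` and all values are illustrative.

```python
import numpy as np

# Assumed compound layout; "d0" is a hypothetical float track variable
track_dtype = np.dtype([("numberOfPixelHits", "u1"), ("d0", "f4"), ("valid", "?")])

# Two jets, 40 track slots each, initialised to the padding defaults
tracks = np.zeros((2, 40), dtype=track_dtype)
tracks["d0"] = np.nan

# First jet has three real tracks; second jet has none
tracks["valid"][0, :3] = True
tracks["numberOfPixelHits"][0, :3] = [4, 3, 5]
tracks["d0"][0, :3] = [0.02, -0.01, 0.05]

valid = tracks["valid"]          # boolean mask of shape (2, 40)

# Number of real tracks per jet
n_tracks = valid.sum(axis=1)

# Boolean indexing drops the padding but flattens the jet structure
real_tracks = tracks[valid]

# To keep the per-jet structure, mask inside the reduction instead
pixel_hits_per_jet = np.where(valid, tracks["numberOfPixelHits"], 0).sum(axis=1)

print(n_tracks, real_tracks.shape, pixel_hits_per_jet)
```

Indexing with the mask is convenient for track-level histograms, while masked reductions are the better fit when a per-jet quantity is needed.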