Output Files

The TDD outputs Hierarchical Data Format (HDF) files, which have a .h5 extension. Each output file can contain multiple datasets, which store data in fixed-size arrays. For example, the jets dataset stores, for each dumped jet, a 1-d array of variables describing the jet (e.g. pt, eta, etc.). The tracks dataset, meanwhile, contains a 2-d array for each jet: the first index selects a track in the jet (up to 40 tracks are stored by default), and the second index selects a track variable (e.g. numberOfPixelHits).

h5 datasets must have a fixed shape, so jets with fewer tracks than the maximum (40 by default) are padded with null tracks, whose variables are set to the default values. The datasets and variables present in your output files depend on the configuration of the jobs, as discussed elsewhere.
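As a rough illustration of the layout described above, the following sketch writes a tiny stand-in file with `h5py` and reads it back. The file name, the number of jets, and the choice of variables are assumptions made for the example; a real output file will contain whatever datasets and variables the job was configured to dump.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed, illustrative layout: real files contain many more variables.
N_JETS, MAX_TRACKS = 5, 40
jet_dtype = np.dtype([("pt", "f4"), ("eta", "f4")])
track_dtype = np.dtype([("numberOfPixelHits", "u1"), ("valid", "?")])

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example_output.h5")

# Write a small stand-in file so the example runs without a real TDD output.
with h5py.File(path, "w") as f:
    f.create_dataset("jets", data=np.zeros(N_JETS, dtype=jet_dtype))
    f.create_dataset("tracks", data=np.zeros((N_JETS, MAX_TRACKS), dtype=track_dtype))

# Read both datasets fully into memory as NumPy structured arrays.
with h5py.File(path, "r") as f:
    jets = f["jets"][:]
    tracks = f["tracks"][:]

jets_shape = jets.shape      # one record of jet variables per jet
tracks_shape = tracks.shape  # one row of MAX_TRACKS track records per jet

# Selecting a variable by name, then a jet by index, gives that jet's tracks.
first_jet_pixel_hits = tracks["numberOfPixelHits"][0]
```

In this sketch each variable is a named field of a structured array, so `tracks["numberOfPixelHits"]` has one row per jet and one column per track slot.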

Useful Tools

There are a few commonly used tools for working with the output .h5 files.

Tool	Description
Umami	The main Python framework used for further processing of output `h5` files and for algorithm training
Puma	The main FTAG plotting framework, also used by Umami
`h5ls`	Lists the contents of an `h5` file (as mentioned in the installation section)
`h5diff`	Highlights differences between two `h5` files, which is useful for validating a new output against a reference
`h5py`	A Python package for working with `h5` files, used by downstream packages like Umami
`tdd-scripts`	A few Python functions for reading the TDD `h5` outputs
`h5-batched-read`	Python functions for reading in batches from `h5` files at full precision
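
For a quick look at a file from within Python, something similar to `h5ls` can be sketched with `h5py` alone. The helper `list_datasets` and the stand-in file below are hypothetical and exist only for this example; they are not part of any of the tools listed above.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset name: (shape, field names)} for every dataset in the file."""
    summary = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry a shape and dtype.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, obj.dtype.names)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Stand-in file with an assumed layout, so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example_output.h5")
jet_dtype = np.dtype([("pt", "f4"), ("eta", "f4")])
with h5py.File(path, "w") as f:
    f.create_dataset("jets", data=np.zeros(3, dtype=jet_dtype))

summary = list_datasets(path)
for name, (shape, fields) in summary.items():
    print(name, shape, fields)
```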

Default Values

Jet-level defaults are specified as arguments to the add_btag_fillers() calls in BTagJetWriterUtils.cxx.

Variable type	Default Value
Char	-1
Int	-1
Float	NaN
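
When analysing jets, it can help to flag entries that carry a default value. The sketch below uses a small in-memory structured array in place of a real jets dataset; the variable names `some_float_var` and `some_int_var` are placeholders, not actual output variables. Since NaN never compares equal to itself, `np.isnan` is used rather than `==`.

```python
import numpy as np

# Placeholder variables standing in for real jet-level outputs.
jet_dtype = np.dtype([("pt", "f4"), ("some_float_var", "f4"), ("some_int_var", "i4")])
jets = np.array(
    [(50e3, 0.7, 2), (32e3, np.nan, -1), (81e3, 0.1, 0)],
    dtype=jet_dtype,
)

# Float defaults are NaN, so they must be detected with np.isnan.
float_is_default = np.isnan(jets["some_float_var"])

# Int defaults are -1; this assumes -1 is not also a physical value.
int_is_default = jets["some_int_var"] == -1

# NaN-aware reductions skip the defaulted entries.
mean_float = np.nanmean(jets["some_float_var"])
```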

Track-level defaults are specified as arguments to the add_track_fillers() calls in BTagTrackWriter.cxx. To select non-padded tracks, use the valid flag, which is True for tracks that are present and False for padding.

Variable type	Default Value
Unsigned Char	0
Int	-1
Float	NaN
Bool	False
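
A minimal sketch of selecting real tracks with the valid flag is shown below. It builds a small in-memory array rather than reading a file, and the `d0` variable and all values are made up for illustration. Note that a real track can legitimately have 0 pixel hits, which is why the valid flag, rather than a comparison against the default value, is the safer way to find padding.

```python
import numpy as np

MAX_TRACKS = 40  # default number of track slots per jet
track_dtype = np.dtype([("d0", "f4"), ("numberOfPixelHits", "u1"), ("valid", "?")])

# Start from fully padded jets: float NaN, unsigned char 0, bool False.
tracks = np.zeros((2, MAX_TRACKS), dtype=track_dtype)
tracks["d0"] = np.nan

# Fill in three real tracks for jet 0 and one for jet 1 (made-up values).
tracks[0, :3] = np.array(
    [(0.01, 4, True), (-0.02, 3, True), (0.10, 0, True)], dtype=track_dtype
)
tracks[1, :1] = np.array([(0.05, 2, True)], dtype=track_dtype)

valid = tracks["valid"]            # boolean mask, shape (n_jets, MAX_TRACKS)
n_tracks = valid.sum(axis=1)       # number of real tracks in each jet
real_tracks = tracks[valid]        # all real tracks, flattened across jets
jet0_tracks = tracks[0][valid[0]]  # real tracks of the first jet only
```

Indexing with the 2-d mask flattens the result across jets; to keep the per-jet structure, apply the mask one jet at a time as in the last line.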