
Local Dumps

This section describes how to run a basic dataset dumper workflow.

Running locally

After following the installation instructions and setting up the training dataset dumper, you can invoke the executable with

dump-single-btag -c <path to configuration file> <paths to xAOD(s)>

The dumper accepts xAOD ROOT files containing either MC or data. Configuration files are located in configs/. Alongside the output file output.h5, the job writes userJobMetadata.json, which contains information about the run, including inference timing for taggers. For more information about the dump-single-btag command, use the -h flag.
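To get a first look at userJobMetadata.json without relying on its exact layout, the sketch below loads the file and reports the type of each top-level entry. The helper names load_job_metadata and summarize_metadata are hypothetical, and the code assumes only that the file holds a JSON object at the top level; no specific keys are assumed.

```python
import json
from pathlib import Path


def load_job_metadata(path="userJobMetadata.json"):
    """Read the metadata file written next to output.h5 and return its contents."""
    with open(path) as f:
        return json.load(f)


def summarize_metadata(meta):
    """Map each top-level key to the type name of its value (no schema assumed)."""
    return {key: type(value).__name__ for key, value in meta.items()}


if __name__ == "__main__":
    path = Path("userJobMetadata.json")
    if path.exists():
        # print one line per top-level entry, e.g. to locate the timing information
        for key, kind in summarize_metadata(load_job_metadata(path)).items():
            print(f"{key}: {kind}")
    else:
        print(f"{path} not found; run the dumper first")
```

Run it from the directory where the dump was produced, then print the entries of interest with json.dumps(..., indent=2) once you know which keys are present.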

When dumping data, or MC that will be compared to data, always include the EventPreparator CA block in the configuration so that the data samples are pre-processed correctly and consistently with the MC samples. Example configurations for this case are configs/EMPFlowDataWithRun3CalibrationSelection.json and configs/EMPFlowMCWithRun3CalibrationSelection.json.

Inspecting Outputs

A list of packages and tools for working with the outputs is given here.

The full job configuration is stored as an attribute in the output h5 files. You can access this in Python by running

import json

import h5py

# read config as a string from the output file
with h5py.File("output.h5", "r") as f:
    cfg_str = f.attrs["config"]

# convert string to json object
cfg_json = json.loads(cfg_str)

# pretty print tracks config block
print(json.dumps(cfg_json["tracks"], indent=2))
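Not every configuration contains every block, so it can help to look a block up by name and list the available ones when the lookup fails. The following is a minimal sketch; config_block is a hypothetical helper, not part of the dumper or its tooling, and it assumes only that the stored configuration string is a JSON object.

```python
import json


def config_block(cfg_str, name):
    """Return one top-level block of the job configuration as pretty-printed JSON."""
    cfg = json.loads(cfg_str)
    if name not in cfg:
        # list what is actually there instead of failing with a bare KeyError
        available = ", ".join(sorted(cfg))
        raise KeyError(f"no block named {name!r}; available blocks: {available}")
    return json.dumps(cfg[name], indent=2)
```

Pass it the configuration string read from the "config" attribute of the output file, for example print(config_block(cfg_str, "tracks")).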