Local Dumps
This section describes how to run a basic dataset dumper workflow.
Running locally
After having followed the installation instructions and set up the training dataset dumper, you can invoke the executable with
dump-single-btag -c <path to configuration file> <paths to xAOD(s)>
The dumper accepts xAOD root files containing either MC or data.
The configuration files are located in configs/.
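If you want to launch the dumper from a Python script, for example to loop over several inputs, the call can be assembled as an argument list. The following is a minimal sketch: the helper build_dump_command and both file paths are hypothetical and only illustrate the command form shown above.

```python
import shlex
import subprocess
from pathlib import Path


def build_dump_command(config, xaods):
    """Return the argument list for a dump-single-btag call.

    Hypothetical helper: it only mirrors the command form
    `dump-single-btag -c <config> <xAOD(s)>` from this page.
    """
    xaods = list(xaods)
    if not xaods:
        raise ValueError("at least one xAOD path is required")
    return ["dump-single-btag", "-c", str(config), *map(str, xaods)]


# Placeholder paths, replace them with your own configuration and inputs.
cmd = build_dump_command(Path("configs") / "my_config.json", ["my_sample.root"])

# Show the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually run the job (requires the dumper environment to be set up):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.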
Alongside the output.h5 output file, the userJobMetadata.json file contains information about the run, including inference timing for taggers.
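To get a first look at the metadata, you can load the file with the standard json module. The sketch below assumes that the top level of userJobMetadata.json is a JSON object. The key names are not documented here, so the hypothetical helper summarise_metadata only lists whatever keys it finds, together with the type of each value.

```python
import json
from pathlib import Path


def summarise_metadata(path="userJobMetadata.json"):
    """Return the top-level keys of the metadata file with their value types.

    Assumes the file holds a JSON object at the top level.
    """
    with Path(path).open() as f:
        metadata = json.load(f)
    return {key: type(value).__name__ for key, value in metadata.items()}


# Only print a summary if a metadata file exists in the current directory.
if Path("userJobMetadata.json").exists():
    for key, kind in summarise_metadata().items():
        print(f"{key}: {kind}")
```

From there you can index into the loaded dictionary to find the entries you need, such as the tagger timing information.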
For more information about the dump-single-btag command, use the -h flag.
Keep in mind that when dumping data, or MC that should be compared to data, you should always include the EventPreparator CA block in the configuration, so that the data samples are pre-processed correctly and consistently with the MC samples. Configurations for this case can be found in configs/EMPFlowDataWithRun3CalibrationSelection.json and configs/EMPFlowMCWithRun3CalibrationSelection.json.
Inspecting Outputs
A list of packages and tools for working with the outputs is given here.
The full job configuration is stored as an attribute in the output h5 files. You can access this in Python by running
import json
import h5py

# read config as a string from the output file
with h5py.File("output.h5", "r") as f:
    cfg_str = f.attrs["config"]
# convert string to json object
cfg_json = json.loads(cfg_str)
# pretty print tracks config block
print(json.dumps(cfg_json["tracks"], indent=2))