Configuration

Jobs are steered via JSON files that live under configs/. Some fields are optional, but default values are always trivial: all sequences are empty by default and all numbers are zero. In most cases, providing an unknown field should throw an exception.

Fragments

Configuration files support one special key: file. If this key appears in a JSON object, the parser treats its value as a path relative to the current file. All entries in that file are imported at the same scope where the file key appears. If local keys conflict with imported ones, they are merged, with local keys taking precedence over those imported from the file.
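The import rule above can be sketched in Python. This is only an illustration, not the dumper's actual parser: the function names (merge, expand_fragments, load_config) are hypothetical, and it assumes that conflicting keys holding nested objects are merged recursively.

```python
import json
from pathlib import Path


def merge(local, imported):
    """Merge two dicts recursively; values from `local` win on conflict.

    Recursive merging of nested objects is an assumption of this sketch.
    """
    out = dict(imported)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(value, out[key])
        else:
            out[key] = value
    return out


def expand_fragments(node, directory):
    """Replace every "file" key with the contents of the file it names."""
    if isinstance(node, list):
        return [expand_fragments(item, directory) for item in node]
    if not isinstance(node, dict):
        return node
    local = {
        key: expand_fragments(value, directory)
        for key, value in node.items()
        if key != "file"
    }
    if "file" not in node:
        return local
    path = Path(directory) / node["file"]
    with open(path) as handle:
        # Paths inside the fragment are relative to the fragment itself.
        imported = expand_fragments(json.load(handle), path.parent)
    return merge(local, imported)


def load_config(path):
    """Load a JSON config and expand all of its fragments."""
    path = Path(path)
    with open(path) as handle:
        return expand_fragments(json.load(handle), path.parent)
```

Importing first and then overlaying the local keys means a fragment can hold shared defaults while each job file overrides only what it needs.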

Truth and overlap removal

There are many conflicting definitions for what might be considered a $b$-jet or a lepton, which can make defining an optimal tagger or removing leptons from the training a bit subtle. To embrace this chaos, you can specify any number of truth particle collections to handle in the truths list. Truth particles can be used to veto jets, to decorate jets with extra jet-wise information, or they can be written out to the h5 file.

All truth particles will be associated to jets as xAOD::IParticles. This is for compatibility with xAODs: a number of other IParticle collections are associated upstream, e.g. when jets are built. The truth handling is shown in the diagram below:

graph TD;
subgraph sources ["Inputs (Choose one or none)"]
  sel(association) & merge(merge)
end
sel & merge  --> assoc(association_name)
subgraph outputs [" "]
or(overlap_dr)
writer(output)
dec(decorate_summary)
end
assoc --> or & dec & writer

Each block above is also the name of a field in the truth block. One or none of the first two options can be used:

association: Runs $\Delta R$ particle association if specified. It can be further configured:
- container: The name of the TruthParticleContainer to read particles from. Defaults to TruthParticles if omitted. Note: it is possible to instead use the containers key to specify a list of input truth containers.
- particles: The type of particles to select, e.g. fromBC or promptLepton.
- pt_minimum, abs_eta_maximum: minimum transverse momentum and maximum absolute pseudorapidity of the selected particles.
- dr_maximum: maximum $\Delta R$ between jet and an associated particle.
association block example
```
"association": {
  "particles": "overlapLepton",
  "pt_minimum": 2e3,
  "abs_eta_maximum": 2.5,
  "dr_maximum": 0.4
}
```
merge: A list of previously associated particles to merge into this one.
merge block example
```
"merge": [
  "ConeExclCHadronsFinal",
  "ConeExclBHadronsFinal"
]
```
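The "one or none" rule for the two input options can be written as a small check. This is a sketch only: the function name input_mode and the returned labels are hypothetical and not part of the dumper.

```python
def input_mode(truth):
    """Say how the particles for one entry of the truths list are obtained.

    `truth` is one parsed entry (a dict). The returned labels are
    hypothetical names used only in this sketch.
    """
    has_association = "association" in truth
    has_merge = "merge" in truth
    if has_association and has_merge:
        raise ValueError("use at most one of 'association' and 'merge'")
    if has_association:
        return "association"
    if has_merge:
        return "merge"
    # Neither key given: rely on particles already linked in the input.
    return "from_input"
```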

If neither of the above options is given, we assume the particles are already associated to the jet in your input xAOD. What happens with them can be further configured:

association_name: Specifies the name of the std::vector<ElementLink<xAOD::IParticleContainer>> auxiliary data that is saved on the jet. If not specified, we'll use the value of particles from the association block. Example: "association_name": "ConeExclHadrons".
overlap_dr: The minimum $\Delta R(\text{jet},x)$, where $x$ is any particle in this collection. Jets are vetoed if closer particles are found. Example: "overlap_dr": 0.3. This key is used with the overlapLepton particle type to veto jets containing leptons from W or Z decays.
output: Writes an array of particles to the h5 output file. Has several fields:
- n_to_save: size of the output array
- sort_order: ordering of the output particles
- name: name of the output array. If not specified, we'll use association_name.
output block example
```
"output": {
  "n_to_save": 5,
  "sort_order": "pt",
  "name": "hadrons"
}
```
decorate_summary: Enable summary decorations on the jet. This will add n_truth_<name> and min_dr_truth_<name> decorations to the jet, which can be written out as jet level variables.
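To make the overlap_dr veto and the summary decorations concrete, here is a minimal Python sketch. It is not the dumper's C++ implementation. Jets and particles are plain dicts with eta and phi keys, a layout chosen for the example, and returning infinity for min_dr when no particle is associated is an assumption.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with the phi difference wrapped."""
    dphi = math.remainder(phi1 - phi2, 2 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def distances(jet, particles):
    """Delta R between one jet and each associated particle."""
    return [delta_r(jet["eta"], jet["phi"], p["eta"], p["phi"]) for p in particles]


def passes_overlap(jet, particles, overlap_dr):
    """A jet is vetoed as soon as one particle lies closer than overlap_dr."""
    return all(dr >= overlap_dr for dr in distances(jet, particles))


def summarize(jet, particles, name):
    """Build the two jet-level summary values for one truth collection."""
    drs = distances(jet, particles)
    return {
        f"n_truth_{name}": len(drs),
        # Assumption: with no associated particle the minimum is infinite.
        f"min_dr_truth_{name}": min(drs, default=math.inf),
    }
```

With "overlap_dr": 0.3, a jet at (eta, phi) = (0, 0) with a lepton at (0.1, 0) fails passes_overlap, since the lepton is closer than 0.3.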

DL2 Config

Multifold taggers can be configured with the MultifoldTagger block.

The DL2 config is the configuration of taggers you want to apply when dumping the ntuples. Some taggers are already applied on the derivation level, but newer versions or freshly developed models can be added via this configuration. The dl2_configs key is a list of dicts, one for each tagger you want to add. For example:

"dl2_configs": [
    {
        "nn_file_path": "<path>/<to>/<your>/<local>/<network.json>"
    },
    {
        "nn_file_path": "dev/BTagging/20210506r22/umami/antikt4empflow/network.json",
        "remapping": {
            "UMAMI20210506r22_pu": "Umami_r22_pu",
            "UMAMI20210506r22_pc": "Umami_r22_pc",
            "UMAMI20210506r22_pb": "Umami_r22_pb",
            "dips_UMAMI20210506r22_pu": "dips_from_Umami_r22_pu",
            "dips_UMAMI20210506r22_pc": "dips_from_Umami_r22_pc",
            "dips_UMAMI20210506r22_pb": "dips_from_Umami_r22_pb"
        }
    }
]

Each dict can have two entries: nn_file_path and remapping. The nn_file_path is the path to your model. When you are running locally, you can just give the path to the network.json. Keep in mind this needs to be a converted LWTNN model of the network! In the first dict no remapping is provided. In this case, the output variables of the network are named after the LWTNN entries set when the model was created.

For the second dict, a specific path is given here. This is a path to a model which is available in the ATLAS FTAG (dev) group area. A list of the available taggers can be found here. For that, a PathResolver is used to find the correct path to the model files. The PathResolver looks for the model files in the following order:

- The local directory where you start the job
- /cvmfs/
- On https://atlas-groupdata.web.cern.ch/atlas-groupdata/ (this is how it works from the docker images)
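The search order above can be mimicked with a short Python sketch. This is not the real PathResolver, only an illustration of "first match wins". The function name resolve is hypothetical, and using the GroupData directory as the /cvmfs/ location is an assumption of the example.

```python
from pathlib import Path

# Assumed locations, for illustration only.
GROUP_AREA = Path("/cvmfs/atlas.cern.ch/repo/sw/database/GroupData")
WEB_AREA = "https://atlas-groupdata.web.cern.ch/atlas-groupdata/"


def resolve(nn_file_path, start_dir=".", group_area=GROUP_AREA):
    """Return the first location in the search order that has the file."""
    # 1. the directory the job was started from, 2. the group area on disk
    for root in (Path(start_dir), Path(group_area)):
        candidate = root / nn_file_path
        if candidate.is_file():
            return str(candidate)
    # 3. nothing found on disk: point at the web copy (no download is done here)
    return WEB_AREA + nn_file_path
```

Because the local directory comes first, a file placed at the same relative path in your working directory shadows the copy in the group area.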

When you want to run on the grid, this is one way to use a model that is not in the derivations/xAODs. You just need to provide the path inside the group area, not the full absolute path. The path to the group area (to check which models are available) is /cvmfs/atlas.cern.ch/repo/sw/database/GroupData/. If you want to add a tagger to the group area, you can request that your file is added there. An explanation of how to make a request can be found in the ASGCalibArea twiki. If you want to add a new tagger from FTAG Algorithms, please email the FTAG Algorithms conveners at atlas-ftag-algorithms-conveners@cern.ch and ask them to do it for you.

In the second dict, remapping is a dict with entries. Use it to rename the outputs of a network: each key is the current name of a variable, and each value is the new name of that variable in the dumped .h5 files.
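The effect of remapping on the output names can be sketched as follows. This is a simplified illustration: apply_remapping is a hypothetical helper, and keeping the original name for outputs without an entry is an assumption.

```python
def apply_remapping(outputs, remapping=None):
    """Rename network outputs according to a remapping dict.

    Outputs without an entry keep their original name (an assumption
    of this sketch).
    """
    remapping = remapping or {}
    return {remapping.get(name, name): value for name, value in outputs.items()}


scores = {"UMAMI20210506r22_pb": 0.7, "UMAMI20210506r22_pc": 0.2}
renamed = apply_remapping(scores, {"UMAMI20210506r22_pb": "Umami_r22_pb"})
```

Here renamed holds the b score under Umami_r22_pb, while the c score keeps its original name.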

Using ONNX

Some networks aren't supported with the default engine, lwtnn. In these cases, you should change the engine to ONNX runtime, by specifying

"engine": "gnn"

at the same level as the nn_file_path and remapping. You should be able to tell which networks these are based on the .onnx extension in the NN path. To explicitly ask for lwtnn, you can set the engine to dl2.
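If you generate configs from a script, the extension rule above can be applied automatically. The helpers below are hypothetical and only encode what this section states: .onnx files use the gnn engine, and everything else is assumed to use dl2 (lwtnn).

```python
def guess_engine(nn_file_path):
    """Pick the engine from the file extension of the network path."""
    return "gnn" if nn_file_path.endswith(".onnx") else "dl2"


def tagger_entry(nn_file_path):
    """Build one dl2_configs entry with an explicit engine."""
    return {"nn_file_path": nn_file_path, "engine": guess_engine(nn_file_path)}
```

For example, tagger_entry("my/network.onnx"), with a made-up path, gives an entry with "engine": "gnn".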

Decorating jets directly

You don't always have to add b-tagging information to the BTagging object, especially when you're running a simpler tagger like dips or the track-based GNN. Instead, you can add the output scores directly to the jet, by specifying the where field, e.g. "where": "jet". Valid options are jet and btag.

Running flipped taggers

By default, taggers run in the STANDARD flip tag config (i.e. they run without any flipping). You can change this by setting the flip_tag_config key to NEGATIVE_IP_ONLY, FLIP_SIGN, or SIMPLE_FLIP, for example by specifying

"flip_tag_config": "SIMPLE_FLIP"

at the same level as the nn_file_path and remapping.

SIMPLE_FLIP is the default flipping strategy for GN2 and GN3.
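The per-tagger options described so far (engine, where, flip_tag_config) can be checked before submitting a job. This is a hedged sketch: check_tagger_entry is a hypothetical helper, and the allowed values are only those named in this document, so the real parser may accept more.

```python
# Values taken from the text above; the real parser may accept others.
ALLOWED = {
    "engine": {"dl2", "gnn"},
    "where": {"jet", "btag"},
    "flip_tag_config": {"STANDARD", "NEGATIVE_IP_ONLY", "FLIP_SIGN", "SIMPLE_FLIP"},
}


def check_tagger_entry(entry):
    """Raise if one dl2_configs entry uses a value not listed in ALLOWED."""
    if "nn_file_path" not in entry:
        raise KeyError("every tagger entry needs an nn_file_path")
    for key, allowed in ALLOWED.items():
        if key in entry and entry[key] not in allowed:
            raise ValueError(f"{key}={entry[key]!r} is not one of {sorted(allowed)}")
    return entry
```

A check like this catches a misspelled option such as "SIMPLEFLIP" locally instead of at run time.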

Dumping Constituents

The top-level flow object can be used to dump particle flow constituents and the associated track particles and clusters. Most of the fields are identical to the track configuration.

The particle flow configuration also has an associations block to dump information on associated track and cluster objects. If specified, the track and cluster configurations can contain:

- A variables block which is identical to the variables block for tracks and the flow objects.
- An edm_names block which maps the specified name in the configuration (and in the output file) to the variable name in the xAOD. This is needed if you want to write e.g. track_pt in addition to the particle flow constituent pt. Specifying "edm_names": {"track_pt":"pt"} will indicate that the pt accessor in the xAOD should be used, since using the same name twice will cause a clash in the output file.
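The role of edm_names can be illustrated with a small lookup. This is a sketch, not the dumper's code: accessor_names is a hypothetical helper, and the second variable name, d0, is made up for the example.

```python
def accessor_names(variables, edm_names=None):
    """Map each output name to the xAOD variable it should be read from.

    Names without an entry in edm_names are read under their own name.
    """
    edm_names = edm_names or {}
    return {name: edm_names.get(name, name) for name in variables}


# "track_pt" is the name in the output file, "pt" the accessor in the xAOD.
mapping = accessor_names(["track_pt", "d0"], {"track_pt": "pt"})
```

The output file then gets a track_pt column filled from the pt accessor, which does not clash with the pt of the constituent itself.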

The "dumper" key for trigger and multi-config

Trigger dumps use the ComponentAccumulator (CA) configuration. In this case, we can stack other algorithms in the CA that run before, after or alongside the dumper. The dumper configuration therefore needs to be encapsulated inside the "dumper" group, and specific settings for other algorithms can be added outside it, for example:

"external_setting_1": true,
"dumper": {
    "jet_collection": "tempEmtopoJets",
    ...
}

All the available options for trigger jobs are defined in trigger.py under the getJobConfig method.

Similarly, multiple dumps can run simultaneously. The configuration file, then, has to contain the different versions of the settings to be used for each instance of the dumper. For example:

"pflow": {
    "dumper": {
        "file": "EMPFlow.json"
    }
},
"vrtrack": {
    "dumper": {
        "file": "TrackJets.json"
    }
}

N.B. in the multi-config case, the "dumper" group only needs to be added if the configuration file being imported does not already contain that group, i.e. for trigger configurations it should not be required.
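The rule in the note above can be expressed as a small helper for scripts that assemble multi-config files. This is a sketch under the assumption that the imported fragments are already loaded as dicts. The helper names are hypothetical.

```python
def ensure_dumper_group(config):
    """Wrap a config in a "dumper" group unless it already has one."""
    return config if "dumper" in config else {"dumper": config}


def build_multi_config(instances):
    """Build the top-level config from {instance name: loaded config}."""
    return {name: ensure_dumper_group(cfg) for name, cfg in instances.items()}


multi = build_multi_config({
    "pflow": {"jet_collection": "tempEmtopoJets"},
    "trigger": {"external_setting_1": True, "dumper": {"jet_collection": "tempEmtopoJets"}},
})
```

The pflow entry gains a "dumper" wrapper, while the trigger-style entry, which already has one, is left untouched.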

Other configuration notes

Both the top-level configuration and the track dumpers allow a btagging_link to be specified. This is the name of the link from jets to the BTagging object, and has historically been btaggingLink in most ATLAS code. This field can be omitted or left as an empty string:
- In the track dumpers, a missing link means the tracks will be read directly from the jet.
- In the top level, a missing link means there is no associated b-tagging object. This means any attempts to write out information on the BTagging object will fail!

Configuring CA blocks

Full information about available blocks is found here.