Configuration

Jobs are steered via JSON files that live under configs/. Some fields are optional, but default values are always trivial: all sequences are empty by default and all numbers are zero. In most cases, providing an unknown field should throw an exception.

Fragments

Configuration files contain one special key: files. This should always be a list of file paths, e.g. "files" : ["fragments/pflow-base.json"]. If this key appears in a JSON object, the parser assumes each path is relative to the current file. Any entries in the listed files are imported at the same scope where the files key appears. If local keys conflict with imported ones they are merged, with local keys taking precedence over those imported from the file.
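The import rule above can be illustrated with a short Python sketch. This is not the dumper's parser: the function names, the recursive treatment of nested objects, and the choice that later fragments override earlier ones are assumptions made for illustration, and the keys in the demo are made up.

```python
import json
import tempfile
from pathlib import Path


def merge(local, imported):
    """Recursively merge two dicts; on a conflict the local value wins."""
    out = dict(imported)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(value, out[key])
        else:
            out[key] = value
    return out


def resolve(config, base_dir):
    """Expand every "files" key, reading paths relative to base_dir."""
    if isinstance(config, list):
        return [resolve(item, base_dir) for item in config]
    if not isinstance(config, dict):
        return config
    local = {k: resolve(v, base_dir) for k, v in config.items() if k != "files"}
    imported = {}
    for rel_path in config.get("files", []):
        path = Path(base_dir) / rel_path
        with open(path) as f:
            # nested fragments are resolved relative to their own location
            fragment = resolve(json.load(f), path.parent)
        # assumption: a later fragment overrides an earlier one
        imported = merge(fragment, imported)
    return merge(local, imported)


# Demo with made-up keys: one fragment and one top-level configuration.
with tempfile.TemporaryDirectory() as tmp:
    fragment_dir = Path(tmp) / "fragments"
    fragment_dir.mkdir()
    (fragment_dir / "base.json").write_text(
        json.dumps({"section": {"kept": 1, "overridden": 1}})
    )
    top_level = {"files": ["fragments/base.json"], "section": {"overridden": 5}}
    merged = resolve(top_level, tmp)

print(merged)
```

In the demo, the imported value of "kept" survives while the local value of "overridden" replaces the imported one.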

Truth and overlap removal

There are many conflicting definitions for what might be considered a $b$-jet or a lepton, which can make defining an optimal tagger or removing leptons from the training a bit subtle. To embrace this chaos, you can specify any number of truth particle collections to handle in the truths list. Truth particles can be used to veto jets, to decorate jets with extra jet-wise information, or they can be written out to the h5 file.

All truth particles will be associated to jets as xAOD::IParticles. This is for compatibility with xAODs: a number of other IParticle collections are associated upstream, e.g. when jets are built. The truth handling is shown in the diagram below:

```mermaid
graph TD;
subgraph sources ["Inputs (Choose one or none)"]
  sel(association) & merge(merge)
end
sel & merge  --> assoc(association_name)
subgraph outputs [" "]
or(overlap_dr)
writer(output)
dec(decorate_summary)
end
assoc --> or & dec & writer
```

Each block above is also the name of a field in the truth block. One or none of the first two options can be used:

association: Runs $\Delta R$ particle association if specified. It can be further configured:
- container: The name of the TruthParticleContainer to read particles from. Defaults to TruthParticles if omitted. Note: it is possible to instead use the containers key to specify a list of input truth containers.
- particles: The type of particles to select, e.g. fromBC or promptLepton.
- pt_minimum, abs_eta_maximum: minimum $p_T$ and maximum $|\eta|$ a particle must satisfy to be selected.
- dr_maximum: maximum $\Delta R$ between jet and an associated particle.
association block example
```
"association": {
  "particles": "overlapLepton",
  "pt_minimum": 2e3,
  "abs_eta_maximum": 2.5,
  "dr_maximum": 0.4
}
```
merge: A list of previously associated particles to merge into this one.
merge block example
```
"merge": [
  "ConeExclCHadronsFinal",
  "ConeExclBHadronsFinal"
]
```
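As a rough illustration of what the association option selects, the following Python sketch applies the kinematic cuts and the $\Delta R$ requirement to plain dictionaries. It is not the dumper's code: the function names, the dictionary layout, the use of $\eta$ (not rapidity) in $\Delta R$, and the inclusive comparisons are all assumptions.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def associate(jet, particles, pt_minimum, abs_eta_maximum, dr_maximum):
    """Particles passing the kinematic cuts and within dr_maximum of the jet."""
    return [
        p for p in particles
        if p["pt"] >= pt_minimum
        and abs(p["eta"]) <= abs_eta_maximum
        and delta_r(jet["eta"], jet["phi"], p["eta"], p["phi"]) <= dr_maximum
    ]


jet = {"eta": 0.1, "phi": 3.1}
particles = [
    {"name": "close", "pt": 5e3, "eta": 0.2, "phi": -3.1},  # across the phi wrap-around
    {"name": "soft", "pt": 1e3, "eta": 0.1, "phi": 3.1},    # fails pt_minimum
    {"name": "far", "pt": 9e3, "eta": 1.5, "phi": 0.0},     # fails dr_maximum
]

# Same cut values as the association block example above.
matched = associate(jet, particles, pt_minimum=2e3, abs_eta_maximum=2.5, dr_maximum=0.4)
print([p["name"] for p in matched])
```

The first particle shows why the $\phi$ difference has to be wrapped: it sits on the other side of the $\pm\pi$ boundary but is still close to the jet.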

If neither of the above options is given, we assume the particles are already associated to the jet in your input xAOD. What happens with them can be further configured:

association_name: The name of the std::vector<ElementLink<xAOD::IParticleContainer>> auxiliary data that is saved on the jet. If not specified, the value of particles from the association block is used. Example: "association_name": "ConeExclHadrons".
overlap_dr: The minimum $\Delta R(\text{jet},x)$, where $x$ is any particle in this collection. Jets are vetoed if closer particles are found. Example: "overlap_dr": 0.3. This key is used with the overlapLepton particle type to veto jets containing leptons from W or Z decays.
output: Writes an array of particles to the h5 output file. Has several fields:
- n_to_save: size of the output array
- sort_order: ordering of the output particles
- name: name of the output array; if not specified, association_name is used
output block example
```
"output": {
  "n_to_save": 5,
  "sort_order": "pt",
  "name": "hadrons"
}
```
decorate_summary: Enable summary decorations on the jet. This will add n_truth_<name> and min_dr_truth_<name> decorations to the jet, which can be written out as jet level variables.
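The three consumers of an associated collection can be sketched in Python as follows. This is an illustration under stated assumptions, not the dumper's implementation: the meaning of the two summary decorations is inferred from their names, the descending sort, the None padding and the infinite distance for an empty collection are guesses, and all function names are hypothetical.

```python
import math


def delta_r(jet, particle):
    """Angular distance between a jet and a particle, phi wrapped into [-pi, pi]."""
    dphi = math.remainder(jet["phi"] - particle["phi"], 2.0 * math.pi)
    return math.hypot(jet["eta"] - particle["eta"], dphi)


def is_vetoed(jet, particles, overlap_dr):
    """True if any associated particle lies closer to the jet than overlap_dr."""
    return any(delta_r(jet, p) < overlap_dr for p in particles)


def summary(jet, particles, name):
    """Jet-level values keyed like the decorate_summary decorations."""
    distances = [delta_r(jet, p) for p in particles]
    return {
        f"n_truth_{name}": len(distances),
        # assumption: an empty collection gives an infinite minimum distance
        f"min_dr_truth_{name}": min(distances, default=math.inf),
    }


def output_array(particles, n_to_save, sort_order="pt"):
    """Fixed-size list: sorted by descending sort_order, truncated or padded with None."""
    ordered = sorted(particles, key=lambda p: p[sort_order], reverse=True)[:n_to_save]
    return ordered + [None] * (n_to_save - len(ordered))


jet = {"eta": 0.0, "phi": 0.0}
hadrons = [
    {"pt": 10e3, "eta": 0.1, "phi": 0.0},
    {"pt": 30e3, "eta": 0.0, "phi": 0.3},
]

print(is_vetoed(jet, hadrons, overlap_dr=0.3))
print(summary(jet, hadrons, "hadrons"))
print(output_array(hadrons, n_to_save=3))
```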

Running new Taggers

Some taggers are already applied at the derivation level, but newer versions or freshly developed models can be added via a CA block called MultifoldTagger. It takes as input a .onnx model exported from Salt, evaluates it directly, and decorates the jets with the probabilities and other auxiliary outputs.

Such a block can look like this:

```
{
    "block": "MultifoldTagger",
    "nn_paths": [
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold1.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold2.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold3.onnx"
    ],
    "target": "jet"
}
```

Here, we want to add the GN2v01 version with lepton inputs, which is not deployed. As you might have noticed, there are actually four models listed. This is due to the k-folding strategy that was used for the GN2v* models. To apply only one model, just define the block like this:

```
{
    "block": "MultifoldTagger",
    "nn_paths": [
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx"
    ],
    "target": "jet"
}
```

Now a single model is applied to all jets, rather than only to the subset assigned to its fold. The target defines which object the values are decorated onto. By default, this is the jet object. In previous iterations we decorated the BTagging object, but this is deprecated.

It is also possible to schedule flipped versions of the taggers:

```
{
    "block": "MultifoldTagger",
    "flip_tag_config": "SIMPLE_FLIP",
    "alg_name": "flip_tagger",
    "nn_paths": [
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold1.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold2.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold3.onnx"
    ],
    "target": "jet"
}
```

Here we see the same tagger, but scheduled as a flip using the SIMPLE_FLIP strategy. The alg_name is added to distinguish it from the other MultifoldTagger instances, since Athena does not accept two algorithms with the same name.
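When several such blocks are needed, they can be generated instead of written by hand. The Python sketch below reproduces the keys of the JSON examples above. The helper names are hypothetical, the network_fold<i>.onnx naming is taken from the example paths only, and treating a block without alg_name as if it were named after its block type is an assumption made just for the duplicate check.

```python
import json


def multifold_block(model_dir, n_folds, flip_tag_config=None, alg_name=None, target="jet"):
    """Build one MultifoldTagger block with the same keys as the JSON examples."""
    block = {"block": "MultifoldTagger"}
    if flip_tag_config is not None:
        block["flip_tag_config"] = flip_tag_config
    if alg_name is not None:
        block["alg_name"] = alg_name
    block["nn_paths"] = [f"{model_dir}/network_fold{i}.onnx" for i in range(n_folds)]
    block["target"] = target
    return block


def duplicate_names(blocks):
    """Names used more than once; blocks without alg_name fall back to the block type."""
    names = [b.get("alg_name", b["block"]) for b in blocks]
    return sorted({n for n in names if names.count(n) > 1})


model_dir = "dev/BTagging/20240627/GN2v01Lep/antikt4empflow"
nominal = multifold_block(model_dir, 4)
flipped = multifold_block(
    model_dir, 4, flip_tag_config="SIMPLE_FLIP", alg_name="flip_tagger"
)

print(json.dumps([nominal, flipped], indent=4))
print(duplicate_names([nominal, flipped]))
```

Running a check like duplicate_names before submitting a job is one way to catch a forgotten alg_name early.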

You might also have noticed that the tagger paths are not absolute and that these models are not stored locally on your disk. The paths given here point to models available in the ATLAS FTAG (dev) group area. A list of the available taggers can be found here. A PathResolver is used to find the model files. It looks for them in the following order:

- The local directory where you start the job
- /cvmfs/
- On https://atlas-groupdata.web.cern.ch/atlas-groupdata/ (this is how it works from the docker images)

When you want to run on the grid, this is one way to use models that are not in the derivations/xAODs. You only need to provide the path inside the group area, not the full absolute path. The path to the group area (to check which models are available) is /cvmfs/atlas.cern.ch/repo/sw/database/GroupData/. If you want to add a tagger to the group area, you can request that your file is added there. An explanation of how to make a request can be found in the FTAG Docs. If you want to add a new tagger from FTAG Algorithms, please email the FTAG Algorithms conveners at atlas-ftag-algorithms-conveners@cern.ch and ask them to do it for you.
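To check by hand where a relative model path might resolve, the search order above can be mimicked in a few lines of Python. This is not the PathResolver API: both helper functions are hypothetical, and the sketch assumes that the /cvmfs/ entry in the list refers to the group area path quoted above.

```python
from pathlib import Path

# Locations quoted in the text above.
GROUP_DATA_CVMFS = "/cvmfs/atlas.cern.ch/repo/sw/database/GroupData"
GROUP_DATA_HTTP = "https://atlas-groupdata.web.cern.ch/atlas-groupdata"


def candidate_locations(rel_path, start_dir="."):
    """Places to look for rel_path, in the order described above."""
    return [
        str(Path(start_dir) / rel_path),   # 1. directory where the job starts
        f"{GROUP_DATA_CVMFS}/{rel_path}",  # 2. group area on cvmfs (assumed)
        f"{GROUP_DATA_HTTP}/{rel_path}",   # 3. web copy of the group area
    ]


def first_on_disk(rel_path, start_dir="."):
    """First candidate that exists as a local file, or None (the URL is not checked)."""
    for location in candidate_locations(rel_path, start_dir)[:2]:
        if Path(location).is_file():
            return location
    return None


model = "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx"
for location in candidate_locations(model):
    print(location)
print(first_on_disk(model))
```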

To actually store the new network outputs in the h5 output files, you need to add the variables to the list of output variables:

```
"variables": {
    "file": "fragments/pflow-variables-sv1-jf.json",
    "jet": {
        "floats": [
            "GN2Lep_pb",
            "GN2Lep_pc",
            "GN2Lep_pu",
            "GN2Lep_ptau",
            "GN2LepSimpleFlip_pb",
            "GN2LepSimpleFlip_pc",
            "GN2LepSimpleFlip_pu",
            "GN2LepSimpleFlip_ptau"
        ]
    }
}
```

In the example here, we have scheduled both blocks, the standard one and the flipped one. If you now run the dumper using that config, you should get new output values.
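The list of output names follows a regular pattern in this example, so it can be built programmatically. The Python sketch below does that. The helper name is hypothetical, and the <tagger>_p<class> pattern is read off the example above, not guaranteed for every model.

```python
import json


def tagger_outputs(tagger_name, classes=("b", "c", "u", "tau")):
    """Output names following the <tagger>_p<class> pattern of the example."""
    return [f"{tagger_name}_p{cls}" for cls in classes]


floats = tagger_outputs("GN2Lep") + tagger_outputs("GN2LepSimpleFlip")
variables = {
    "file": "fragments/pflow-variables-sv1-jf.json",
    "jet": {"floats": floats},
}

print(json.dumps({"variables": variables}, indent=4))
```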

Dumping Constituents

The top-level flow object can be used to dump particle flow constituents and the associated track particles and clusters. Most of the fields are identical to the track configuration.

The particle flow configuration also has an associations block to dump information on associated track and cluster objects. If specified, the track and cluster configurations can contain:

- A variables block which is identical to the variables block for tracks and the flow objects.
- An edm_names block which maps the specified name in the configuration (and in the output file) to the variable name in the xAOD. This is needed if you want to write e.g. track_pt in addition to the particle flow constituent pt. Specifying "edm_names": {"track_pt":"pt"} will indicate that the pt accessor in the xAOD should be used, since using the same name twice will cause a clash in the output file.
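The renaming done by edm_names amounts to a lookup from output name to xAOD accessor name. A minimal Python sketch, with a hypothetical function name and assuming that names without an entry are used unchanged:

```python
def xaod_accessor_names(output_names, edm_names=None):
    """Map each output name to the xAOD variable it is read from."""
    edm_names = edm_names or {}
    # assumption: a name without an edm_names entry is read under the same name
    return {name: edm_names.get(name, name) for name in output_names}


mapping = xaod_accessor_names(["track_pt", "eta"], {"track_pt": "pt"})
print(mapping)
```

Here track_pt in the output file is filled from the pt accessor, which leaves the plain pt name free for the particle flow constituent.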

The "dumper" key for trigger and multi-config

Trigger dumps are configured via the CA. This lets us stack other algorithms in the CA that run before, after, or alongside the dumper. The dumper configuration therefore needs to be encapsulated inside the "dumper" group, and settings specific to other algorithms go outside it, for example:

```
"external_setting_1": true,
"dumper": {
    "jet_collection": "tempEmtopoJets",
    ...
}
```

All the available options for trigger jobs are defined in trigger.py under the getJobConfig method.

Similarly, multiple dumps can run simultaneously. The configuration file, then, has to contain the different versions of the settings to be used for each instance of the dumper. For example:

```
"pflow": {
    "dumper": {
        "file": "EMPFlow.json"
    }
},
"vrtrack": {
    "dumper": {
        "file": "TrackJets.json"
    }
}
```

N.B. In the multi-config case, the "dumper" group only needs to be added if the imported configuration file does not already contain it, i.e. for trigger combos it should not be required.

Metadata

By default, the dumper will attempt to find metadata associated with a given sample, for example the sum of weights. Some samples don't have this information, see the blacklist here. If running over an invalid sample, or if you get errors like could not retrieve CutBookkeepers, you can run the dumper with --no-meta. You can also add file_metadata: false as a top-level option in your dumper config.

Other configuration notes

Both the top level configuration and the track dumpers allow a btagging_link to be specified. This is the name of the link from jets to the BTagging object, and has historically been btaggingLink in most ATLAS code. This field can be omitted or left as an empty string:
- In the track dumpers, the tracks will then be read directly from the jet.
- In the top level, a missing link means there is no associated b-tagging object. This means any attempts to write out information on the BTagging object will fail!

Configuring CA blocks

Full information about available blocks is found here.