Configuration
Jobs are steered via JSON files that live under `configs/`. Some fields are optional, but default values are always trivial: all sequences are empty by default and all numbers are zero.
In most cases, providing an unknown field should throw an exception.
Fragments
Configuration files support one special key: `file`.
If this appears in a JSON object, the parser will assume the key
gives a path relative to the current file. Any entries in the file
will be imported at the same scope where the `file` key appears.
If local keys conflict with imported ones they will be merged, giving
local keys precedence over those imported from the file.
More on merging
Conflicting keys are merged as follows:
- If the keys point to objects (i.e. a `dict` or `map`), the union of all keys and values within the objects is taken, and conflicting keys are merged recursively.
- If the keys point to lists, the lists are concatenated.
- If the keys point to anything else (i.e. numbers or strings), the local key overwrites the key from the file.
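The merge rules can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the parser used by the dumper: the function names `merge` and `expand` are hypothetical, and the order of concatenated lists (imported entries first, then local ones) and the handling of mismatched types are assumptions.

```python
import json
from pathlib import Path


def merge(local, imported):
    """Merge two config values, giving the local one precedence."""
    if isinstance(local, dict) and isinstance(imported, dict):
        # Union of all keys; conflicting keys are merged recursively.
        merged = dict(imported)
        for key, value in local.items():
            merged[key] = merge(value, imported[key]) if key in imported else value
        return merged
    if isinstance(local, list) and isinstance(imported, list):
        # Assumption: imported entries come first, then the local ones.
        return imported + local
    # Numbers, strings, and (by assumption) mismatched types: local wins.
    return local


def expand(node, directory):
    """Recursively replace every "file" key with the entries it points to."""
    if isinstance(node, list):
        return [expand(item, directory) for item in node]
    if not isinstance(node, dict):
        return node
    local = {k: expand(v, directory) for k, v in node.items() if k != "file"}
    if "file" not in node:
        return local
    # The path is interpreted relative to the file containing the key.
    path = Path(directory) / node["file"]
    imported = expand(json.loads(path.read_text()), path.parent)
    return merge(local, imported)
```

Calling `expand(json.loads(Path(p).read_text()), Path(p).parent)` on a top-level configuration at path `p` would then give a dictionary in the spirit of the fully merged configuration.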
Fragments are a useful way to specify default values, since any values
imported via `file` will be overwritten by local ones.
You can run `test-config-merge path/to/config.json` to get the fully merged
configuration. Comparing two merged configs (for example before and after making
changes to a configuration file) can be achieved with:
    diff <(test-config-merge file1.json) <(test-config-merge file2.json)
Truth and overlap removal
There are many conflicting definitions for what might be considered a
b-jet or a lepton, which can make defining an optimal tagger or
removing leptons from the training a bit subtle. To embrace this
chaos, you can specify any number of truth particle collections to
handle in the `truths` list. Truth particles can be used to veto jets,
to decorate jets with extra jet-wise information, or they can be
written out to the h5 file.
All truth particles will be associated to jets as `xAOD::IParticle`s.
This is for compatibility with xAODs: a number of other `IParticle`
collections are associated upstream, e.g. when jets are built. The
truth handling is shown in the diagram below:
graph TD;
subgraph sources ["Inputs (Choose one or none)"]
sel(association) & merge(merge)
end
sel & merge --> assoc(association_name)
subgraph outputs [" "]
or(overlap_dr)
writer(output)
dec(decorate_summary)
end
assoc --> or & dec & writer
Each block above is also the name of a field in the `truth` block. One
or none of the first two options can be used:
- `association`: Runs $\Delta R$ particle association if specified. It can be further configured:
    - `container`: The name of the `TruthParticleContainer` to read particles from. Defaults to `TruthParticles` if omitted. Note: it is possible to instead use the `containers` key to specify a list of input truth containers.
    - `particles`: The type of particles to select, e.g. `fromBC` or `promptLepton`.
    - `pt_minimum`, `abs_eta_maximum`: the minimum $p_T$ and maximum $|\eta|$ a particle must have to be selected.
    - `dr_maximum`: maximum $\Delta R$ between jet and an associated particle.

    `association` block example:

        "association": {
          "particles": "overlapLepton",
          "pt_minimum": 2e3,
          "abs_eta_maximum": 2.5,
          "dr_maximum": 0.4
        }
- `merge`: A list of previously associated particles to merge into this one.

    `merge` block example:

        "merge": [
          "ConeExclCHadronsFinal",
          "ConeExclBHadronsFinal"
        ]
If neither of the above options is given, we assume the particles are already associated to the jet in your input xAOD. What happens with them can be further configured:
- `association_name`: Specifies the name of the `std::vector<ElementLink<xAOD::IParticleContainer>>` auxiliary data that is saved on the jet. If not specified we'll use `particles` from the `association` block. Example: `"association_name": "ConeExclHadrons"`.
- `overlap_dr`: The minimum $\Delta R(\text{jet},x)$, where $x$ is any particle in this collection. Jets are vetoed if closer particles are found. Example: `"overlap_dr": 0.3`. This key is used with the `overlapLepton` particle type to veto jets containing leptons from W or Z decays.
- `output`: Writes an array of particles to the h5 output file. Has several fields:
    - `n_to_save`: size of the output array
    - `sort_order`: ordering of the output particles
    - `name`: name of the output array; if not specified we'll use `association_name`

    `output` block example:

        "output": {
          "n_to_save": 5,
          "sort_order": "pt",
          "name": "hadrons"
        }
- `decorate_summary`: Enable summary decorations on the jet. This will add `n_truth_<name>` and `min_dr_truth_<name>`
decorations to the jet, which can be written out as jet level variables.
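To make the roles of `pt_minimum`, `abs_eta_maximum`, `dr_maximum` and `overlap_dr` concrete, here is a minimal Python sketch of $\Delta R$ association followed by an overlap veto. It is not the dumper's implementation: jets and particles are plain dictionaries, the helper names are hypothetical, and whether the cuts are inclusive at the boundary is an assumption.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def associate(jet, particles, pt_minimum, abs_eta_maximum, dr_maximum):
    """Select particles passing the kinematic cuts and close to the jet."""
    return [
        p for p in particles
        if p["pt"] >= pt_minimum                  # assumption: inclusive cut
        and abs(p["eta"]) <= abs_eta_maximum      # assumption: inclusive cut
        and delta_r(jet["eta"], jet["phi"], p["eta"], p["phi"]) <= dr_maximum
    ]


def passes_overlap(jet, associated, overlap_dr):
    """Return False (jet vetoed) if any particle is closer than overlap_dr."""
    return all(
        delta_r(jet["eta"], jet["phi"], p["eta"], p["phi"]) >= overlap_dr
        for p in associated
    )
```

With the example values above (`dr_maximum` of 0.4 and `overlap_dr` of 0.3), a lepton at $\Delta R = 0.1$ from the jet axis would be associated and would then cause the jet to be vetoed.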
Running new Taggers
Some taggers are already applied on the derivation level, but newer versions or freshly developed models can be added via a CA block called `MultifoldTagger`. It takes as input a `.onnx` model exported from Salt, evaluates it directly, and decorates the jets with the probabilities and other auxiliary outputs.
This can look like this:
    {
      "block": "MultifoldTagger",
      "nn_paths": [
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold1.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold2.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold3.onnx"
      ],
      "target": "jet"
    }
Here, we want to add the not-deployed `GN2v01` version with lepton inputs. As you might have noticed, four models are actually listed here. This is due to the k-folding strategy that was used for the `GN2v*` models. To apply only one model, just define the block like this:
    {
      "block": "MultifoldTagger",
      "nn_paths": [
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx"
      ],
      "target": "jet"
    }
Now this single model is applied to all jets, not only to the part assigned to it by the k-folding. The `target` here defines the object to which the values are decorated. By default, this is the `jet` object. In previous iterations, we decorated the `BTagging` object, but this is deprecated.
It is also possible to schedule flipped versions of the taggers:
    {
      "block": "MultifoldTagger",
      "flip_tag_config": "SIMPLE_FLIP",
      "alg_name": "flip_tagger",
      "nn_paths": [
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold0.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold1.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold2.onnx",
        "dev/BTagging/20240627/GN2v01Lep/antikt4empflow/network_fold3.onnx"
      ],
      "target": "jet"
    }
Here we can see the same tagger, but scheduled as a flipped tagger using the `SIMPLE_FLIP` strategy. The `alg_name` is added to distinguish it from the other `MultifoldTagger` instances, since Athena doesn't like two algorithms with the same name.
You might also have noticed that the paths of the taggers aren't absolute paths and that these models are not stored locally on your disk. The paths given here are paths to models which are available in the ATLAS FTAG (dev) group area. A list of the available taggers can be found here. A `PathResolver` is used to find the correct path to the model files. The `PathResolver` looks for the model files in the following order:
- The local directory where you start the job
- `/cvmfs/`
- On https://atlas-groupdata.web.cern.ch/atlas-groupdata/ (this is how it works from the docker images)
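The search order can be pictured with the following Python sketch. It only mimics the lookup order listed above and is not the `PathResolver` API: the function `find_model` is hypothetical, the directory and URL constants are the group area locations quoted in this section, and the sketch does not download anything.

```python
from pathlib import Path

# Locations quoted in this section; the function itself is hypothetical.
GROUP_DATA_DIR = "/cvmfs/atlas.cern.ch/repo/sw/database/GroupData/"
GROUP_DATA_URL = "https://atlas-groupdata.web.cern.ch/atlas-groupdata/"


def find_model(relative_path, search_dirs=(".", GROUP_DATA_DIR), base_url=GROUP_DATA_URL):
    """Return the first existing local candidate, else the web location."""
    for directory in search_dirs:
        candidate = Path(directory) / relative_path
        if candidate.is_file():
            return str(candidate)
    # Nothing found locally: fall back to the web copy (not fetched here).
    return base_url + relative_path
```

Because the same relative path is tried in each location, a configuration only needs the path inside the group area, never an absolute one.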
When you want to run on the grid, this is one way to use models that are not in the derivations/xAODs. You just need to provide the path inside the group area, not the full group area absolute path. The path to the group area (to check which models are available) is `/cvmfs/atlas.cern.ch/repo/sw/database/GroupData/`.
If you want to add a tagger to the group area, you can request that your file is added there. An explanation of how to make a request can be found in the FTAG Docs. If you want to add a new tagger from FTAG Algorithms, please contact the FTAG Algorithms conveners by email at atlas-ftag-algorithms-conveners@cern.ch and ask them to do it for you.
To actually store the new network outputs in the h5 output files, you need to add the variables to the list of output variables:
    "variables": {
      "file": "fragments/pflow-variables-sv1-jf.json",
      "jet": {
        "floats": [
          "GN2Lep_pb",
          "GN2Lep_pc",
          "GN2Lep_pu",
          "GN2Lep_ptau",
          "GN2LepSimpleFlip_pb",
          "GN2LepSimpleFlip_pc",
          "GN2LepSimpleFlip_pu",
          "GN2LepSimpleFlip_ptau"
        ]
      }
    }
In the example here, we have scheduled both blocks, the standard one and the flipped one. If you now run the dumper using that config, you should get new output values.
Dumping Constituents
The top level `flow` object can be used to dump particle flow
constituents and the associated track particles and clusters. Most of
the fields are identical to the track configuration.
The particle flow configuration also has an `associations` block to
dump information on associated `track` and `cluster` objects. If
specified, the `track` and `cluster` configurations can contain:
- A `variables` block which is identical to the variables block for tracks and the flow objects.
- An `edm_names` block which maps the specified name in the configuration (and in the output file) to the variable name in the xAOD. This is needed if you want to write e.g. `track_pt` in addition to the particle flow constituent `pt`. Specifying `"edm_names": {"track_pt":"pt"}` will indicate that the `pt` accessor in the xAOD should be used, since using the same name twice will cause a clash in the output file.
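The effect of `edm_names` can be illustrated with a small Python sketch. It is only a model of the renaming described above: the function `read_outputs` is hypothetical, the associated object is represented by a plain dictionary, and the variables are passed as a flat list rather than in the real `variables` block layout.

```python
def read_outputs(xaod_object, variables, edm_names=None):
    """Map each output name to the value of its xAOD accessor.

    Names listed in edm_names are read through the mapped accessor;
    all other names are used as accessor names directly.
    """
    edm_names = edm_names or {}
    return {name: xaod_object[edm_names.get(name, name)] for name in variables}


# A dictionary standing in for an associated track (hypothetical values).
track = {"pt": 1500.0, "eta": 0.3}
outputs = read_outputs(track, ["track_pt", "eta"], {"track_pt": "pt"})
```

Here `outputs` holds the track `pt` under the name `track_pt`, so it no longer clashes with the `pt` of the particle flow constituent in the output file.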
The "dumper" key for trigger and multi-config
Trigger dumps are configured the CA way. In this case, we can stack other
algorithms in the CA that run before, after, or alongside the dumper.
Therefore, the dumper configuration needs to be encapsulated inside the `"dumper"`
group.
We can then add specific settings for other algorithms outside that group, for example:
    "external_setting_1": true,
    "dumper": {
      "jet_collection": "tempEmtopoJets",
      ...
    }
These external settings are handled in `trigger.py`, under the `getJobConfig` method.
Similarly, multiple dumps can run simultaneously. The configuration file then has to contain the different versions of the settings to be used for each instance of the dumper. For example:
    "pflow": {
      "dumper": {
        "file": "EMPFlow.json"
      }
    },
    "vrtrack": {
      "dumper": {
        "file": "TrackJets.json"
      }
    }
The `"dumper"` group is needed only if the configuration file
being imported does not contain that group already, i.e. for trigger combos it should not be required.
Other configuration notes
- Both the top level configuration and the track dumpers allow a `btagging_link` to be specified. This is the name of the link from jets to the `BTagging` object, and has historically been `btaggingLink` in most ATLAS code. This field can be omitted or left as an empty string:
    - For the track dumpers, the tracks will then be read directly from the jet.
    - In the top level, a missing link means there is no associated
      b-tagging object. This means any attempts to write out
      information on the `BTagging` object will fail!
Configuring CA blocks
Full information about available blocks is found here.