Configuration
Jobs are steered via JSON files that live under configs/. Some fields are optional, but default values are always trivial: all sequences are empty by default and all numbers are zero.
In most cases providing an unknown field should throw an exception.
Fragments#
Configuration files support one special key: file.
If this appears in a JSON object, the parser will assume the key
gives a path relative to the current file. Any entries in the file
will be imported at the same scope where the file key appears.
If local keys conflict with imported ones they will be merged, giving
local keys precedence over those imported from the file.
More on merging
Conflicting keys are merged as follows:
- If the keys point to objects (i.e. a dict or map), the union of all keys and values within the objects is taken, and conflicting keys are merged recursively.
- If the keys point to lists, the lists are concatenated.
- If the keys point to anything else (e.g. numbers or strings), the local key overwrites the key from the file.
Fragments are a useful way to specify default values, since any values
imported via file will be overwritten by local ones.
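The merging rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the parser's actual implementation: the function name merge_configs and the example keys are hypothetical, the order in which lists are concatenated is an assumption (imported entries first), and a type mismatch is assumed to behave like the "anything else" case. Resolving the file key itself is not shown.

```python
import copy


def merge_configs(local, imported):
    """Merge an imported fragment into a local object (hypothetical helper)."""
    merged = copy.deepcopy(imported)
    for key, local_value in local.items():
        if key not in merged:
            # No conflict: the local entry is simply added.
            merged[key] = copy.deepcopy(local_value)
            continue
        imported_value = merged[key]
        if isinstance(local_value, dict) and isinstance(imported_value, dict):
            # Objects: take the union and merge conflicting keys recursively.
            merged[key] = merge_configs(local_value, imported_value)
        elif isinstance(local_value, list) and isinstance(imported_value, list):
            # Lists: concatenate. ASSUMPTION: imported entries come first.
            merged[key] = imported_value + copy.deepcopy(local_value)
        else:
            # Anything else (and, by assumption, mismatched types):
            # the local value overwrites the imported one.
            merged[key] = copy.deepcopy(local_value)
    return merged


# Hypothetical fragment and local config, for illustration only.
fragment = {
    "jet_collection": "A",
    "selection": {"pt_minimum": 20e3, "abs_eta_maximum": 2.5},
    "variables": ["pt"],
}
local = {"selection": {"pt_minimum": 25e3}, "variables": ["eta"]}

merged = merge_configs(local, fragment)
```

Here the local pt_minimum replaces the one from the fragment, abs_eta_maximum and jet_collection are inherited unchanged, and the two variables lists are joined.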
You can run test-config-merge path/to/config.json to get the fully merged
configuration. Comparing two merged configs (for example before and after making
changes to a configuration file) can be achieved with:
diff <(test-config-merge file1.json) <(test-config-merge file2.json)
Truth and overlap removal#
There are many conflicting definitions for what might be considered a
b-jet or a lepton, which can make defining an optimal tagger or
removing leptons from the training a bit subtle. To embrace this
chaos, you can specify any number of truth particle collections to
handle in the truths list. Truth particles can be used to veto jets,
to decorate jets with extra jet-wise information, or they can be
written out to the h5 file.
All truth particles will be associated to jets as xAOD::IParticles.
This is for compatibility with xAODs: a number of other IParticle
collections are associated upstream, e.g. when jets are built.
The truth handling is shown in the diagram below.
graph TD;
subgraph sources ["Inputs (Choose one or none)"]
sel(association) & merge(merge)
end
sel & merge --> assoc(association_name)
subgraph outputs [" "]
or(overlap_dr)
writer(output)
dec(decorate_summary)
end
assoc --> or & dec & writer
Each block above is also the name of a field in the truth block. One
or none of the first two options can be used:
- association: Runs $\Delta R$ particle association if specified. It can be further configured:
  - container: The name of the TruthParticleContainer to read particles from. Defaults to TruthParticles if omitted. Note: it is possible to instead use the containers key to specify a list of input truth containers.
  - particles: The type of particles to select, e.g. fromBC or promptLepton.
  - pt_minimum, abs_eta_maximum: minimum $p_T$ and maximum $|\eta|$ of the selected particles.
  - dr_maximum: maximum $\Delta R$ between jet and an associated particle.

  association block example
  "association": { "particles": "overlapLepton", "pt_minimum": 2e3, "abs_eta_maximum": 2.5, "dr_maximum": 0.4 }
- merge: A list of previously associated particles to merge into this one.

  merge block example
  "merge": [ "ConeExclCHadronsFinal", "ConeExclBHadronsFinal" ]
If neither of the above options is given, we assume the particles are already associated to the jet in your input xAOD. What happens with them can be further configured:
- association_name: Specifies the name of the std::vector<ElementLink<xAOD::IParticleContainer>> auxiliary data that is saved on the jet. If not specified we'll use particles from the association block. Example: "association_name": "ConeExclHadrons".
- overlap_dr: The minimum $\Delta R(\text{jet}, x)$, where $x$ is any particle in this collection. Jets are vetoed if closer particles are found. Example: "overlap_dr": 0.3. This key is used with the overlapLepton particle type to veto jets containing leptons from W or Z decays.
- output: Writes an array of particles to the h5 output file. Has several fields:
  - n_to_save: size of the output array
  - sort_order: ordering of the output particles
  - name: name of the output array, if not specified we'll use association_name

  output block example
  "output": { "n_to_save": 5, "sort_order": "pt", "name": "hadrons" }
- decorate_summary: Enable summary decorations on the jet. This will add n_truth_<name> and min_dr_truth_<name> decorations to the jet, which can be written out as jet level variables.
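To make the overlap_dr veto and the summary decorations more concrete, here is a minimal Python sketch. It is not the dumper's code: the helper names delta_r and summarize_truth, the dict-based jet and particle records, the use of pseudorapidity in $\Delta R$, and the value returned for a jet with no associated particles are all assumptions made for illustration.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def summarize_truth(jet, particles, name, overlap_dr=None):
    """Hypothetical stand-in for the summary decorations and overlap veto."""
    distances = [
        delta_r(jet["eta"], jet["phi"], p["eta"], p["phi"]) for p in particles
    ]
    # ASSUMPTION: with no particles the minimum distance is infinite.
    min_dr = min(distances) if distances else math.inf
    return {
        f"n_truth_{name}": len(particles),
        f"min_dr_truth_{name}": min_dr,
        # The jet is vetoed if any particle is closer than overlap_dr.
        "vetoed": overlap_dr is not None and min_dr < overlap_dr,
    }


# A jet and a lepton on either side of the phi = +-pi boundary.
jet = {"eta": 0.0, "phi": 3.1}
leptons = [{"eta": 0.0, "phi": -3.1}]

summary = summarize_truth(jet, leptons, name="leptons", overlap_dr=0.3)
empty = summarize_truth(jet, [], name="leptons", overlap_dr=0.3)
```

The wrap-around in delta_r matters here: the raw difference in phi is 6.2, but the two objects are only about 0.08 apart, so the jet is vetoed with "overlap_dr": 0.3.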
DL2 Config#
Multifold taggers can be configured with the MultifoldTagger block.
The DL2 config is the configuration of taggers you want to apply when dumping the ntuples. Some taggers are already applied at the derivation level, but newer versions or freshly developed models can be added via this configuration. The dl2_configs entry is a list of dicts, one for each tagger you want to add. For example:
"dl2_configs": [
{
"nn_file_path": "<path>/<to>/<your>/<local>/<network.json>"
},
{
"nn_file_path": "dev/BTagging/20210506r22/umami/antikt4empflow/network.json",
"remapping": {
"UMAMI20210506r22_pu": "Umami_r22_pu",
"UMAMI20210506r22_pc": "Umami_r22_pc",
"UMAMI20210506r22_pb": "Umami_r22_pb",
"dips_UMAMI20210506r22_pu": "dips_from_Umami_r22_pu",
"dips_UMAMI20210506r22_pc": "dips_from_Umami_r22_pc",
"dips_UMAMI20210506r22_pb": "dips_from_Umami_r22_pb"
}
}
]
Each dict can have two entries: nn_file_path and remapping. The nn_file_path is the path to your model. When you are running locally, you can just add the path to the network.json. Keep in mind this needs to be a converted LWTNN model of the network! In the first dict no remapping is given, so it defaults to empty. In this case, the output variables of the network keep the names from the LWTNN entries set when the model was created.
For the second dict, a specific path is given. This is a path to a model which is available in the ATLAS FTAG (dev) group area. A list of the available taggers can be found here. For that, a PathResolver is used to find the correct path to the model files. The PathResolver looks for the model files in the following order:
- The local directory where you start the job
- /cvmfs/
- On https://atlas-groupdata.web.cern.ch/atlas-groupdata/ (this is how it works from the docker images)
When you want to run on the grid, this is one way to use a model that is not in the derivations/xAODs. You just need to provide the path inside the group area, not the full group area absolute path. The path to the group area (to check which models are available) is /cvmfs/atlas.cern.ch/repo/sw/database/GroupData/.
If you want to add a tagger to the group area, you can request that your file is added there. An explanation of how to make a request can be found in the ASGCalibArea twiki. If you want to add a new tagger from FTAG Algorithms, please email the FTAG Algorithms conveners at atlas-ftag-algorithms-conveners@cern.ch and ask them to do it for you.
The second option, remapping, is a dict with entries in the second example. If you want to rename the output of one network to a different name, you can do this here. Each key is the current name of the variable and its value is the new name of the variable in the dumped .h5 files.
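The effect of remapping can be pictured as a simple key rename. The sketch below is an illustration only: apply_remapping is a hypothetical helper, the scores are made-up numbers, and outputs without an entry in the remapping are assumed to keep their original name.

```python
def apply_remapping(outputs, remapping):
    """Rename network outputs; names without an entry are left unchanged."""
    return {remapping.get(name, name): value for name, value in outputs.items()}


# Three of the remapping entries from the example above.
remapping = {
    "UMAMI20210506r22_pu": "Umami_r22_pu",
    "UMAMI20210506r22_pc": "Umami_r22_pc",
    "UMAMI20210506r22_pb": "Umami_r22_pb",
}

# Made-up scores, keyed by the names stored with the network.
network_outputs = {
    "UMAMI20210506r22_pu": 0.7,
    "UMAMI20210506r22_pc": 0.2,
    "UMAMI20210506r22_pb": 0.1,
}

renamed = apply_remapping(network_outputs, remapping)
untouched = apply_remapping(network_outputs, {})
```

With an empty remapping the names are passed through as they are, which matches the behaviour described for a tagger entry that has no remapping.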
Using ONNX#
Some networks aren't supported with the default engine, lwtnn. In
these cases, you should change the engine to ONNX runtime, by
specifying
"engine": "gnn"
alongside nn_file_path and remapping. You should be
able to tell which networks these are based on the .onnx extension
in the NN path. To explicitly ask for lwtnn, you can set the engine
to dl2.
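As a rough illustration of the rule of thumb above, the following Python sketch picks an engine from the file extension and builds one tagger entry. The helpers guess_engine and make_tagger_entry are hypothetical and not part of the dumper, and the path path/to/network.onnx is a placeholder; only the keys nn_file_path, remapping and engine and the values gnn and dl2 come from this page.

```python
import json
from pathlib import PurePosixPath


def guess_engine(nn_file_path):
    """Pick "gnn" (ONNX runtime) for .onnx files, otherwise "dl2" (lwtnn)."""
    is_onnx = PurePosixPath(nn_file_path).suffix == ".onnx"
    return "gnn" if is_onnx else "dl2"


def make_tagger_entry(nn_file_path, remapping=None):
    """Build one tagger entry as a plain dict (hypothetical helper)."""
    entry = {"nn_file_path": nn_file_path, "engine": guess_engine(nn_file_path)}
    if remapping:
        entry["remapping"] = dict(remapping)
    return entry


# Placeholder path for an ONNX model, plus the lwtnn path used above.
onnx_entry = make_tagger_entry("path/to/network.onnx")
lwtnn_entry = make_tagger_entry(
    "dev/BTagging/20210506r22/umami/antikt4empflow/network.json"
)

print(json.dumps([onnx_entry, lwtnn_entry], indent=2))
```

Setting the engine explicitly for lwtnn models is optional, since lwtnn is the default.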
Decorating jets directly#
You don't always have to add b-tagging information to the BTagging
object, especially when you're running a simpler tagger like dips or
the track-based GNN. Instead, you can add the output scores directly
to the jet, by specifying the where field, e.g. "where": "jet".
Valid options are jet and btag.
Running flipped taggers#
By default, taggers run in the STANDARD flip tag config (i.e. they run
without any flipping). You can change this by setting the flip_tag_config
key to NEGATIVE_IP_ONLY or FLIP_SIGN, for example by specifying
"flip_tag_config": "NEGATIVE_IP_ONLY"
alongside nn_file_path and remapping.
Dumping Constituents#
The top level flow object can be used to dump particle flow
constituents and the associated track particles and clusters. Most of
the fields are identical to the track configuration.
The particle flow configuration also has an associations block to
dump information on associated track and cluster objects. If
specified, the track and cluster configurations can contain:
- A variables block which is identical to the variables block for tracks and the flow objects.
- An edm_names block which maps the specified name in the configuration (and in the output file) to the variable name in the xAOD. This is needed if you want to write e.g. track_pt in addition to the particle flow constituent pt. Specifying "edm_names": {"track_pt":"pt"} will indicate that the pt accessor in the xAOD should be used, since using the same name twice will cause a clash in the output file.
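The role of edm_names can be sketched as a lookup from output name to xAOD accessor. This is a simplified picture under stated assumptions: the variables blocks are treated as flat lists of names, and the helpers accessor_for_outputs and find_name_clashes are hypothetical.

```python
def accessor_for_outputs(variables, edm_names):
    """Map each output name to the xAOD accessor it is read from."""
    return {name: edm_names.get(name, name) for name in variables}


def find_name_clashes(flow_variables, associated_variables):
    """Output names requested for both the flow object and the association."""
    return sorted(set(flow_variables) & set(associated_variables))


# ASSUMPTION: variables are shown as flat lists for simplicity.
flow_variables = ["pt", "eta"]

# Asking for "pt" again on the associated track would clash in the output.
clashes_without_rename = find_name_clashes(flow_variables, ["pt"])

# Writing it as "track_pt" and pointing it back to the "pt" accessor avoids this.
track_variables = ["track_pt"]
edm_names = {"track_pt": "pt"}
clashes_with_rename = find_name_clashes(flow_variables, track_variables)
track_accessors = accessor_for_outputs(track_variables, edm_names)
```

The output file then holds both pt (from the flow constituent) and track_pt (read through the pt accessor of the associated track).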
The "dumper" key for trigger and multi-config#
Trigger dumps use the ComponentAccumulator (CA) approach. In this case, other algorithms
can be stacked in the CA to run before, after or alongside the dumper.
The dumper configuration therefore needs to be encapsulated inside the "dumper" group.
Settings specific to other algorithms can be added outside that group, for example:
"external_setting_1": true,
"dumper": {
"jet_collection": "tempEmtopoJets",
...
}
See trigger.py, under the getJobConfig method.
Similarly, multiple dumps can run simultaneously. The configuration file, then, has to contain the different versions of the settings to be used for each instance of the dumper. For example:
"pflow": {
"dumper": {
"file": "EMPFlow.json"
}
},
"vrtrack": {
"dumper": {
"file": "TrackJets.json"
}
}
The "dumper" group is needed only if the configuration file
being imported does not contain that group already, i.e. for trigger combos it should not be required.
Other configuration notes#
- Both the top level configuration and the track dumpers allow a btagging_link to be specified. This is the name of the link from jets to the BTagging object, and has historically been btaggingLink in most ATLAS code. This field can be omitted or left as an empty string:
  - In the track dumpers, the tracks will then be read directly from the jet.
  - In the top level, a missing link means there is no associated b-tagging object. This means any attempts to write out information on the BTagging object will fail!
Configuring CA blocks#
Full information about available blocks is found here.