Grid Dumps
Setup
In addition to the setup of the training dataset dumper described in the installation instructions, you need to go to the directory where you checked out the package and run
source training-dataset-dumper/FTagDumper/grid/setup.sh
You will be prompted to enter your grid proxy passphrase.
After doing so, you are set up to submit jobs on the grid using the `grid-submit` script.
Running on the grid
If you are unfamiliar with the `grid-submit` script, be sure to use the `-h` flag to get usage information.
Note that, as mentioned in the usage, the mode positional argument (e.g. `single-btag`) must come after any optional flags.
The available optional flags and modes are listed when using `-h`, and can also be found in the script's source code.
Start with a dry run
When submitting jobs, it's good practice to initially use the `-d` flag to perform a dry run:
grid-submit -d single-btag
This will run the script without actually submitting any jobs to the grid, allowing you to double check the config and inputs before submission. The script will error if you have uncommitted changes to the repository in order to ensure reproducibility.
You can run the grid submission, tagging the current code state, using this command:
grid-submit -t <tagname> single-btag
You can override the mode's default configuration file using `-c <config.json>`.
Similarly, the executable can be specified with `-s <script>`, and the inputs file with `-i <inputs.txt>`.
The inputs file is a text file containing a list of input datasets to submit over, with one DSID per line.
Blank lines and lines starting with `#` are ignored.
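The inputs file format described above can be illustrated with a minimal Python sketch. This is not the parsing code used by `grid-submit`; the function names `parse_inputs` and `read_inputs` are hypothetical and only show the stated rules (one dataset per line, blank lines and `#` lines skipped).

```python
from pathlib import Path


def parse_inputs(text):
    """Return the dataset names found in the text of an inputs file.

    Blank lines and lines starting with '#' are skipped, as described
    in the documentation. Hypothetical helper, not part of grid-submit.
    """
    datasets = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        datasets.append(line)
    return datasets


def read_inputs(path):
    """Read an inputs file from disk and parse it with parse_inputs."""
    return parse_inputs(Path(path).read_text())
```

Running `read_inputs("inputs.txt")` on your own inputs file before submission is a quick way to check which datasets would be picked up.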
The default datasets for each mode can be found here.
Any time you dump samples, it is good practice to tag the current state of the repository using the `-t <tagname>` flag. Note that:
- The date will be prepended to the tag name you provide.
- The tag will be pushed to your personal fork.
- The script will warn you if you attempt to push a tag to the main repository instead of a fork. [See here](contributing.md) for more information about working from a fork.
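Because the mode must come after any optional flags, it can help to assemble the command programmatically when scripting submissions. The following is a minimal sketch that uses only the flags described in this section (`-d`, `-t`, `-c`, `-s`, `-i`); the helper name `build_grid_submit_command` is hypothetical and not part of the package.

```python
import shlex


def build_grid_submit_command(mode, tag=None, dry_run=False,
                              config=None, script=None, inputs=None):
    """Assemble a grid-submit argument list with all flags before the mode.

    Hypothetical helper: it only orders the arguments and does not check
    which flag combinations the real script accepts.
    """
    cmd = ["grid-submit"]
    if dry_run:
        cmd.append("-d")
    if tag is not None:
        cmd += ["-t", tag]
    if config is not None:
        cmd += ["-c", config]
    if script is not None:
        cmd += ["-s", script]
    if inputs is not None:
        cmd += ["-i", inputs]
    cmd.append(mode)  # the positional mode goes last
    return cmd


# Print the command lines instead of running them.
print(shlex.join(build_grid_submit_command("single-btag", dry_run=True)))
print(shlex.join(build_grid_submit_command("single-btag", tag="mytag",
                                           inputs="inputs.txt")))
```

The returned list can be passed to `subprocess.run` once the printed command looks right.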
Changing default number of events
In some situations, more events exist in a given input sample than are needed.
For grid submissions, this can mean a lot of wasted CPU time and inflated output dataset disk sizes.
In these cases, the argument `-n [number of events]` can be used to run over only the desired number of events.
This argument is applied per input sample, so if there are two samples and the argument is `-n 50000`, then at least 50,000 events will be processed for each sample.
Under the hood, this works by iterating over the files in each input sample and summing the number of events per file until the threshold has been reached or exceeded.
Note that when the threshold is exceeded, slightly more events than requested will be dumped.
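The file selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: the function name `select_files` and the `(name, n_events)` pairs are hypothetical stand-ins for the per-file metadata the real script looks up.

```python
def select_files(files, max_events):
    """Pick files in order until their summed event count reaches max_events.

    `files` is a list of (name, n_events) pairs (hypothetical input format).
    Returns the selected file names and the total number of events.
    """
    selected = []
    total = 0
    for name, n_events in files:
        if total >= max_events:
            break  # threshold reached or exceeded, stop adding files
        selected.append(name)
        total += n_events
    return selected, total


# Made-up sample with four files of 20,000 events each.
sample = [
    ("file_1.root", 20000),
    ("file_2.root", 20000),
    ("file_3.root", 20000),
    ("file_4.root", 20000),
]
chosen, total = select_files(sample, 50000)
print(chosen, total)
```

With a request of 50,000 events, the first three files are selected and 60,000 events are processed, which is the overshoot mentioned above. In this sketch, a sample with fewer events than requested is simply used in full.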
Output dataset names
The `grid-submit` script will automatically create output dataset names (DSIDs) for your jobs. Given the DSID of an input DAOD sample, for example
mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYSVAL.e6337_s3681_r13167_p4931
the script will strip out the simulation description and keep the sample id (here `410470`) and the production tags (`e6337_s3681_r13167_p4931`), and then append information about the dumping job.
First, `tdd` is appended to easily identify the dataset as coming from the training dataset dumper, followed by the config file basename.
Next come the release version or buildstamp, followed by the submission job ID.
When using `-t`, the job ID is simply the tag name.
The output DSID might then be
user.x.410470.e6337_s3681_r13167_p4931.tdd.EMPFlow.22_2_68.22-04-23-T173636.22-04-20_tagname
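The naming scheme can be sketched in a few lines of Python. This is a simplified illustration, not the script's implementation: the function name `output_dataset_name` and its parameters are hypothetical, and the example name above contains an additional timestamp-like field that this sketch does not try to reproduce.

```python
def output_dataset_name(input_name, user, config, release, job_id):
    """Build an output dataset name from an input DAOD dataset name.

    Keeps the sample id (second dot-separated field) and the production
    tags (last field), then appends 'tdd', the config basename, the
    release and the job ID. Hypothetical helper for illustration only.
    """
    fields = input_name.split(".")
    sample_id = fields[1]
    tags = fields[-1]
    return ".".join(
        ["user", user, sample_id, tags, "tdd", config, release, job_id]
    )


name = output_dataset_name(
    "mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad"
    ".deriv.DAOD_PHYSVAL.e6337_s3681_r13167_p4931",
    user="x",
    config="EMPFlow",
    release="22_2_68",
    job_id="22-04-20_tagname",
)
print(name)
```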
If not using `-t`, but submitting with a clean working directory (all changes committed), `git describe` is used to provide a unique reference to the latest commit from a previous tag.
If forcing submission with uncommitted changes using `-f`, a timestamp is used as the job ID.
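The choice of job ID for the three cases (`-t`, clean working directory, `-f`) can be summarised in a short sketch. The function `choose_job_id` is hypothetical; the date and timestamp formats are assumptions inferred from the example dataset name above and may differ from what the script actually produces.

```python
from datetime import datetime


def choose_job_id(tag=None, dirty=False, describe=None, now=None):
    """Return a job ID following the rules described in the text.

    tag      -- tag name given with -t, or None
    dirty    -- True if submission is forced with uncommitted changes (-f)
    describe -- output of `git describe` for a clean working directory
    now      -- datetime used for date and timestamp (defaults to now)
    """
    now = now or datetime.now()
    if tag is not None:
        # The date is prepended to the tag name (format is an assumption).
        return f"{now:%y-%m-%d}_{tag}"
    if not dirty:
        if describe is None:
            raise ValueError("git describe output is needed without a tag")
        return describe
    # Forced submission: fall back to a timestamp (format is an assumption).
    return now.strftime("%y-%m-%d-T%H%M%S")
```

For example, `choose_job_id(tag="tagname", now=datetime(2022, 4, 20))` gives `22-04-20_tagname`.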
Using a local model file
If you want to use a local lwtnn or ONNX model file to run inference on the grid you need to add the models to the tarball which is sent to the grid.
To do this, specify the network files in a comma-separated list via the `-e` option of the `grid-submit` script.
On the grid, the models will be copied to the job working directory, so you should specify only the model filename for the `nn_file_path` key in the `dl2_configs` block.
In order to have the same job run locally (for example to run a test before submission), place the model files in the top level of the directory you are working in with the dumper (i.e. in the directory which also contains the `build` and the `training-dataset-dumper` folder).
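As an illustration of the two steps above, the sketch below takes a config held as a Python dictionary, reduces each `nn_file_path` to its filename, and builds the comma-separated list for `-e`. It assumes that `dl2_configs` is a list of blocks that each carry an `nn_file_path` key; that layout and the helper name `prepare_for_grid` are assumptions made for this example.

```python
import copy
from pathlib import PurePosixPath


def prepare_for_grid(config):
    """Return (grid_config, e_option) for a config with local model files.

    grid_config -- copy of the config with nn_file_path reduced to the filename
    e_option    -- comma-separated list of the original model paths for -e
    Assumes config["dl2_configs"] is a list of dicts (hypothetical layout).
    """
    grid_config = copy.deepcopy(config)
    model_paths = []
    for block in grid_config.get("dl2_configs", []):
        path = block["nn_file_path"]
        model_paths.append(path)
        block["nn_file_path"] = PurePosixPath(path).name
    return grid_config, ",".join(model_paths)


# Made-up config with two local model files.
config = {
    "dl2_configs": [
        {"nn_file_path": "models/network_a.json"},
        {"nn_file_path": "network_b.onnx"},
    ]
}
grid_config, e_option = prepare_for_grid(config)
print(e_option)
print([b["nn_file_path"] for b in grid_config["dl2_configs"]])
```

The input dictionary is left unchanged, so the same config can still be used for a local test.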
Alternative option
Alternatively, you can use the following approach.
First, create a `data` subfolder in the `FTagDumper` directory:
mkdir -p FTagDumper/data
In addition, you need to add the following lines to the cmake file FTagDumper/CMakeLists.txt
# add files
atlas_install_data(data/*)
After editing the file, rerun `cmake` so that the change is picked up.
Afterwards, copy all the network files (lwtnn, ONNX) into the directory `FTagDumper/data`.
As `nn_file_path` in the `dl2_configs` block, you then need to specify the path `FTagDumper/<network_file>`, which will be found by the PathResolver.
Job Bookkeeping
Take a look at the documentation here for information on how to kill and retry grid jobs.
To set up `pbook`, type the following commands:
setupATLAS
lsetup panda
pbook
To show a list of your recent jobs, you can use
show()
To retry jobs in tasks which did not reach 100% completion you can use
retry([JediTaskID_1,JediTaskID_2])