Grid Dumps

Setup#

In addition to the setup of the training dataset dumper described in the installation instructions, you need to go to the directory where you checked out the package and run

source training-dataset-dumper/FTagDumper/grid/setup.sh

You will be prompted to enter your grid proxy passphrase. After doing so, you are set up to submit jobs on the grid using the grid-submit script.

Running on the grid#

If you are unfamiliar with the grid-submit script, be sure to use the -h flag to get usage information. Note that, as mentioned in the usage, the mode positional argument (e.g. single-btag) must come after any optional flags. The available optional flags and modes are listed when using -h, and can also be found in the script's source code.
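For example, to print the usage information:

grid-submit -h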

Start with a dry run

When submitting jobs, it's good practice to initially use the -d flag to perform a dry run:

grid-submit -d single-btag

This will run the script without actually submitting any jobs to the grid, allowing you to double-check the config and inputs before submission. To ensure reproducibility, the script will error if you have uncommitted changes in the repository.

You can run the grid submission, tagging the current code state, using this command:

grid-submit -t <tagname> single-btag

You can override the mode's default configuration file using -c <config.json>. Similarly, the executable can be specified with -s <script>, and the inputs file with -i <inputs.txt>. The inputs file is a plain text file listing the input datasets to submit over, one DSID per line. Blank lines and lines starting with a # are ignored. The default datasets for each mode can be found here.
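For example, a minimal inputs file might look like the following, reusing the ttbar dataset shown later on this page (the dataset choice is purely illustrative):

# ttbar PHYSVAL sample
mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYSVAL.e6337_s3681_r13167_p4931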

Any time you dump samples, it is good practice to tag the current state of the repository using the -t <tagname> flag. Note that:

Additional info on tags
  • The date will be prepended to the tag name you provide.
  • The tag will be pushed to your personal fork.
  • The script will error if you attempt to push a tag to the main repository instead of a fork. See here for more information about working from a fork.

Changing default number of events#

In some situations, an input sample contains more events than are needed. For grid submissions, this can mean a lot of wasted CPU time and inflated output dataset sizes on disk. In these cases, the -n [number of events] argument can be used to run over only the desired number of events. The argument is applied per input sample: with two samples and -n 50000, at least 50,000 events will be processed for each sample. Under the hood, this works by iterating over the files in each input sample, adding up the number of events per file until the requested threshold has been reached or exceeded. Note that in the latter case slightly more events than requested will be dumped.
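For example, to cap each input sample at roughly 50,000 events (shown here together with a dry run; the mode is just an example):

grid-submit -d -n 50000 single-btag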

Output dataset names#

The grid-submit script will automatically create output dataset names (DSIDs) for your jobs. Given the DSID of an input DAOD sample, for example

mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYSVAL.e6337_s3681_r13167_p4931

the script will strip out the simulation description and keep the sample ID (here 410470) and the production tags (e6337_s3681_r13167_p4931), then append information about the dumping job. First, tdd is appended to easily identify the dataset as coming from the training dataset dumper, followed by the basename of the config file. Next come the release version or buildstamp, followed by the submission job ID. When using -t, the job ID is simply the tag name. The output DSID might then be

user.x.410470.e6337_s3681_r13167_p4931.tdd.EMPFlow.22_2_68.22-04-23-T173636.22-04-20_tagname

If not using -t but submitting with a clean working directory (all changes committed), git describe is used to provide a unique reference to the latest commit relative to the most recent tag. If forcing submission with uncommitted changes using -f, a timestamp is used as the job ID.
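Reading the example name above from left to right, a plausible decomposition following the description is:
  • user.x: your grid username
  • 410470: the sample ID of the input dataset
  • e6337_s3681_r13167_p4931: the production tags of the input dataset
  • tdd: marks the output as coming from the training dataset dumper
  • EMPFlow: the config file basename
  • 22_2_68: the release version
  • 22-04-23-T173636: the buildstamp
  • 22-04-20_tagname: the job ID, here the date-prepended tag name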

Using a local model file#

If you want to use a local lwtnn or ONNX model file to run inference on the grid, you need to add the models to the tarball which is sent to the grid. To do this, specify the network files in a comma-separated list via the -e option of the grid-submit script.
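For example, to ship two local model files with the job (the filenames here are placeholders):

grid-submit -e my_model.onnx,my_other_model.json -t <tagname> single-btag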

On the grid, the models will be copied to the job working directory, so you should specify only the model filename for the nn_file_path key in the dl2_configs block.
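A minimal sketch of the corresponding config fragment, assuming dl2_configs is a list of per-model blocks and using a placeholder filename:

"dl2_configs": [
  {
    "nn_file_path": "my_model.onnx"
  }
]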

In order to have the same job run locally (for example, to run a test before submission), place the model files in the top level of the directory in which you work with the dumper (i.e. the directory which also contains the build and training-dataset-dumper folders).

Alternative option

Alternatively, you can use the following approach. First, create a data subfolder in the FTagDumper directory:

mkdir -p FTagDumper/data

In addition, you need to add the following lines to the CMake file FTagDumper/CMakeLists.txt:

# add files
atlas_install_data(data/*)

Before running the dumper, you need to recompile (including rerunning cmake).
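A typical rebuild, assuming the top-level layout described above (a build directory next to the training-dataset-dumper checkout), might look like:

cd build
cmake ../training-dataset-dumper
make
cd ..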

Afterwards, copy all the network files (lwtnn, ONNX) into the FTagDumper/data directory. As nn_file_path in the dl2_configs block you then need to specify the path FTagDumper/<network_file>, which will be found by the PathResolver.
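As a sketch, with a placeholder filename:

"nn_file_path": "FTagDumper/my_model.onnx"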

Job Bookkeeping#

Take a look at the documentation here for information on how to kill and retry grid jobs.

To set up pbook, type the following commands

setupATLAS
lsetup panda
pbook

To show a list of your recent jobs, you can use

show()

To retry jobs in tasks which did not reach 100% completion, you can use

retry([JediTaskID_1, JediTaskID_2])
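Similarly, pbook provides kill() to kill tasks, and killAndRetry() to do both in one step (the task IDs here are placeholders):

kill([JediTaskID_1])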