# Advanced Configuration

## Dataloading

### Selecting Training Variables
Training files are structured arrays, so it is easy to specify which variables you want to include in the training by name.
In your data config there is a `variables` key which specifies which variables to include in the training for each input type.
These are defined in the model files, rather than the `base.yaml` config.
**Make sure you do not train on truth information!**
The variables listed under `data.variables` are the inputs to the training.
You should not include any truth information (unless you are testing this explicitly), but rather specify truth labels for each task in your model config.
For example, in `GN1.yaml` you will find the following variables:
```yaml
data:
  variables:
    jets:
      - pt_btagJes
      - eta_btagJes
    tracks:
      - d0
      - z0SinTheta
      - dphi
      - deta
      ...
```
The number of variables specified here will be used to automatically set the `input_size` of your `salt.models.InitNet` modules.
Training with multiple types of inputs beyond jets and tracks is supported to create a heterogeneous model. An example of this can be found in `GN2emu.yaml`, which includes a separate electrons input type.
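As a rough illustration (the electron variable names below are placeholders; see `GN2emu.yaml` for the actual list), an additional input type is declared simply as another key under `data.variables`:

```yaml
data:
  variables:
    jets:
      - pt_btagJes
      - eta_btagJes
    tracks:
      - d0
      - z0SinTheta
    electrons: # additional input type; variable names here are illustrative
      - pt
      - eta
```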
### Mapping Input Dataset Names

By default, the input names are used to directly retrieve the datasets from the input h5 files.
If you want to use a different name to retrieve the h5 datasets, you can specify a mapping in the `data` config using the `input_map` key.
```yaml
data:
  input_map:
    internal_name: h5_dataset_name
```
In this example, `h5_dataset_name` will be used to retrieve the datasets from the input h5 files, while internally (and elsewhere in the configuration) this input type will be referred to as `internal_name`.
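For instance, a minimal sketch (the dataset name `tracks_loose` is hypothetical) that keeps the internal name `tracks` while reading a differently named h5 dataset:

```yaml
data:
  input_map:
    tracks: tracks_loose # hypothetical name of the dataset in the h5 file
```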
### Truncating Inputs

You can truncate the number of tracks used for training.
This can be useful to speed up training.
To do so, pass `--data.num_inputs.tracks=10` to limit the training to the first ten tracks.
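For example, combined with the usual fit command (the config name here is just a stand-in for your own):

```bash
salt fit --config configs/GN2.yaml --data.num_inputs.tracks=10
```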
This can also be configured in yaml via:

```yaml
data:
  num_inputs:
    tracks: 10
```
### Remapping Labels

This section is about remapping labels on the fly, which is useful in case they are not already mapped to `0, 1, 2, ...`.
In the config block for each task, you can specify a `label_map` which maps the label values as stored in the input file to the ones you want to use in the task loss calculation.
For example, instead of using the pre-mapped `flavour_label`, you could directly train on `HadronConeExclTruthLabelID` using:
```yaml
class_path: salt.models.ClassificationTask
init_args:
  input_name: jets
  name: jet_classification
  label: HadronConeExclTruthLabelID
  label_map: { 0: 0, 4: 1, 5: 2 }
  class_names: [ujets, cjets, bjets]
  ...
```
Note, when using `label_map` you also need to provide `class_names`.
When using `flavour_label` as the target, the class names are automatically determined for you from the training file.
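As a further illustration only (this extension is not part of the shipped configs), the same mechanism can remap additional label values, e.g. adding a tau class:

```yaml
label: HadronConeExclTruthLabelID
label_map: { 0: 0, 4: 1, 5: 2, 15: 3 } # 15 corresponds to taus; illustrative extension
class_names: [ujets, cjets, bjets, taujets]
```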
### Data From S3

To use S3 as an ATLAS user, some upstream setup must be done with the CERN OpenStack project. In particular, you must have access to a bucket and have initialised your own public and secret keys. Once you have this information, you can access your bucket from anywhere using your credentials. To set up the credentials for salt, please include the following configuration in the `base_config.yaml` or your model config, under the `data` option:
```yaml
config_S3:
  use_S3: False # set to True to set up S3 (needed for storing results)
  download_S3: False # set to True to download the files listed in download_files from S3
  pubKey: # public key
  secKey: # secret key
  url: https://s3.cern.ch # endpoint url; use this one for OpenStack at CERN
  bucket: # bucket name
  download_path: # local path to download the files to
  download_files: # file keys to download, matching entries in the config data
    - train_file
    - val_file
    - norm_dict
    - class_dict
```
With the `download_S3` key set to True, the file keys listed (matching entries in the `data` part of the yaml) will be downloaded locally to `download_path`. Note that you can run a salt training directly on data located on S3 by downloading it locally: the S3 download scripts will automatically update the paths to point to the local copies. You can also choose to first download the files with the salt-installed `download_S3` script, as such:
```bash
download_S3 --config configs/GN2.yaml
```
This will run the downloading script without starting the salt CLI.
Importantly, if your aim is to use S3 to store training outputs (configs, model checkpoints, performance files, ...), you must modify some entries in the `trainer` section of the base config:
```yaml
trainer:
  ...
  default_root_dir: s3://BUCKET/FOLDER
  ...
  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
```
`default_root_dir` should be a valid url to an S3 folder under your bucket. The default CometLogger will not work with S3 and you must instead use the TensorBoardLogger (please take care not to keep the comet instantiation arguments: comment out `init_args: { project_name: salt, display_summary_level: 0 }`).
## Model Architecture

### Global Object Features
By default, inputs from the global object are concatenated with each of the input constituents at the beginning of the model, in the `salt.models.InitNet`.
You can instead choose to concatenate global inputs with the pooled representation after the encoder step.
In order to do this you should add a `GLOBAL` key under `data.variables` and specify which global-level variables you want to use.
**Don't forget to change the pooling and task input sizes accordingly.**
If you concatenate 2 global variables you should increase the `input_size` by 2 for the `pool_net` and all tasks (except vertexing, where you should increase it by 4).
For example, you can concatenate jet features after the gnn model with the following `variables` section:
```yaml
data:
  variables:
    GLOBAL:
      - pt_btagJes
      - eta_btagJes
    ...
```
You can find a complete example of adding jet-level SMT variables in the `GN2emu.yaml` config.
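To make the bookkeeping concrete, here is a minimal sketch assuming two `GLOBAL` variables and an encoder output dimension of 128 (both numbers are illustrative; the task `input_size` values need the same +2, or +4 for vertexing):

```yaml
pool_net:
  class_path: salt.models.GlobalAttentionPooling # or whichever pooling class you use
  init_args:
    input_size: 130 # 128 encoder outputs + 2 GLOBAL variables
```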
### Switching Attention Mechanisms

By default the `GN1.yaml` config uses the scaled dot product attention found in transformers.
To switch to the GATv2 attention, add the `GATv2.yaml` config fragment:
```bash
salt fit --config configs/GN1.yaml --config configs/GATv2.yaml
```
Note that the `num_heads` and `head_dim` arguments must match those in the `gnn.init_args` config block.
### Switching Pooling Mechanisms

The `GN2.yaml` config by default uses Global Attention Pooling. This can be switched out for Cross Attention Pooling, where a learnable class token is attended to by the learned track representations. The same arguments used for the Transformer Encoder apply to the Cross Attention layers. An example Cross Attention block is shown below:
```yaml
pool_net:
  class_path: salt.models.CrossAttentionPooling
  init_args:
    input_size: 128
    num_layers: 4
    mha_config:
      num_heads: 4
      attention:
        class_path: salt.models.ScaledDotProductAttention
      out_proj: False
    dense_config:
      activation: *activation
      hidden_layers: [128]
      dropout: 0.1
```
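For comparison, the default global attention pooling block is configured along these lines (a minimal sketch; check `GN2.yaml` for the exact class path and arguments):

```yaml
pool_net:
  class_path: salt.models.GlobalAttentionPooling
  init_args:
    input_size: 128
```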
### Edge Features
It is possible to include edge features as network input, representing relational information between constituent tracks. These have to be implemented on an individual basis since they are not stored in the input files, but rather calculated on the fly within Salt. As such, the information is drawn from the track container (or equivalent) and requires the presence of track variables relevant to the calculation of each edge feature. Currently implemented features are:
- `dR` $= \log\left(\sqrt{d\eta^2 + d\phi^2}\right)$ (requires `phi`, `eta`)
- `kt` $= \log\left(\min(p_T)\,\sqrt{d\eta^2 + d\phi^2}\right)$ (requires `pt`)
- `z` $= \log\left(\frac{\min(p_T)}{\Sigma(p_T)}\right)$ (requires `pt`, `phi`, `eta`)
- `isSelfLoop` = 1 if the edge represents a self-connection, 0 if not
- `subjetIndex` = 1 if the tracks are part of the same subjet, 0 if not (requires `subjetIndex`)
### Heterogeneous Models
If multiple input types are provided, separate initialiser networks should be provided for each input type. An example using both track and electron input types is provided below:
```yaml
init_nets:
  - input_name: tracks
    dense_config: &init
      output_size: &embed_dim 192
      hidden_layers: [256]
      activation: &activation SiLU
  - input_name: electrons
    dense_config:
      <<: *init
```
The separate input types are by default combined and treated homogeneously within the GNN layers. Alternatively, each input type can be updated with separate self-attention blocks and cross-attention blocks between each input type:

**This isn't recommended.** It is slower and hasn't proved to be more performant than the homogeneous GNN approach.
```yaml
encoder:
  class_path: salt.models.TransformerCrossAttentionEncoder
  init_args:
    input_names: [tracks, electrons]
    ca_every_layer: true
    embed_dim: 192
    num_layers: 6
    out_dim: &out_dim 128
    mha_config:
      num_heads: 4
      attention:
        class_path: salt.models.ScaledDotProductAttention
      out_proj: False
    sa_dense_config:
      activation: *activation
      hidden_layers: [256]
      dropout: &dropout 0.1
```
### Parameterisation

It is possible to condition the network output on particular variables (such as an exotic particle's mass) to create a so-called parameterised neural network. Parameters are treated as additional input variables during training, and the user can choose which values to use when evaluating the model.
A parameterised network can be configured in the following way:
- `variables` section: Add your parameters to the lists of variables used in training.

    ```yaml
    data:
      variables:
        ...
        PARAMETERS:
          - mass
        ...
    ```

- `PARAMETERS` section: In a dedicated section, for each parameter, specify the values of the parameter that appear in your training set (as a list in `train`) and the value you wish to evaluate the model at (`test`). Optionally you can include a list of probabilities for each parameter, corresponding to the probabilities of assigning background jets one of the parameter values given in `train`. These probabilities should reflect how each parameter value is represented within the training data set. If probabilities are not given, values will be assigned to background jets with equal probability. Ensure parameters appear in the same order in `PARAMETERS` as in `variables`.

    ```yaml
    PARAMETERS:
      mass:
        train: [5, 16, 55]
        test: 40
        prob: [0.2, 0.3, 0.5]
    ```
The implementation above produces the default parameterisation mechanism, in which parameters are concatenated to the inputs of the model, as described by Baldi et al. Alternatively, one may instead apply feature-wise transformations, as described by Dumoulin et al. Here, parameters are passed as inputs to separate networks whose outputs can be used to scale or bias the features of a layer. To use feature-wise transformations you must add `featurewise_nets` to your model configuration as follows:
```yaml
model:
  class_path: salt.models.SaltModel
  init_args:
    ...
    featurewise_nets:
      - layer: input
        dense_config_scale:
          hidden_layers: [4]
          output_size: 17
        dense_config_bias:
          hidden_layers: [4]
          output_size: 17
      - layer: encoder
        apply_norm: True
        dense_config_scale:
          hidden_layers: [128]
          output_size: 256
        dense_config_bias:
          hidden_layers: [128]
          output_size: 256
      - layer: global
        dense_config_scale:
          output_size: 128
          hidden_layers: [64]
          final_activation: Sigmoid
    ...
```
Here, three instances of featurewise transformations have been added to the model. For each, you must specify the layer whose features you would like to transform. This can currently be either `input`, which applies the transformations to the features before they are passed into the initialisation networks; `encoder`, which applies the transformations to the inputs of each encoder layer using separate networks; or `global`, which applies them to the global track representations output by the encoder. For each instance, you can specify one or both of `dense_config_scale` and `dense_config_bias`, which configure dense networks whose outputs scale and bias the features of the chosen layer, respectively. It is important to ensure the `output_size` of these networks matches the number of features in the layer you are transforming. In this case, the transformations are applied to a model with 17 inputs per track, the layers of an encoder with 256 features, and the output of the encoder, which has 128 features for each track representation. You can optionally apply a layer normalisation after the transformations by setting `apply_norm: True` for a given instance, as shown above.
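In other words, each configured instance applies a transformation of roughly the following form (a schematic summary of the description above, where $x$ are the features of the chosen layer, $p$ the parameters, and $s$ and $b$ the networks configured by `dense_config_scale` and `dense_config_bias`):

$$x' = s(p) \odot x + b(p)$$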
## Training

### Compiled Models
PyTorch 2.0 introduced compiled models via `torch.compile()`, which improves execution times.
In tests, compilation can increase the execution speed of the model by up to 1.5x.
You can enable compilation by passing the `--compile` flag to the CLI.
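For example (assuming the usual fit invocation shown elsewhere in these docs):

```bash
salt fit --config configs/GN2.yaml --compile
```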
You may see some warnings printed at the start of training, and the first step will take a while as the model is JIT compiled.
**If you see `g++` compile errors, you may need to update your compiler.**
You can check your `g++`/`gcc` version with `g++ --version`.
To use `torch.compile()`, you'll need `gcc` version 10 or later.
You can install a more recent version with:

```bash
conda install -c conda-forge cxx-compiler
```
**`torch.compile()` results**

The following results were obtained on a single A100 GPU with a batch size of 5,000 and 40 workers.

| Model | Eager | `torch.compile()` | Speedup |
| --- | --- | --- | --- |
| GN3 No Aux | 8.9 it/s | 15.4 it/s | 1.73x |
| GN3 | 6.4 it/s | 9.0 it/s | 1.41x |
Memory usage should be unaffected by compiling the model. Please report any issues you may have.

Note that `torch.compile()` has not been tested with multi-GPU training.
## Hyperparameter Optimisation

### Katib

In order to train salt on Katib, the performance must be printed to the output stream. The `PerformanceWriter` callback is available for that very purpose. It also stores the printed metrics in a json file at a writable local path `dir_path` (by default `trainer.log_dir`). For Katib, it is important to set the `stdOut` value to True and to point the Katib metric collector to stdOut.
An example configuration to be added to the `base.yaml` config file is:
```yaml
callbacks:
  - class_path: salt.callbacks.PerformanceWriter
    init_args:
      dir_path: /mylocal/path # any local path that is writable
      add_metrics: # a list of strings of additional metrics - included by default: train_loss, val_loss, val_accuracy_loss
        - a_fancy_new_metric
        - another_fancy_new_metric
      stdOut: True # whether to print to stdOut
```
### muTransfer

Salt is compatible with the muTransfer technique outlined in the paper Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer.

#### Setup
To set up muP, the model configuration (e.g., `GN2.yaml`) has to include the following extra configuration, placed under `config.model` (e.g., after `model.lrs_config` and before `model.model`):
```yaml
muP_config:
  shape_path: my_path_to_a_folder_for_shape
  embed_dim:
    apply_to: [init_nets, encoder]
    parameter_name: [output_size, embed_dim]
    parameter_base: 128
    parameter_delta: 4
```
With this, the base (delta) model is instantiated with the parameters listed in `parameter_name`, each corresponding to the respective module in `apply_to`, set to the value `parameter_base` (`parameter_delta`). The `storeshapes` file will be placed at the path `shape_path` or, if this parameter is not set, at `./temp_muP/`, together with the base and delta models as well as their configuration (useful for checking that they were set up correctly).
To run a GN2 training with muP, you also need to specify in the `encoder` config (and the `init_nets` config if it is affected) that it should be in the muP configuration, using the following boolean parameter:
- for `init_nets` (only if changing the embedding dimension):

    ```yaml
    init_nets:
      - input_name: tracks
        dense_config:
          ...
          muP: True
    ```

- for the `encoder`:

    ```yaml
    encoder:
      class_path: salt.models.TransformerEncoder
      init_args:
        ...
        muP: True
    ```
#### Run

To run muP, you must instantiate a GN2 model in the Maximal Update Parametrisation (muP). To do this, follow the steps below, which are detailed further afterwards.
- Step 1: create the `storeshapes` file using a model config file with the muP configuration:

    ```bash
    setup_muP -config GN2.yaml
    ```

- Step 2: run a muP training normally with the model config with the muP configuration:

    ```bash
    salt fit --config GN2.yaml
    ```
The config file `GN2_muP.yaml` gives an example of a valid configuration file for muP.
A gentle introduction to muP is available in this talk.
Important note: muP has been implemented to scale the transformer encoder (and the `init_nets` if the embedding is changed). The last layer in the scaling must be the out-projection of the encoder (controlled with `out_dim`), which in particular must be set!
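A minimal sketch of what this means for the encoder block (the values are illustrative; the key points are that `out_dim` is set and `muP: True` is enabled):

```yaml
encoder:
  class_path: salt.models.TransformerEncoder
  init_args:
    embed_dim: 192
    num_layers: 6
    out_dim: 128 # must be set: the encoder out-projection is the last layer in the muP scaling
    ...
    muP: True
```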
**Step 1:**
To leverage the existing muP library, base and delta models have to be instantiated using the `main_muP` script to generate a `storeshapes` file to be passed to the muP library. Note that you must vary a parameter between the base and delta models, as this will define the dimension to muTransfer along (embedding dimension and num_heads are supported). This script is installed with salt and callable under the name `setup_muP`. For example, run:
```bash
setup_muP -c GN2.yaml
```
Here `GN2.yaml` is your usual model configuration file, endowed with the following extra configuration placed under `config.model` (e.g., after `model.lrs_config` and before `model.model`):
```yaml
muP_config:
  shape_path: my_path_to_a_folder_for_shape
  embed_dim:
    apply_to: [init_nets, encoder]
    parameter_name: [output_size, embed_dim]
    parameter_base: 128
    parameter_delta: 4
```
The `setup_muP` script will instantiate a base (delta) model with the parameters listed in `parameter_name`, each corresponding to the respective module in `apply_to`, set to the value `parameter_base` (`parameter_delta`). The `storeshapes` file will be placed at the path `shape_path` or, if this parameter is not set, at `./temp_muP/`, together with the base and delta models as well as their configuration (useful for checking that they were set up correctly). Note: currently the num_heads and the embedding size of the transformer encoder are supported, with the latter also being relevant to the `init_nets`. Both the base and delta values have to be divisible by your chosen `num_heads`!
**Step 2:**
With step 1 having created a `storeshapes` file under the path `shape_path` or the default `./temp_muP`, you can now turn to training GN2 models with your desired widths. The model will load the `storeshapes` file in the initialiser of `ModelWrapper`, and you must make sure the model is passed the `muP_config` with, in particular, the right path to the `storeshapes` (easiest is to not change the config with respect to the base and delta model initialisation).
To run a GN2 training with muP, you also need to specify in the `encoder` config (and the `init_nets` config if it is affected) that it should be in the muP configuration, using the following boolean parameter:
- for `init_nets` (only if changing the embedding dimension):

    ```yaml
    init_nets:
      - input_name: tracks
        dense_config:
          ...
          muP: True
    ```

- for the `encoder`:

    ```yaml
    encoder:
      class_path: salt.models.TransformerEncoder
      init_args:
        ...
        muP: True
    ```
If correctly set up, you can just run a salt training in the usual way:
```bash
salt fit --config GN2.yaml
```
You are now training a muP-GN2!