Evaluation
You can evaluate models trained with salt over a test set. Test samples are loaded from structured numpy arrays stored in h5 files, just as for training. After producing the evaluation file, you can make performance plots using puma.
Running the Test Loop#
To evaluate a trained model on a test file, use the `salt test` command:

```shell
salt test --config logs/<timestamp>/config.yaml --data.test_file path/to/test.h5
```
As in the above example, you need to specify the saved config from the training run.
By default, the checkpoint with the lowest validation loss is used for evaluation.
You can specify a different checkpoint with the `--ckpt_path` argument.
When evaluating a model from a resumed training, you need to explicitly specify `--ckpt_path`.
When you resume training, you specify a `--ckpt_path`, and this is saved with the model config.
If you then run `salt test` on the resulting config without specifying a new `--ckpt_path`, this same checkpoint will be evaluated.
To instead evaluate the desired checkpoint from the resumed training job, you should explicitly specify `--ckpt_path` again to overwrite the one already saved in the config.
If you still want to choose the best epoch automatically, use `--ckpt_path null`.
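For example, to evaluate a specific checkpoint from a resumed training (the checkpoint filename below is a placeholder, assuming the default `ckpts/` layout):

```shell
salt test \
    --config logs/<timestamp>/config.yaml \
    --data.test_file path/to/test.h5 \
    --ckpt_path logs/<timestamp>/ckpts/<checkpoint>.ckpt
```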
You also need to specify a path to the test file using `--data.test_file`.
This should be a prepared umami test file. The framework extracts the sample name and appends it to the checkpoint file basename.
The result is saved as an h5 file in the `ckpts/` directory.
You can use `--data.num_test` to set the number of samples to test on, overriding the default value from the training config.
Only a single GPU is supported for the test loop.
This is enforced by the framework: if you try to use more than one device, you will see the message `Setting --trainer.devices=1`.
Output files are overwritten by default.
You can use `--data.test_suff` to append an additional suffix to the evaluation output file name.
Extra Evaluation Variables#
When evaluating a model, the jet and track variables included in the output file can be configured.
The variables can be configured within the `PredictionWriter` callback configuration in the base configuration file, as follows:
```yaml
callbacks:
  - class_path: salt.callbacks.Checkpoint
    init_args:
      monitor_loss: val/jet_classification_loss
  - class_path: salt.callbacks.PredictionWriter
    init_args:
      write_tracks: False
      extra_vars:
        jets:
          - pt_btagJes
          - eta_btagJes
          - HadronConeExclTruthLabelID
          - n_tracks
          - n_truth_promptLepton
        tracks:
          - truthOriginLabel
          - truthVertexIndex
```
By default, only the jet quantities are evaluated, to save time and space.
If you want to study the track aux task performance, you need to specify `write_tracks: True` in the `PredictionWriter` callback configuration.
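To check what actually ended up in the evaluation file, you can open it with h5py. The sketch below writes and then reads a toy file with the same structured-array layout (one dataset per input type); the field names `pt_btagJes` and `salt_pb` are illustrative assumptions, not guaranteed output names.

```python
import h5py
import numpy as np

# Toy structured array mimicking the layout of a salt evaluation file.
# Field names here are illustrative assumptions, not the exact names
# salt writes for your model.
jets = np.zeros(3, dtype=[("pt_btagJes", "f4"), ("salt_pb", "f4")])
jets["salt_pb"] = [0.9, 0.1, 0.5]

with h5py.File("toy__test_sample.h5", "w") as f:
    f.create_dataset("jets", data=jets)

# Inspecting a real evaluation file works the same way
with h5py.File("toy__test_sample.h5", "r") as f:
    print(list(f.keys()))  # one dataset per input type
    pb = f["jets"]["salt_pb"][:]

print(len(pb))
```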
The full API for the `PredictionWriter` callback is found below.
salt.callbacks.PredictionWriter#
Bases: lightning.Callback
Write test outputs to h5 file.
This callback will write the outputs of the model to an h5 evaluation file. The outputs are produced by calling the `run_inference` method of each task. The output file is written to the same directory as the checkpoint file, and has the same name as the checkpoint file, but with the suffix `__test_<sample><suffix>.h5`. The file will contain one dataset for each input type, with the same name as the input type in the test file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `write_tracks` | `bool` | If False, skip any tasks with | `False` |
| `write_objects` | `bool` | If False, skip any tasks with | `False` |
| `half_precision` | `bool` | If true, write outputs at half precision | `False` |
| `object_classes` | `list` | List of flavour names with the index corresponding to the label values. This is used to construct the global object classification probability output names. | `None` |
| `extra_vars` | `salt.stypes.Vars` | Extra variables to write to file for each input type. If not specified for a given input type, all variables in the test file will be written. | `None` |
Source code in salt/callbacks/predictionwriter.py
Integrated Gradients#
Integrated gradients is a method for attributing contributions from each input feature to model outputs. Further details can be found here. A callback can be added to the config after training has completed, before evaluation is run. An example can be found below:
```yaml
callbacks:
  - class_path: salt.callbacks.IntegratedGradientWriter
    init_args:
      add_softmax: true
      n_baselines: 5
      min_allowed_track_sizes: 5
      max_allowed_track_sizes: 25
      n_steps: 50
      n_jets: 100_000
      internal_batch_size: 10_000
      input_keys:
        inputs:
          - jets
          - tracks
        pad_masks:
          - tracks
      output_keys: [jets, jets_classification]
      overwrite: true
```
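For intuition, the estimator behind this callback can be sketched in a few lines of numpy. The toy model `f(x) = sum(x**2)` below is an assumption for illustration (its gradient `2*x` is known analytically); the real callback computes attributions for the trained salt model via Captum.

```python
import numpy as np

# Minimal integrated-gradients sketch for the toy model f(x) = sum(x**2),
# whose gradient at any point is 2*x. Attributions are the path integral
# of the gradient from a baseline to the input, scaled by (x - baseline),
# estimated here with a midpoint quadrature over n_steps points.
def integrated_gradients(x, baseline, n_steps=50):
    alphas = (np.arange(n_steps) + 0.5) / n_steps  # midpoints in (0, 1)
    total = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)  # interpolated input
        total += 2.0 * point                   # analytic gradient of f
    return (x - baseline) * total / n_steps

x = np.array([1.0, 2.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
# Completeness: attributions sum to f(x) - f(baseline),
# here approximately 5.0, with attr approximately [1.0, 4.0]
print(attr.tolist(), float(attr.sum()))
```

Averaging this over several baselines (cf. `n_baselines` above) is what the callback does to reduce sensitivity to any single baseline choice.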
Descriptions of the parameters can be found below:
salt.callbacks.IntegratedGradientWriter#
Bases: lightning.Callback
Callback to run Integrated Gradients on the test set and save the results to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_keys` | `dict` | Dictionary of input keys to be used for the model. This should take the form: `input_keys: inputs: ["jets", "tracks", ... [any other inputs]] pad_masks: ["tracks", ... [any other pad masks]]` | required |
| `output_keys` | `list` | A list of keys representing the nested output of the model we wish to use. E.g., if the model returns `{'jets': {'jets_classification': [predictions]}}` then `output_keys` should be `['jets', 'jets_classification']` | required |
| `add_softmax` | `bool` | Whether to add softmax to the model outputs. Default is True. | `True` |
| `n_baselines` | `int` | Number of baselines to use for each jet. Default is 5. | `5` |
| `min_allowed_track_sizes` | `int` | Only calculate attributions for jets with at least this many tracks. Default is 5. | `5` |
| `max_allowed_track_sizes` | `int` | Only calculate attributions for jets with at most this many tracks. Default is 15. | `15` |
| `min_allowed_flow_sizes` | `int \| None` | Only calculate attributions for jets with at least this many flows. Default is None, meaning no minimum flow size is applied. | `None` |
| `max_allowed_flow_sizes` | `int \| None` | Only calculate attributions for jets with at most this many flows. Default is None, meaning no maximum flow size is applied. | `None` |
| `tracks_name` | `str` | Name of the tracks in the output file. Default is "tracks". | `'tracks'` |
| `flows_name` | `str` | Name of the flows in the output file. Default is "flows". | `'flows'` |
| `n_jets` | `int` | Number of jets to use for the attribution calculation. Default is -1, which means half of the jets in the test set are used. | `-1` |
| `n_steps` | `int` | Number of steps to use for the estimation of the integrated gradients integral. Default is 50. | `50` |
| `internal_batch_size` | `int` | Batch size that Captum uses when calculating integrated gradients. Default is -1, which means the same batch size as the dataloader is used. | `-1` |
| `normalize_deltas` | `bool` | Whether to normalize the convergence deltas. Default is True. | `True` |
| `overwrite` | `bool` | Whether to overwrite the output file if it already exists. Default is False. | `False` |
Source code in salt/callbacks/integrated_gradients_writer.py
Created: October 24, 2022