
Evaluation

You can evaluate models trained with salt over a test set. Test samples are loaded from structured numpy arrays stored in h5 files, just as for training. After producing the evaluation file, you can make performance plots using puma.

Running the Test Loop#

To evaluate a trained model on a test file, use the salt test command.

salt test --config logs/<timestamp>/config.yaml --data.test_file path/to/test.h5

As in the above example, you need to specify the saved config from the training run. By default, the checkpoint with the lowest validation loss is used for evaluation. You can specify a different checkpoint with the --ckpt_path argument.

When evaluating a model from a resumed training, you need to explicitly specify --ckpt_path.

When you resume training, you specify a --ckpt_path and this is saved with the model config. If you then run salt test on the resulting config without specifying a new --ckpt_path, this same checkpoint will be evaluated. To instead evaluate the desired checkpoint from the resumed training job, you should explicitly specify --ckpt_path again to overwrite the one that is already saved in the config.

If you still want to choose the best epoch automatically, use --ckpt_path null.

You also need to specify a path to the test file using --data.test_file. This should be a prepared umami test file. The framework extracts the sample name and appends it to the checkpoint file basename. The result is saved as an h5 file in the ckpts/ dir.

You can use --data.num_test to set the number of samples to test on if you want to override the default value from the training config.

Only one GPU is supported for the test loop.

When testing, only a single GPU is supported. This is enforced by the framework: if you try to use more than one device, you will see the message Setting --trainer.devices=1.

Output files are overwritten by default.

You can use --data.test_suff to append an additional suffix to the evaluation output file name.
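
The evaluation output is a standard h5 file, so you can inspect it directly. Below is a minimal sketch using h5py; the file path and sample name are hypothetical, and the exact dataset and field names depend on your model and configuration.

import h5py

# Hypothetical path: the real file is written next to the checkpoint in the
# ckpts/ dir and is named <checkpoint>__test_<sample><suffix>.h5
fname = "logs/<timestamp>/ckpts/<checkpoint>__test_ttbar.h5"

with h5py.File(fname, "r") as f:
    # one structured dataset per input type, e.g. "jets" (and "tracks" if written)
    print(list(f.keys()))

    jets = f["jets"]
    print(jets.shape)        # (n_jets,)
    print(jets.dtype.names)  # model outputs plus any requested extra variables

    batch = jets[:10]        # read the first ten jets as a numpy structured array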

Extra Evaluation Variables#

When evaluating a model, you can configure which jet and track variables are included in the output file. The variables are specified within the PredictionWriter callback configuration in the base configuration file, as shown below.

callbacks:
    - class_path: salt.callbacks.Checkpoint
      init_args:
        monitor_loss: val/jet_classification_loss
    - class_path: salt.callbacks.PredictionWriter
      init_args:
        write_tracks: False
        extra_vars:
          jets:
            - pt_btagJes
            - eta_btagJes
            - HadronConeExclTruthLabelID
            - n_tracks
            - n_truth_promptLepton
          tracks:
            - truthOriginLabel
            - truthVertexIndex

By default, only the jet quantities are written, to save time and space. If you want to study the track auxiliary task performance, you need to specify write_tracks: True in the PredictionWriter callback configuration.
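
If track writing is enabled, the output file will also contain a dataset named after the track input type. The sketch below assumes the extra track variables from the example config above; check the dataset's dtype for the exact fields written for your model. The file path is hypothetical.

import h5py

fname = "path/to/eval_output.h5"  # hypothetical

with h5py.File(fname, "r") as f:
    tracks = f["tracks"][:]       # structured array of per-track quantities
    print(tracks.dtype.names)     # aux task outputs plus extra track variables

    # extra truth variables requested via extra_vars in the config above
    origin = tracks["truthOriginLabel"]
    vertex = tracks["truthVertexIndex"]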

The full API for the PredictionWriter callback is found below.

salt.callbacks.PredictionWriter #

Bases: lightning.Callback

Write test outputs to h5 file.

This callback will write the outputs of the model to an h5 evaluation file. The outputs are produced by calling the run_inference method of each task. The output file is written to the same directory as the checkpoint file, and has the same name as the checkpoint file, but with the suffix __test_<sample><suffix>.h5. The file will contain one dataset for each input type, with the same name as the input type in the test file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| write_tracks | bool | If False, skip any tasks with "tracks" in input_name. | False |
| write_objects | bool | If False, skip any tasks with input_name="objects" and outputs of the MaskDecoder. | False |
| half_precision | bool | If true, write outputs at half precision. | False |
| object_classes | list | List of flavour names with the index corresponding to the label values. This is used to construct the global object classification probability output names. | None |
| extra_vars | salt.stypes.Vars | Extra variables to write to file for each input type. If not specified for a given input type, all variables in the test file will be written. | None |
Source code in salt/callbacks/predictionwriter.py
def __init__(
    self,
    write_tracks: bool = False,
    write_objects: bool = False,
    half_precision: bool = False,
    object_classes: list | None = None,
    extra_vars: Vars | None = None,
) -> None:
    """Write test outputs to h5 file.

    This callback will write the outputs of the model to an h5 evaluation file. The outputs
    are produced by calling the `run_inference` method of each task. The output file
    is written to the same directory as the checkpoint file, and has the same name
    as the checkpoint file, but with the suffix `__test_<sample><suffix>.h5`. The file will
    contain one dataset for each input type, with the same name as the input type in the test
    file.

    Parameters
    ----------
    write_tracks : bool
        If False, skip any tasks with `"tracks" in input_name`.
    write_objects : bool
        If False, skip any tasks with `input_name="objects"` and outputs of the
        MaskDecoder. Default is False
    half_precision : bool
        If true, write outputs at half precision
    object_classes : list
        List of flavour names with the index corresponding to the label values. This is used
        to construct the global object classification probability output names.
    extra_vars : Vars
        Extra variables to write to file for each input type. If not specified for a given input
        type, all variables in the test file will be written.
    """
    super().__init__()
    if extra_vars is None:
        extra_vars = defaultdict(list)
    self.extra_vars = extra_vars
    self.write_tracks = write_tracks
    self.write_objects = write_objects
    self.half_precision = half_precision
    self.precision = "f2" if self.half_precision else "f4"
    self.object_classes = object_classes

Integrated Gradients#

Integrated gradients is a method for attributing contributions from each input feature to model outputs. Further details can be found here. A callback can be added to the config after training has been completed, before evaluation is run. An example can be found below:

callbacks:
  - class_path: salt.callbacks.IntegratedGradientWriter
    init_args:
      add_softmax: true
      n_baselines: 5
      min_allowed_track_sizes: 5
      max_allowed_track_sizes: 25
      n_steps: 50
      n_jets: 100_000
      internal_batch_size: 10_000
      input_keys:
        inputs: 
          - jets
          - tracks
        pad_masks:
          - tracks
      output_keys: [jets, jets_classification]
      overwrite: true
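
For orientation, the callback uses Captum to compute standard integrated gradients attributions. For a model F, an input x, and a baseline input x', the attribution assigned to feature i is

$$
\mathrm{IG}_i(x) = (x_i - x_i') \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\,\mathrm{d}\alpha .
$$

The integral is approximated numerically using n_steps gradient evaluations, and n_baselines controls how many baseline inputs x' are used for each jet.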

Descriptions of the parameters can be found below:

salt.callbacks.IntegratedGradientWriter #

Bases: lightning.Callback

Callback to run Integrated Gradients on the test set and save the results to a file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_keys | dict | Dictionary of input keys to be used for the model, of the form {"inputs": ["jets", "tracks", ...], "pad_masks": ["tracks", ...]}. | required |
| output_keys | list | A list of keys representing the nested output of the model we wish to use. E.g. if the model returns {'jets': {'jets_classification': [predictions]}} then output_keys should be ['jets', 'jets_classification']. | required |
| add_softmax | bool | Whether to add softmax to the model outputs. | True |
| n_baselines | int | Number of baselines to use for each jet. | 5 |
| min_allowed_track_sizes | int | Only calculate attributions for jets with at least this many tracks. | 5 |
| max_allowed_track_sizes | int | Only calculate attributions for jets with at most this many tracks. | 15 |
| min_allowed_flow_sizes | int \| None | Only calculate attributions for jets with at least this many flow objects. None means no minimum flow size is applied. | None |
| max_allowed_flow_sizes | int \| None | Only calculate attributions for jets with at most this many flow objects. None means no maximum flow size is applied. | None |
| tracks_name | str | Name of the tracks in the output file. | 'tracks' |
| flows_name | str | Name of the flows in the output file. | 'flows' |
| n_jets | int | Number of jets to use for the attribution calculation. -1 means half of the jets in the test set are used. | -1 |
| n_steps | int | Number of steps to use for the estimation of the integrated gradients integral. | 50 |
| internal_batch_size | int | Batch size that Captum uses when calculating integrated gradients. -1 means the same batch size as the dataloader is used. | -1 |
| normalize_deltas | bool | Whether to normalize the convergence deltas. | True |
| overwrite | bool | Whether to overwrite the output file if it already exists. | False |
Source code in salt/callbacks/integrated_gradients_writer.py
def __init__(
    self,
    input_keys: dict,
    output_keys: list,
    add_softmax: bool = True,
    n_baselines: int = 5,
    min_allowed_track_sizes: int = 5,
    max_allowed_track_sizes: int = 15,
    min_allowed_flow_sizes: int | None = None,
    max_allowed_flow_sizes: int | None = None,
    tracks_name: str = "tracks",
    flows_name: str = "flows",
    n_jets: int = -1,
    n_steps: int = 50,
    internal_batch_size: int = -1,
    normalize_deltas: bool = True,
    overwrite: bool = False,
) -> None:
    """Callback to run Integrated Gradients on the test set and save the results to a file.

    Parameters
    ----------
    input_keys : dict
        Dictionary of input keys to be used for the model. This should take the form:
        input_keys:
            inputs: ["jets", "tracks", ... [any other inputs]]
        pad_masks: ["tracks", ... [any other pad masks]]
    output_keys : list
        A list of keys representing the nested output of the model we wish to use. E.g., if
        the model returns {'jets' : {'jets_classification' : [predictions ]}} then
        'output_keys' should be : ['jets', 'jets_classification']
    add_softmax : bool
        Whether to add softmax to the model outputs. Default is True.
    n_baselines : int
        Number of baselines to use for each jet. Default is 5.
    min_allowed_track_sizes : int
        Only calculate attributions for jets with at least this many tracks. Default is 5.
    max_allowed_track_sizes : int
        Only calculate attributions for jets with at most this many tracks. Default is 15.
    min_allowed_flow_sizes : int | None
        Only calculate attributions for jets with at least this many tracks. Default is None,
        meaning no minimum flow size is applied.
    max_allowed_flow_sizes : int | None
        Only calculate attributions for jets with at most this many tracks. Default is None,
        meaning no maximum flow size is applied
    tracks_name : str
        Name of the tracks in the output file. Default is "tracks".
    flows_name : str
        Name of the flows in the output file. Default is "flows".
    n_jets : int
        Number of jets to use for the attribution calculation. Default is -1, which means
        half of jets in the test set are used.
    n_steps : int
        Number of steps to use for the estimation of the integrated gradients integral.
        Default is 50.
    internal_batch_size : int
        Batch size that Captum uses when calculating integrated gradients. Default is -1, which
        means the same batch size as the dataloader is used.
    normalize_deltas : bool
        Whether to normalize the convergence deltas. Default is True.
    overwrite : bool
        Whether to overwrite the output file if it already exists. Default is False.
    """
    super().__init__()
    self.verbose = True
    if not HAS_IG:
        raise ImportError(
            FAILED_IG.msg
            + "\n"
            + "IntegratedGradientWriter requires captum and salt-attribution. "
            "Please install them with `pip install -r requirements-ig.txt`."
        )

    self.add_softmax = add_softmax
    self.n_baselines = n_baselines
    self.min_allowed_track_sizes = min_allowed_track_sizes
    self.max_allowed_track_sizes = max_allowed_track_sizes
    self.allowed_track_sizes = torch.arange(
        min_allowed_track_sizes, max_allowed_track_sizes + 1
    )

    if min_allowed_flow_sizes is not None and max_allowed_flow_sizes is not None:
        self.allowed_flow_sizes = torch.arange(
            min_allowed_flow_sizes, max_allowed_flow_sizes + 1
        )
        self.do_flows = True
    elif min_allowed_flow_sizes is None and max_allowed_flow_sizes is None:
        self.allowed_flow_sizes = None
        self.do_flows = False
    else:
        raise ValueError("Either both min/max flow count must be set, or both must be None.")

    self.input_keys = input_keys
    self.output_keys = output_keys
    self.tracks_name = tracks_name
    self.flows_name = flows_name
    self.n_steps = n_steps
    self.n_jets = n_jets
    self.internal_batch_size = internal_batch_size
    self.normalize_deltas = normalize_deltas

    self.overwrite = overwrite
    self.input_keys = input_keys
