Dataloading#
salt.data.SaltDataset#
Bases: torch.utils.data.Dataset
An efficient map-style dataset for loading data from an H5 file containing structured arrays.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`filename` | `str` | Input h5 filepath containing structured arrays | *required* |
`norm_dict` | `str` | Path to file containing normalisation parameters | *required* |
`variables` | `salt.stypes.Vars` | Input variables used in the forward pass for each input type | *required* |
`stage` | `str` | Stage of the training process | *required* |
`num` | `int` | Number of input samples to use. If `-1`, all samples are used | `-1` |
`labels` | `salt.stypes.Vars` | List of required labels for each input type | `None` |
`mf_config` | `salt.utils.configs.MaskformerConfig` | Config for Maskformer matching | `None` |
`input_map` | `dict` | Map names to the corresponding dataset names in the input h5 file. If not provided, the input names are used as the dataset names | `None` |
`num_inputs` | `dict` | Truncate the number of constituent inputs to this number, to speed up training | `None` |
`non_finite_to_num` | `bool` | Convert NaNs and infs to zeros when loading inputs | `False` |
`global_object` | `str` | Name of the global input object, as opposed to the constituent-level inputs | `'jets'` |
`parameters` | `dict \| None` | Variables used to parameterise the network | `None` |
`selections` | `dict` | Selections to apply to the input data | `None` |
`ignore_finite_checks` | `bool` | Ignore the check for non-finite inputs | `False` |
Source code in salt/data/datasets.py
salt.data.SaltDataModule#
Bases: `lightning.LightningDataModule`
Datamodule wrapping a `salt.data.SaltDataset` for training, validation and testing.
This datamodule will load data from h5 files. The training, validation and test files are specified by the `train_file`, `val_file` and `test_file` arguments.
The arguments of this class can be set from the YAML config file or from the command line using the `data` key. For example, to set the `batch_size` from the command line, use `--data.batch_size=1000`.
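For instance, a `data` block in the YAML config might look like the following sketch; the file paths and sample counts are placeholders, not real values.

```yaml
data:
  train_file: path/to/train.h5
  val_file: path/to/val.h5
  batch_size: 1000
  num_workers: 10
  num_train: 1000000
  num_val: 100000
```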
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`train_file` | `str` | Training file path | *required* |
`val_file` | `str` | Validation file path | *required* |
`batch_size` | `int` | Number of samples to process in each training step | *required* |
`num_workers` | `int` | Number of CPU worker processes to load batches from disk | *required* |
`num_train` | `int` | Total number of training samples | *required* |
`num_val` | `int` | Total number of validation samples | *required* |
`num_test` | `int` | Total number of testing samples | *required* |
`move_files_temp` | `str` | Directory to move training files to; if `None`, files are not copied | `None` |
`class_dict` | `str` | Path to umami preprocessing scale dict file | `None` |
`test_file` | `str` | Test file path | `None` |
`test_suff` | `str` | Test file suffix | `None` |
`pin_memory` | `bool` | Pin memory for faster GPU transfer | `True` |
`config_S3` | `dict \| None` | Parameters for S3 access | `None` |
`**kwargs` | | Keyword arguments for `salt.data.SaltDataset` | `{}` |
Source code in salt/data/datamodules.py
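As a quick sanity check on how `batch_size` and `num_train` interact, the number of training steps per epoch is roughly `num_train / batch_size`. A minimal sketch, with arbitrary illustrative numbers rather than defaults of `SaltDataModule`:

```python
import math

# Arbitrary illustrative values, not SaltDataModule defaults.
num_train = 1_000_000  # total training samples
batch_size = 1_000     # samples per training step

# Steps per epoch when the final partial batch is kept.
steps_per_epoch = math.ceil(num_train / batch_size)
print(steps_per_epoch)  # 1000
```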
Created: October 20, 2023