Trainers
Summary
Base classes used to describe your experiment.
A trainer is a class that knows how to train a model.
SimpleTrainer is good for sklearn-like models.
EpochTrainer is perfect for neural networks and gradient boosting.
Metric values
Methods train and train_epoch always return metric values.
Metric values are represented by a dictionary mapping metric names to measured values, for instance:
    metric_values = {
        "train_accuracy": 0.92,
        "valid_accuracy": 0.83,
        "train_f1_score": 0.76,
        "valid_f1_score": 0.66,
    }
Each trainer has a field metric, which is a string.
metric declares which metric value should be optimized by letstune.
The chosen metric is always greater-is-better.
Other metric values are stored only for analysis after tuning.
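As a rough illustration of the greater-is-better convention, the sketch below picks the best of several metric dictionaries by the chosen metric, and shows one way to expose a lower-is-better quantity (a loss) as a metric by negating it. The helper best_run and the keys valid_loss and neg_valid_loss are hypothetical names for this example; they are not part of letstune.

```python
# Hypothetical helper, not part of letstune: returns the metric values of the
# run whose chosen metric is highest (greater-is-better convention).
def best_run(runs: list[dict[str, float]], metric: str) -> dict[str, float]:
    return max(runs, key=lambda metric_values: metric_values[metric])


runs = [
    {"train_accuracy": 0.92, "valid_accuracy": 0.83, "valid_loss": 0.41},
    {"train_accuracy": 0.88, "valid_accuracy": 0.86, "valid_loss": 0.37},
]

# A loss is lower-is-better, so negate it before using it as the metric.
for metric_values in runs:
    metric_values["neg_valid_loss"] = -metric_values["valid_loss"]

best_by_accuracy = best_run(runs, "valid_accuracy")
best_by_loss = best_run(runs, "neg_valid_loss")
```

The remaining keys stay in each dictionary, so they remain available for later analysis.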
Serialization
All trainers should be pickleable before any call of load_dataset.
After a call of load_dataset, a trainer might not be pickleable.
Trainer reusability
Before load_dataset is called, a trainer can be cloned with:
new_trainer = pickle.loads(pickle.dumps(trainer))
load_dataset can be called at most once on a trainer.
Other methods, including train, load and create_model,
can be called many times.
Simple trainer class
- class letstune.SimpleTrainer
Base class for experiments without early-stopping.
Lifecycle
First, load_dataset() is called with a dataset. It is expected to initialize fields related to the dataset.
Then train() is repeatedly called with various params.
Bases: typing.Generic[P], abc.ABC
Obligatory methods
Methods that must be implemented in a trainer:
- abstract property metric: str
Goal of the tuning.
The methods train and train_epoch must return a dict containing this metric. Always interpreted as greater-is-better.
- abstract load_dataset(dataset: Any) -> None
Load a dataset.
dataset is a value passed through by the function letstune.tune(). Usually a path to a directory with training data.
It is expected to save the loaded dataset to self, like in this example:
    # assumes "import pandas as pd" at module level
    def load_dataset(self, dataset):
        data = pd.read_csv(dataset)
        self.x = data[["bill_length_mm", "bill_depth_mm"]]
        self.y = data["species"]
You can ignore the dataset parameter.
- abstract train(params: P) -> tuple[Any, dict[str, float]]
Train a model parametrized by params and return it with metrics.
Returns a tuple (fitted_model, metric_values).
Metric values are described in the documentation of the letstune.trainer module.
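The SimpleTrainer lifecycle (one load_dataset() call, then repeated train() calls) can be sketched with a stand-in class. This is only an illustration of the contract: ThresholdTrainer is a hypothetical class that deliberately does not import or subclass letstune.SimpleTrainer, and params is assumed to be a plain dict rather than a real params object.

```python
from typing import Any


class ThresholdTrainer:
    """Stand-in mirroring the SimpleTrainer contract without importing letstune."""

    metric = "valid_accuracy"

    def load_dataset(self, dataset: Any) -> None:
        # Called once: keep the data on self for later train() calls.
        self.train_rows, self.valid_rows = dataset

    def train(self, params: dict[str, float]) -> tuple[Any, dict[str, float]]:
        # The "model" is just a threshold on a single feature.
        threshold = params["threshold"]

        def model(x: float) -> int:
            return int(x >= threshold)

        def accuracy(rows: list[tuple[float, int]]) -> float:
            return sum(model(x) == y for x, y in rows) / len(rows)

        metric_values = {
            "train_accuracy": accuracy(self.train_rows),
            "valid_accuracy": accuracy(self.valid_rows),
        }
        return model, metric_values


trainer = ThresholdTrainer()
trainer.load_dataset((
    [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)],  # training rows: (feature, label)
    [(0.2, 0), (0.7, 1)],                      # validation rows
))

# train() is called repeatedly, once per candidate params.
results = [trainer.train({"threshold": t})[1] for t in (0.0, 0.5, 1.0)]
```

Each call returns the fitted model together with a dict that contains the key named by metric, which is what the tuning compares.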
Optional methods
Methods that can be overridden for further customization:
Epoch trainer class
- class letstune.EpochTrainer
Base class for experiments with early-stopping.
An epoch trainer keeps the currently trained model as a field, usually self.model.
Lifecycle
Training is epoch oriented.
First, load_dataset() is called with a dataset. It is expected to initialize fields related to the dataset.
Then, model training for given params is performed:
- First, create_model() or load() is called (but NOT both!).
- train_epoch() is repeatedly called.
- Finally, save() is called.
The cycle, excluding load_dataset(), might be repeated for different params.
Bases: typing.Generic[P], abc.ABC
Obligatory methods
Methods that must be implemented in a trainer:
- abstract property metric: str
Goal of the tuning.
The methods train and train_epoch must return a dict containing this metric. Always interpreted as greater-is-better.
- abstract load_dataset(dataset: Any) -> None
Load a dataset.
dataset is a value passed through by the function letstune.tune(). Usually a path to a directory with training data.
It is expected to save the loaded dataset to self, like in this example:
    # assumes "import pandas as pd" at module level
    def load_dataset(self, dataset):
        data = pd.read_csv(dataset)
        self.x = data[["bill_length_mm", "bill_depth_mm"]]
        self.y = data["species"]
You can ignore the dataset parameter.
Lifecycle methods
- create_model(params: P) -> None
Create a model parametrized by params.
Usually stores the model in self.model.
The default implementation uses params.create_model().
- abstract train_epoch(epoch: int) -> dict[str, float]
Train the model for the next epoch.
Returns new metric values.
Metric values are described in the documentation of the letstune.trainer module.
epoch is zero-indexed.
- save(checkpoint: Any) -> None
Save the model in the current state.
checkpoint is an object passed from the backend. Usually it has a method save_pickle(model).
The default implementation pickles self.model.
- load(checkpoint: Any, params: P) -> None
Load the model from checkpoint.
checkpoint is an object passed from the backend. Usually it has a method load_pickle().
Note that create_model() might NOT be called before load().
The default implementation unpickles to self.model.
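The epoch-oriented lifecycle can be sketched end to end with stand-in objects. Nothing here imports letstune: CounterTrainer and InMemoryCheckpoint are hypothetical classes that only mimic the method names described above, params is assumed to be a plain dict, and the real backend checkpoint object may behave differently.

```python
import pickle
from typing import Any


class InMemoryCheckpoint:
    """Hypothetical stand-in for the checkpoint object passed by the backend."""

    def __init__(self) -> None:
        self._blob = b""

    def save_pickle(self, model: Any) -> None:
        self._blob = pickle.dumps(model)

    def load_pickle(self) -> Any:
        return pickle.loads(self._blob)


class CounterTrainer:
    """Stand-in mirroring the EpochTrainer contract without importing letstune."""

    metric = "valid_score"

    def load_dataset(self, dataset: Any) -> None:
        self.target = dataset  # toy "dataset": the value the model should reach

    def create_model(self, params: dict[str, float]) -> None:
        self.model = {"value": 0.0, "step": params["step"]}

    def train_epoch(self, epoch: int) -> dict[str, float]:
        # epoch is zero-indexed; each epoch moves the value one step further.
        self.model["value"] += self.model["step"]
        return {"valid_score": -abs(self.target - self.model["value"])}

    def save(self, checkpoint: Any) -> None:
        checkpoint.save_pickle(self.model)

    def load(self, checkpoint: Any, params: dict[str, float]) -> None:
        self.model = checkpoint.load_pickle()


trainer = CounterTrainer()
trainer.load_dataset(1.0)
checkpoint = InMemoryCheckpoint()

# First cycle: create_model(), a few epochs, then save().
params = {"step": 0.25}
trainer.create_model(params)
first = [trainer.train_epoch(epoch) for epoch in range(2)]
trainer.save(checkpoint)

# Later cycle for the same params: load() instead of create_model().
trainer.load(checkpoint, params)
later = [trainer.train_epoch(epoch) for epoch in range(2, 4)]
```

Because load() restores self.model from the checkpoint, training resumes where it stopped, without create_model() being called again.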
Investment rounds
When training with letstune.EpochTrainer,
letstune spends most of the time on the most promising
parameters.
letstune runs the tuning as a series of investment rounds.
In the first round, it evaluates all parameters for a few epochs.
Only 25% of trainings advance to the next round. Trainings with the lowest metric value are automatically dropped.
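The selection step between rounds can be sketched as follows. This is a minimal illustration of "keep the top 25%", not letstune's actual implementation: the function advance_top_quarter, the rounding up, and the rule of keeping at least one training are assumptions made for this example.

```python
import math


def advance_top_quarter(scores: dict[str, float]) -> dict[str, float]:
    """Keep the best 25% of trainings by metric value (greater is better).

    Rounding up and keeping at least one training are assumptions of this sketch.
    """
    keep = max(1, math.ceil(len(scores) * 0.25))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])


# Metric values after the first round for eight hypothetical trainings.
round_1 = {
    "a": 0.61, "b": 0.74, "c": 0.58, "d": 0.80,
    "e": 0.66, "f": 0.49, "g": 0.71, "h": 0.77,
}

round_2 = advance_top_quarter(round_1)  # 2 of 8 trainings advance
```

Only the surviving trainings would then be trained for more epochs, which is how most of the time ends up spent on the most promising parameters.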