PyTorch Ignite
Trains is now ClearML
This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.
Integrate Trains with Ignite using the Ignite TrainsLogger object and the handlers you can attach to it. See Ignite's trains_logger.py.
To install Trains:
pip install trains
By default, Trains works with our demo Trains Server (https://demoapp.trains.allegro.ai/dashboard). You can deploy a self-hosted Trains Server (see the Deploying Trains Overview) and configure Trains to meet your requirements (see the Trains Configuration Reference page).
Ignite TrainsLogger
Integrate Trains by creating an Ignite TrainsLogger object. When the code runs, it connects to the Trains backend and creates a Task (experiment) in Trains.
from ignite.contrib.handlers.trains_logger import *
trains_logger = TrainsLogger(project_name="examples", task_name="ignite")
Later in the code, attach any of the Trains handlers to the TrainsLogger object. For example, attach the OutputHandler and log training loss at each iteration:
trains_logger.attach(trainer,
                     log_handler=OutputHandler(tag="training",
                                               output_transform=lambda loss: {"loss": loss}),
                     event_name=Events.ITERATION_COMPLETED)
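The output_transform argument above maps the engine's raw output to a dict of named scalars for logging. A minimal illustration of that mapping in plain Python (independent of Ignite):

```python
# output_transform receives the engine's output (here, the loss value)
# and returns a dict mapping scalar names to values for the logger.
output_transform = lambda loss: {"loss": loss}

print(output_transform(0.25))  # the handler would log a scalar named "loss"
```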
TrainsLogger parameters
The TrainsLogger parameters are the following:
- project_name (optional[str]) – The name of the project in which the experiment will be created. If the project does not exist, it is created. If project_name is None, the repository name becomes the project name.
- task_name (optional[str]) – The name of the Task (experiment). If task_name is None, the Python experiment script's file name becomes the Task name.
- task_type (optional[str]) – The type of the experiment. The default is training. The task_type values include:
  - TaskTypes.training (default)
  - TaskTypes.train
  - TaskTypes.testing
  - TaskTypes.inference
- report_freq (optional[int]) – The histogram processing frequency (handles histogram values every X calls to the handler). Affects GradsHistHandler and WeightsHistHandler. The default value is 100.
- histogram_update_freq_multiplier (optional[int]) – The histogram report frequency (report first X histograms and once every X reports afterwards). The default value is 10.
- histogram_granularity (optional[int]) – Histogram sampling granularity. The default is 50.
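The fallback behavior of project_name and task_name described above can be sketched as follows (a hypothetical helper for illustration only, not part of the Trains API):

```python
import os

def resolve_names(project_name=None, task_name=None,
                  repo_name="my-repo", script_path="scripts/train.py"):
    # Hypothetical illustration of the documented defaults:
    # - a project_name of None falls back to the repository name
    # - a task_name of None falls back to the experiment script's file name
    project = project_name if project_name is not None else repo_name
    task = task_name if task_name is not None else os.path.basename(script_path)
    return project, task

print(resolve_names())                      # falls back to repo and script names
print(resolve_names("examples", "ignite"))  # explicit names are used as-is
```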
Visualizing experiment results
After creating an Ignite TrainsLogger object and attaching handlers from trains_logger.py, when the code runs you can visualize the experiment results in the Trains Web-App (UI).
Scalars
For example, run the Ignite MNIST example for TrainsLogger, mnist_with_trains_logger.py.
To log scalars, use OutputHandler.
trains_logger.attach(
    trainer,
    log_handler=OutputHandler(
        tag="training", output_transform=lambda loss: {"batchloss": loss}, metric_names="all"
    ),
    event_name=Events.ITERATION_COMPLETED(every=100),
)
trains_logger.attach(
    train_evaluator,
    log_handler=OutputHandler(tag="training", metric_names=["loss", "accuracy"],
                              another_engine=trainer),
    event_name=Events.EPOCH_COMPLETED,
)
View the training and validation metrics in the Trains Web-App (UI), in the RESULTS tab, SCALARS sub-tab.
trains_logger.attach(
    validation_evaluator,
    log_handler=OutputHandler(tag="validation", metric_names=["loss", "accuracy"],
                              another_engine=trainer),
    event_name=Events.EPOCH_COMPLETED,
)
Model snapshots
To save model snapshots, use TrainsSaver.
handler = Checkpoint(
    {"model": model},
    TrainsSaver(trains_logger, dirname="~/.trains/cache/"),
    n_saved=1,
    score_function=lambda e: 123,
    score_name="acc",
    filename_prefix="best",
    global_step_transform=global_step_from_engine(trainer),
)
View saved snapshots in the Trains Web-App (UI), in the ARTIFACTS tab. To view a model, click its name (or download it).
Logging
Ignite engine output and/or metrics
To log the Ignite engine's output and/or metrics, use the OutputHandler handler.
For example, log training loss at each iteration.
# Attach the logger to the trainer to log training loss at each iteration
trains_logger.attach(trainer,
                     log_handler=OutputHandler(tag="training",
                                               output_transform=lambda loss: {"loss": loss}),
                     event_name=Events.ITERATION_COMPLETED)
Log metrics for training.
# Attach the logger to the evaluator on the training dataset and log NLL, Accuracy metrics after each epoch
# We setup `global_step_transform=global_step_from_engine(trainer)` to take the epoch
# of the `trainer` instead of `train_evaluator`.
trains_logger.attach(train_evaluator,
                     log_handler=OutputHandler(tag="training",
                                               metric_names=["nll", "accuracy"],
                                               global_step_transform=global_step_from_engine(trainer)),
                     event_name=Events.EPOCH_COMPLETED)
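The effect of global_step_from_engine(trainer) can be sketched in plain Python (a simplified stand-in for illustration, not Ignite's implementation): the returned callable always reads the step from the engine it was given, so metrics are plotted against the trainer's epoch even when the evaluator fires the event.

```python
# Simplified stand-in for ignite's global_step_from_engine (illustration only).
class State:
    def __init__(self, epoch):
        self.epoch = epoch

class Engine:
    def __init__(self, epoch):
        self.state = State(epoch)

def global_step_from(engine):
    # Ignore the engine that fired the event; read the step from `engine`.
    return lambda _firing_engine, _event_name: engine.state.epoch

trainer = Engine(epoch=7)
evaluator = Engine(epoch=1)

step_fn = global_step_from(trainer)
print(step_fn(evaluator, "EPOCH_COMPLETED"))  # reports the trainer's epoch, not the evaluator's
```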
Log metrics for validation.
# Attach the logger to the evaluator on the validation dataset and log NLL, Accuracy metrics after
# each epoch. We setup `global_step_transform=global_step_from_engine(trainer)` to take the epoch of the
# `trainer` instead of `evaluator`.
trains_logger.attach(evaluator,
                     log_handler=OutputHandler(tag="validation",
                                               metric_names=["nll", "accuracy"],
                                               global_step_transform=global_step_from_engine(trainer)),
                     event_name=Events.EPOCH_COMPLETED)
Optimizer parameters
To log optimizer parameters, use the OptimizerParamsHandler handler.
# Attach the logger to the trainer to log optimizer's parameters, e.g. learning rate at each iteration
trains_logger.attach(trainer,
                     log_handler=OptimizerParamsHandler(optimizer),
                     event_name=Events.ITERATION_STARTED)
Model weights
To log model weights as scalars, use the WeightsScalarHandler handler.
# Attach the logger to the trainer to log model's weights norm after each iteration
trains_logger.attach(trainer,
                     log_handler=WeightsScalarHandler(model, reduction=torch.norm),
                     event_name=Events.ITERATION_COMPLETED)
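The reduction argument above collapses each weight tensor to a single scalar before logging; torch.norm is one such reduction. A toy illustration of the idea, using a plain list instead of a tensor so it runs without torch:

```python
import math

# A reduction maps a whole tensor (here, a plain list of floats)
# to one scalar, which is what gets plotted per weight.
def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))

weights = [3.0, 4.0]
print(l2_norm(weights))  # 5.0
```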
To log model weights as histograms, use the WeightsHistHandler handler.
# Attach the logger to the trainer to log model's weights as histograms after each iteration
trains_logger.attach(trainer,
                     log_handler=WeightsHistHandler(model),
                     event_name=Events.ITERATION_COMPLETED)
Model snapshots
To save model snapshots as Trains artifacts, use TrainsSaver.
to_save = {"model": model}
handler = Checkpoint(to_save, TrainsSaver(trains_logger), n_saved=1,
                     score_function=lambda e: 123, score_name="acc",
                     filename_prefix="best",
                     global_step_transform=global_step_from_engine(trainer))
validation_evaluator.add_event_handler(Events.COMPLETED, handler)
MNIST example
The ignite repository contains an MNIST TrainsLogger example, mnist_with_trains_logger.py. When you run this code, you can visualize the experiment results in the Trains Web-App (UI); see Visualizing experiment results.