PyTorch Lightning

Integrate Trains into the PyTorch code you organize with pytorch-lightning. Use the PyTorch Lightning TrainsLogger module. Also see the PyTorch Lightning Trains Module documentation.

To install Trains:

pip install trains

By default, Trains works with our demo Trains Server (https://demoapp.trains.allegro.ai/dashboard). You can also deploy a self-hosted Trains Server (see the Deploying Trains Overview) and configure Trains to meet your requirements (see the Trains Configuration Reference page).

TrainsLogger

Integrate Trains by creating a TrainsLogger object. When your code runs, it connects to the Trains backend, creates a Task (experiment) in Trains, and logs automatically.

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TrainsLogger

trains_logger = TrainsLogger(project_name='pytorch lightning', task_name='default')

Later in your code:

  • Call any PyTorch Lightning TrainsLogger method.
    trains_logger.<TrainsLogger_method>
  • Call any Trains Python Client package Trains method.
    trains_logger.experiment.<Trains_method>
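To have PyTorch Lightning report through the logger during training, the logger object is passed to the Trainer. The following is a minimal sketch, assuming a pytorch-lightning release that still ships TrainsLogger. The helper name build_trainer and the max_epochs value are illustrative only and are not part of either package.

```python
def build_trainer(max_epochs=1):
    """Create a Trainer that reports through a TrainsLogger (illustrative helper)."""
    # Imported inside the function so this sketch can be loaded even where
    # pytorch-lightning or trains is not installed.
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import TrainsLogger

    # Creating the logger connects to the Trains backend and creates a Task.
    trains_logger = TrainsLogger(project_name='pytorch lightning', task_name='default')

    # `logger` and `max_epochs` are standard Trainer arguments.
    return Trainer(logger=trains_logger, max_epochs=max_epochs)
```

Calling build_trainer() and then fit() on the returned Trainer with your own LightningModule would run training with Trains logging attached.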

TrainsLogger parameters

The TrainsLogger constructor parameters are the following:

  • project_name (optional[str]) – The name of the project in which the experiment will be created. If the project does not exist, it is created. If project_name is None, the repository name becomes the project name.
  • task_name (optional[str]) – The name of the Task (experiment). If task_name is None, the Python experiment script’s file name becomes the Task name.
  • task_type (optional[str]) – The type of the experiment. The default is training.
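  • reuse_last_task_id (optional[bool]) – Whether to reuse the Task Id of the previous run that has the same project and Task names. Set it to False to force a new Task (experiment) with a new Task Id, but the same project and Task names.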
  • output_uri (optional[str]) – The default location for output models and other artifacts. In the default location, Trains creates a subfolder for the output. The subfolder structure is the following:
    <output destination name> / <project name> / <task name>.<Task Id>

    The following are examples of output_uri values for the supported locations:

    • A shared folder: /mnt/share/folder
    • S3: s3://bucket/folder
    • Google Cloud Storage: gs://bucket-name/folder
    • Azure Storage: azure://company.blob.core.windows.net/folder/
  • auto_connect_arg_parser (optional[bool]) – Whether to automatically connect an argparse object to the Task.
  • auto_connect_frameworks (optional[bool]) – Whether to automatically connect frameworks. This includes patching Matplotlib, XGBoost, scikit-learn, Keras callbacks, and TensorBoard/X to serialize plots, graphs, and the model location to the Trains Server (backend), in addition to the original output destination.
  • auto_resource_monitoring (optional[bool]) – Whether to automatically create machine resource monitoring plots. These plots appear in the Trains Web-App (UI), RESULTS tab, SCALARS sub-tab, with a title of :resource monitor:.
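The subfolder structure described for output_uri can be illustrated with a small string-building sketch. Trains builds the real path itself; the helper name output_subfolder is hypothetical, and the sketch makes no claim about how Trains normalizes spaces or separators.

```python
def output_subfolder(output_uri, project_name, task_name, task_id):
    """Join the pieces as <output destination>/<project name>/<task name>.<Task Id>."""
    base = output_uri.rstrip('/')  # avoid a doubled slash when the URI ends with '/'
    return '{}/{}/{}.{}'.format(base, project_name, task_name, task_id)


# Example values taken from this page; the Task Id is only a placeholder here.
example_path = output_subfolder('s3://bucket/folder', 'pytorch lightning',
                                'default', '098d39a7508d4d9faa2b1b7fbb6e9ee3')
print(example_path)
```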

TrainsLogger methods

Log metrics

To log scalars and create plots, use TrainsLogger.log_metrics.

trains_logger.log_metrics({"metrics_example/val_loss": 2}, step=0)
trains_logger.log_metrics({"metrics_example/val_loss": 1.5}, step=1)
trains_logger.log_metrics({"metrics_example/val_loss": 1.25}, step=2)
trains_logger.log_metrics({"metrics_example/val_loss": 1}, step=3)

In the Trains Web-App (UI), view the metrics in the experiment results SCALARS tab.
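The metric names above use a title/series form. The sketch below shows one way such a name could be split into a plot title and a series name. It assumes that the part before the first slash is the title and the rest is the series, which is an assumption about how the logger groups scalars, not a documented guarantee. The function name split_metric_name is hypothetical.

```python
def split_metric_name(name):
    """Split 'title/series' into (title, series); a bare name is used for both."""
    parts = name.split('/')
    if len(parts) <= 1:
        # No slash: assume the same string serves as title and series.
        return name, name
    # Everything after the first slash is kept as the series name.
    return parts[0], '/'.join(parts[1:])


title, series = split_metric_name("metrics_example/val_loss")
print(title, series)
```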

Log text message

To print text messages to the log and console, use TrainsLogger.log_text.

trains_logger.log_text("sample test")

In the Trains Web-App (UI), view the logged text messages in the experiment results LOGS tab.

Log images

To report an image and upload its contents to a preconfigured bucket (see output_uri), use TrainsLogger.log_image.

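import numpy as np

# report an image from a local file
trains_logger.log_image('title', 'series', 'samples/dancing.jpg')

# report an image from a NumPy array (height x width x RGB channels, 8-bit values)
m = np.random.randint(0, 255, (200, 150, 3), dtype=np.uint8)
trains_logger.log_image('title', 'series', m)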

In the Trains Web-App (UI), view the uploaded images in the experiment results DEBUG SAMPLES tab.

Debug images are clickable and open an image viewer.

Artifacts

To store artifacts with experiments, use TrainsLogger.log_artifact.

In the Trains Web-App (UI), view the artifacts in the experiment results ARTIFACTS tab. For each artifact, the tab shows the file path, file size, and hash, along with a clickable link that copies the file URL to the clipboard or opens an HTML link in a new browser tab.

Pandas DataFrames

import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

# add and upload the Pandas DataFrame as an artifact, with optional metadata
trains_logger.log_artifact('train', df, metadata={'counting': 'legs', 'max legs': 69})

Local files

# add and upload local file artifact
trains_logger.log_artifact('local file', 'samples/dancing.jpg')
# add and upload a link
trains_logger.log_artifact('HTML link', 'https://allegro.ai/')

Dictionaries

# add and upload dictionary stored as JSON
trains_logger.log_artifact('dictionary', df.to_dict())

Numpy arrays

# add and upload Numpy Object (stored as .npz file)
trains_logger.log_artifact('Numpy Eye', np.eye(100, 100))

Images

from PIL import Image

# add and upload Image (stored as .png file)
im = Image.open('samples/dancing.jpg')
trains_logger.log_artifact('pillow_image', im)

Folders

# add and upload a folder, artifact_object should be the folder path
trains_logger.log_artifact('local folder', 'samples/')

Use wildcards to select files in a folder to store as artifacts.

# add and upload a wildcard
trains_logger.log_artifact('local folder', 'samples/*.jpg')
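Before logging a wildcard, it can help to preview which files the pattern selects. The sketch below uses Python's standard glob module on a temporary folder. It is only a local preview: the helper name preview_wildcard is hypothetical, and Trains may resolve wildcards differently from glob.

```python
import glob
import os
import tempfile


def preview_wildcard(pattern):
    """Return the sorted list of local paths that match a wildcard pattern."""
    return sorted(glob.glob(pattern))


# Build a throwaway folder with two .jpg files and one .txt file.
with tempfile.TemporaryDirectory() as folder:
    for file_name in ('a.jpg', 'b.jpg', 'notes.txt'):
        open(os.path.join(folder, file_name), 'w').close()

    matched = preview_wildcard(os.path.join(folder, '*.jpg'))
    matched_names = [os.path.basename(path) for path in matched]

print(matched_names)
```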

Track hyperparameters

Use TrainsLogger.log_hyperparams to track hyperparameters, passing a parameter dictionary as the argument.

# Create a dictionary of parameters
parameters_dict = { 'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
    'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256 }

trains_logger.log_hyperparams(parameters_dict)
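When hyperparameters come from the command line, the parameter dictionary can be built from an argparse namespace instead of being written by hand. This is a minimal sketch using only the standard library. The argument names are taken from the dictionary above, and the final log_hyperparams call is left as a comment because it needs a live TrainsLogger.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.001)
parser.add_argument('--batch_size', type=int, default=100)

# An empty argument list makes the parser fall back to the defaults;
# in a real script, call parser.parse_args() to read sys.argv.
args = parser.parse_args([])

# vars() exposes the namespace attributes; dict() takes a copy of them.
parameters_dict = dict(vars(args))
print(parameters_dict)

# With a TrainsLogger available:
# trains_logger.log_hyperparams(parameters_dict)
```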

Trains methods and TrainsLogger

To use any Trains method, access the experiment property of the TrainsLogger object, and then chain a Trains method or property to it.

# Chaining the trains method 
trains_logger.experiment.<trains_method>

For example, call the Trains explicit reporting methods (see the Explicit Reporting tutorial) or the Task execution scheduling methods.

Example: Explicit Reporting

For example, report a confusion matrix.

confusion = np.random.randint(10, size=(10, 10))

trains_logger.experiment.get_logger().report_surface(title="example confusion matrix", 
    series="series", iteration=1, matrix=confusion)

In the Trains Web-App (UI), view the confusion matrices in the experiment results PLOTS tab.

Example: Task scheduling

Use the Trains methods to clone a Task and enqueue it for execution by a worker running Trains Agent.

For example, if a trains-agent worker daemon is running and listening to the MyQueue queue, clone the Task (experiment) whose ID is 098d39a7508d4d9faa2b1b7fbb6e9ee3 and enqueue the cloned Task to run.

cloned_task = trains_logger.experiment.clone(source_task='098d39a7508d4d9faa2b1b7fbb6e9ee3',
    name='cloned_task')

trains_logger.experiment.enqueue(task=cloned_task, queue_name='MyQueue')

When this code runs, a Task (experiment) named cloned_task is created and enqueued. The trains-agent daemon listening to the MyQueue queue fetches the cloned Task and executes it.
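The clone-and-enqueue steps can be wrapped in a small helper. The sketch below uses only the clone and enqueue calls shown above, with the same keyword arguments. The helper name clone_and_enqueue and the FakeExperiment stand-in are hypothetical: the stand-in exists only so the flow can be dry-run without a Trains Server, and it does not imitate the real return values.

```python
def clone_and_enqueue(experiment, source_task_id, queue_name, name='cloned_task'):
    """Clone the Task with the given ID and enqueue the clone on a queue."""
    cloned_task = experiment.clone(source_task=source_task_id, name=name)
    experiment.enqueue(task=cloned_task, queue_name=queue_name)
    return cloned_task


class FakeExperiment:
    """Stand-in for trains_logger.experiment that only records the calls made."""

    def __init__(self):
        self.calls = []

    def clone(self, source_task, name):
        self.calls.append(('clone', source_task, name))
        return 'clone-of-' + source_task

    def enqueue(self, task, queue_name):
        self.calls.append(('enqueue', task, queue_name))


# Dry run against the stand-in; with Trains configured, pass
# trains_logger.experiment instead of FakeExperiment().
fake = FakeExperiment()
result = clone_and_enqueue(fake, '098d39a7508d4d9faa2b1b7fbb6e9ee3', 'MyQueue')
print(fake.calls)
```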