PyTorch Lightning

Integrate Trains into PyTorch code that you organize with PyTorch Lightning by using the PyTorch Lightning TrainsLogger module. For more details, see the PyTorch Lightning Trains Module documentation.

Installing and configuring Trains

Install Trains using pip.

pip install trains

By default, Trains works with our demo Trains Server (https://demoapp.trains.allegro.ai/dashboard). To deploy a self-hosted Trains Server, see Deploying Trains Overview. To configure Trains to meet your requirements, see the Trains Configuration Reference page.

TrainsLogger

Integrate Trains by creating a TrainsLogger object. When your code runs, the logger connects to the Trains backend, creates a Task (experiment) in Trains, and logs automatically.

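from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TrainsLogger

# Create the logger; this creates a Task (experiment) in the 'pytorch lightning' project
trains_logger = TrainsLogger(project_name='pytorch lightning', task_name='default')

# Pass the logger to the Trainer so that Lightning sends its logs to Trains
trainer = Trainer(logger=trains_logger)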

Later in your code:

  • Call any PyTorch Lightning TrainsLogger method:
    trains_logger.<TrainsLogger_method>
  • Call any method of the Trains Python Client package Task object:
    trains_logger.experiment.<Trains_method>

TrainsLogger parameters

The TrainsLogger constructor accepts the following parameters:

  • project_name (Optional[str]) – The name of the project in which the experiment will be created. If the project does not exist, it is created. If project_name is None, the repository name becomes the project name.
  • task_name (Optional[str]) – The name of the Task (experiment). If task_name is None, the Python experiment script’s file name becomes the Task name.
  • task_type (Optional[str]) – The type of the Task (experiment). The default is training.
  • reuse_last_task_id (Optional[bool]) – If True (the default), reuse the Task ID of the previous run that has the same project and Task names. If False, force a new Task (experiment) with a new Task ID.
  • output_uri (Optional[str]) – The default location for output models and other artifacts. Trains creates a subfolder for the output in this location, with the following structure:
    <output destination name> / <project name> / <task name>.<Task Id>

    The following are examples of output_uri values for the supported locations:

    • A shared folder: /mnt/share/folder
    • S3: s3://bucket/folder
    • Google Cloud Storage: gs://bucket-name/folder
    • Azure Storage: azure://company.blob.core.windows.net/folder/
  • auto_connect_arg_parser (Optional[bool]) – If True, automatically connect an argparse object to the Task.
  • auto_connect_frameworks (Optional[bool]) – If True, automatically connect frameworks. This includes patching Matplotlib, XGBoost, scikit-learn, Keras callbacks, and TensorBoard/TensorBoardX to serialize plots, graphs, and the model location to the Trains Server (backend), in addition to the original output destination.
  • auto_resource_monitoring (Optional[bool]) – If True, automatically create machine resource monitoring plots. These plots appear in the Trains Web-App (UI), RESULTS tab, SCALARS sub-tab, with a title of :resource monitor:.
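To predict where a run's outputs will land, the output_uri subfolder layout (<output destination name> / <project name> / <task name>.<Task Id>) can be sketched as a plain string join. The helper artifact_destination below is hypothetical and not part of Trains or PyTorch Lightning; it only reproduces the documented folder pattern, and Trains itself may normalize or escape names differently.

```python
def artifact_destination(output_uri: str, project_name: str, task_name: str, task_id: str) -> str:
    """Hypothetical helper: build <output_uri>/<project>/<task>.<task_id>.

    Mirrors only the folder pattern documented for output_uri; the actual
    path is chosen by Trains and may differ (for example, in character escaping).
    """
    # Strip trailing slashes so URIs such as 'azure://.../folder/' do not produce '//'
    base = output_uri.rstrip("/")
    return f"{base}/{project_name}/{task_name}.{task_id}"
```

For example, artifact_destination('s3://bucket/folder', 'pytorch lightning', 'default', '<Task Id>') returns the S3 prefix under which that Task's outputs are expected.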

TrainsLogger methods

Log metrics

To log scalars and create plots, use TrainsLogger.log_metrics().

trains_logger.log_metrics({"metrics_example/val_loss": 2}, step=0)
trains_logger.log_metrics({"metrics_example/val_loss": 1.5}, step=1)
trains_logger.log_metrics({"metrics_example/val_loss": 1.25}, step=2)
trains_logger.log_metrics({"metrics_example/val_loss": 1}, step=3)

In the Trains Web-App (UI), view the metrics in the experiment results SCALARS tab.
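Based on our reading of the PyTorch Lightning TrainsLogger source, log_metrics() appears to split each metric key at the first /: the part before it becomes the plot title and the rest becomes the series, so metrics_example/val_loss is plotted as series val_loss in a metrics_example plot, while a key without / is used as both title and series. The function split_metric_key below is a hypothetical re-implementation of that behavior, for illustration only; verify it against your installed version.

```python
from typing import Tuple


def split_metric_key(key: str) -> Tuple[str, str]:
    """Hypothetical sketch: map a log_metrics() key to a (title, series) pair."""
    parts = key.split("/")
    if len(parts) <= 1:
        # No '/' in the key: the key is used as both the title and the series
        return key, key
    # The first component is the plot title; everything after it is the series name
    return parts[0], "/".join(parts[1:])
```

Grouping related values under a shared prefix (for example metrics_example/val_loss and metrics_example/train_loss) therefore keeps them in one plot.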

Log text message

To print text messages to the log and console, use TrainsLogger.log_text().

trains_logger.log_text("sample test")

In the Trains Web-App (UI), view the text message logging in the experiment results LOGS tab.

Log images

To report an image and upload its contents to a preconfigured bucket (see output_uri), use TrainsLogger.log_image().

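import numpy as np

# Log an image from a local file
trains_logger.log_image('title', 'series', 'samples/dancing.jpg')

# Log an image from a NumPy array (height x width x RGB, 8-bit values)
m = np.random.randint(0, 256, (200, 150, 3), dtype=np.uint8)
trains_logger.log_image('title', 'series', m)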

In the Trains Web-App (UI), view the uploaded images in the experiment results DEBUG SAMPLES tab.

Debug images are clickable and open an image viewer.
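When logging a NumPy array, an 8-bit RGB image is usually an array of shape (height, width, 3) with dtype uint8. Keep in mind that the upper bound of np.random.randint is exclusive and its default dtype is not uint8, so random pixel data needs high=256 and dtype=np.uint8. The sketch below builds a deterministic gradient image instead; the helper name make_gradient_image is hypothetical, and which other dtypes or value ranges Trains accepts is not covered here.

```python
import numpy as np


def make_gradient_image(height: int = 200, width: int = 150) -> np.ndarray:
    """Hypothetical helper: return an RGB uint8 image of shape (height, width, 3)."""
    cols = np.linspace(0, 255, num=width).astype(np.uint8)   # left-to-right ramp
    rows = np.linspace(0, 255, num=height).astype(np.uint8)  # top-to-bottom ramp
    red = np.broadcast_to(cols, (height, width))              # red varies by column
    green = np.broadcast_to(rows[:, None], (height, width))   # green varies by row
    blue = np.full((height, width), 128, dtype=np.uint8)      # constant blue channel
    # Stack the channels along the last axis to get (height, width, 3)
    return np.stack([red, green, blue], axis=-1)
```

Usage: trains_logger.log_image('title', 'series', make_gradient_image()).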

Artifacts

To store artifacts with experiments, use TrainsLogger.log_artifact().

In the Trains Web-App (UI), view the artifacts in the experiment results ARTIFACTS tab. For each artifact, the tab shows the file path, file size, and hash, plus a link that copies the file URL to the clipboard or, for HTML links, opens the page in a new browser tab.

Pandas DataFrames

import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

# Upload the pandas DataFrame as an artifact, with optional metadata
trains_logger.log_artifact('train', df, metadata={'counting': 'legs', 'max legs': 69})

Local files

# add and upload local file artifact
trains_logger.log_artifact('local file', 'samples/dancing.jpg')
# add and upload a link
trains_logger.log_artifact('HTML link', 'https://allegro.ai/')

Dictionaries

# add and upload dictionary stored as JSON
trains_logger.log_artifact('dictionary', df.to_dict())

Numpy arrays

import numpy as np

# Add and upload a NumPy object (stored as a .npz file)
trains_logger.log_artifact('Numpy Eye', np.eye(100, 100))

Images

from PIL import Image

# Add and upload a PIL image (stored as a .png file)
im = Image.open('samples/dancing.jpg')
trains_logger.log_artifact('pillow_image', im)

Folders

# add and upload a folder, artifact_object should be the folder path
trains_logger.log_artifact('local folder', 'samples/')

Use wildcards to select files in a folder to store as artifacts.

# add and upload a wildcard
trains_logger.log_artifact('local folder', 'samples/*.jpg')
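To preview which files a wildcard such as samples/*.jpg selects before logging it, you can expand the same pattern locally with Python's glob module. This assumes Trains applies glob-style matching to the pattern, which we have not verified; the helper preview_wildcard is hypothetical.

```python
import glob
import os
from typing import List


def preview_wildcard(pattern: str) -> List[str]:
    """Hypothetical helper: list the regular files a glob-style pattern matches."""
    # glob.glob returns matches in arbitrary order; sort them for a stable preview
    return sorted(path for path in glob.glob(pattern) if os.path.isfile(path))
```

For example, print(preview_wildcard('samples/*.jpg')) lists the .jpg files that the call above is expected to upload.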

Track hyperparameters

Use TrainsLogger.log_hyperparams() to track hyperparameters, passing a parameter dictionary as the argument.

# Create a dictionary of parameters
parameters_dict = { 'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
    'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256 }

trains_logger.log_hyperparams(parameters_dict)
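Configuration dictionaries are often nested. Our understanding is that PyTorch Lightning loggers flatten nested parameters into /-separated keys before passing them to the backend, so {'optimizer': {'lr': 0.001}} would appear as optimizer/lr; treat this as an assumption and check your installed version. The hypothetical flatten_params helper below shows the flattening itself and also converts an argparse.Namespace into a plain dictionary with vars().

```python
from argparse import Namespace
from typing import Any, Dict, Union


def flatten_params(params: Union[Dict[str, Any], Namespace], delimiter: str = "/") -> Dict[str, Any]:
    """Hypothetical helper: flatten nested dicts into delimiter-joined keys."""
    flat: Dict[str, Any] = {}

    def _walk(prefix: str, value: Any) -> None:
        if isinstance(value, Namespace):
            value = vars(value)  # treat argparse results like a dict
        if isinstance(value, dict):
            for key, sub_value in value.items():
                # Join the parent prefix and this key, e.g. 'optimizer' + '/' + 'lr'
                new_prefix = f"{prefix}{delimiter}{key}" if prefix else str(key)
                _walk(new_prefix, sub_value)
        else:
            flat[prefix] = value  # leaf value: store under the full joined key

    _walk("", params)
    return flat
```

Usage: trains_logger.log_hyperparams(flatten_params(nested_config)), where nested_config is your own (hypothetical) nested dictionary.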

Trains methods and TrainsLogger

To use any Trains method, access the TrainsLogger experiment property, which returns the underlying Trains Task object, and then call a Trains Task method or property on it.

# Chain a Trains method to the experiment property
trains_logger.experiment.<trains_method>

For example, you can call Trains explicit reporting methods and schedule Task execution.

Example: Explicit Reporting

For example, report a confusion matrix.

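import numpy as np

confusion = np.random.randint(10, size=(10, 10))

# Report the matrix as a confusion matrix plot using the Trains Logger
trains_logger.experiment.get_logger().report_confusion_matrix(
    title="example confusion matrix", series="series", iteration=1, matrix=confusion)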

In the Trains Web-App (UI), view the confusion matrices in the experiment results PLOTS tab.
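The random matrix above only stands in for real data. A real confusion matrix is computed from true and predicted class labels: entry [i, j] counts the samples whose true class is i and whose predicted class is j. The NumPy sketch below is one way to build it (the helper name confusion_matrix is hypothetical; scikit-learn's sklearn.metrics.confusion_matrix uses the same row and column convention), and its result can be passed as the matrix argument of the reporting call above.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes: int) -> np.ndarray:
    """Count (true, predicted) label pairs; rows are true classes, columns predicted."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix
```

For example, with labels [0, 1, 2, 2, 1] and predictions [0, 2, 2, 2, 1], the matrix has one off-diagonal count, at row 1, column 2.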

Example: Task scheduling

Use the Trains methods for cloning a Task and enqueuing it for execution by a Trains Agent worker.

For example, if a trains-agent worker daemon is running and listening to the MyQueue queue, clone the Task (experiment) whose ID is 098d39a7508d4d9faa2b1b7fbb6e9ee3 and enqueue the cloned Task to run.

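# Clone the existing Task (experiment)
cloned_task = trains_logger.experiment.clone(source_task='098d39a7508d4d9faa2b1b7fbb6e9ee3',
    name='cloned_task')

# Enqueue the clone; a trains-agent worker listening to MyQueue executes it
trains_logger.experiment.enqueue(task=cloned_task, queue_name='MyQueue')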

When this code runs, a Task (experiment) named cloned_task is created and enqueued. The trains-agent daemon listening to the MyQueue queue fetches the cloned Task and executes it.
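The same clone and enqueue calls can be wrapped in a loop to queue several runs of a base experiment. The helper clone_and_enqueue and its task_api argument are hypothetical: task_api is any object exposing clone(source_task=..., name=...) and enqueue(task=..., queue_name=...) as used above, such as trains_logger.experiment. Accepting it as a parameter keeps the sketch testable without a Trains Server.

```python
from typing import Any, List


def clone_and_enqueue(task_api: Any, source_task_id: str, queue_name: str, count: int) -> List[Any]:
    """Hypothetical helper: clone a Task `count` times and enqueue every clone."""
    clones = []
    for index in range(count):
        # Give each clone a distinct name so the runs are easy to tell apart in the UI
        cloned = task_api.clone(source_task=source_task_id, name=f"cloned_task_{index}")
        task_api.enqueue(task=cloned, queue_name=queue_name)
        clones.append(cloned)
    return clones
```

Usage: clone_and_enqueue(trains_logger.experiment, '098d39a7508d4d9faa2b1b7fbb6e9ee3', 'MyQueue', 3).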