PyTorch Lightning

Integrate Trains into PyTorch code that you organize with pytorch-lightning by using the PyTorch Lightning TrainsLogger module. For details, see the PyTorch Lightning Trains Module documentation.

To install Trains:

pip install trains

By default, Trains works with our demo Trains Server (https://demoapp.trains.allegro.ai/dashboard). To deploy a self-hosted Trains Server, see the Deploying Trains Overview. To configure Trains to meet your requirements, see the Trains Configuration Reference page.

TrainsLogger

Integrate Trains by creating a TrainsLogger object. When your code runs, it connects to the Trains backend, creates a Task (experiment) in Trains, and logs automatically.

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TrainsLogger

trains_logger = TrainsLogger(project_name='pytorch lightning', task_name='default')
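
The logger only receives training activity once it is attached to a Trainer. The following is a minimal sketch of that wiring, assuming a pytorch-lightning release that still ships TrainsLogger. The build_trainer helper and its max_epochs default are illustrative names, not part of either library.

```python
def build_trainer(project_name="pytorch lightning", task_name="default", max_epochs=1):
    """Create a TrainsLogger and attach it to a Lightning Trainer.

    The imports are deferred so this function can be defined even where
    pytorch-lightning or trains is not installed.
    """
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import TrainsLogger

    # Creating the logger connects to the Trains backend and creates the Task.
    trains_logger = TrainsLogger(project_name=project_name, task_name=task_name)
    # `logger=` is the standard Trainer argument for attaching a logger.
    trainer = Trainer(logger=trains_logger, max_epochs=max_epochs)
    return trainer, trains_logger
```

Calling trainer.fit(model) with your own LightningModule then routes the values the module logs through trains_logger.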

Later in your code:

  • Call any PyTorch Lightning TrainsLogger method.

    trains_logger.<TrainsLogger_method>
    
  • Call any Trains Python Client package Trains method.

    trains_logger.experiment.<Trains_method>
    

TrainsLogger parameters

The parameters for creating a TrainsLogger object are the following:

  • project_name (optional[str]) – The name of the project in which the experiment will be created. If the project does not exist, it is created. If project_name is None, the repository name becomes the project name.
  • task_name (optional[str]) – The name of the Task (experiment). If task_name is None, the Python experiment script’s file name becomes the Task name.
  • task_type (optional[str]) – The type of the experiment. The default is training.
  • reuse_last_task_id (optional[bool]) – Whether to reuse the last Task Id. Set to False to force a new Task (experiment) with a new Task Id, but the same project and Task names.
  • output_uri (optional[str]) – The default location for output models and other artifacts. In the default location, Trains creates a subfolder for the output. The subfolder structure is the following:

    <output destination name> / <project name> / <task name>.<Task Id>
    

    The following are examples of output_uri values for the supported locations:

    • A shared folder: /mnt/share/folder
    • S3: s3://bucket/folder
    • Google Cloud Storage: gs://bucket-name/folder
    • Azure Storage: azure://company.blob.core.windows.net/folder/
  • auto_connect_arg_parser (optional[bool]) – Whether to automatically connect an argparse object to the Task.

  • auto_connect_frameworks (optional[bool]) – Whether to automatically connect frameworks. This includes patching Matplotlib, XGBoost, scikit-learn, Keras callbacks, and TensorBoard/X to serialize plots, graphs, and the model location to the Trains Server (backend), in addition to the original output destination.

  • auto_resource_monitoring (optional[bool]) – Whether to automatically create machine resource monitoring plots. These plots appear in the Trains Web-App (UI), RESULTS tab, SCALARS sub-tab, with a title of :resource monitor:.
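
The output_uri subfolder structure described above can be sketched as a small path builder. This is only an illustration of the documented layout: expected_output_folder is a hypothetical helper, not a Trains function, the Task Id abc123 is made up, and the sketch assumes the output destination name is the output_uri value itself.

```python
def expected_output_folder(output_uri, project_name, task_name, task_id):
    """Return the subfolder implied by the documented layout (illustrative only):

        <output destination name> / <project name> / <task name>.<Task Id>
    """
    base = output_uri.rstrip("/")  # avoid a doubled slash when joining
    return "{}/{}/{}.{}".format(base, project_name, task_name, task_id)


# "abc123" is a made-up Task Id used only for this demonstration.
for uri in ("/mnt/share/folder", "s3://bucket/folder", "gs://bucket-name/folder"):
    print(expected_output_folder(uri, "pytorch lightning", "default", "abc123"))
```

The actual folder is created by Trains when the Task runs, so treat the result as a guide for where to look, not as a guarantee.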

TrainsLogger methods

Log metrics

To log scalars and create plots, use TrainsLogger.log_metrics.

trains_logger.log_metrics({"metrics_example/val_loss": 2}, step=0)
trains_logger.log_metrics({"metrics_example/val_loss": 1.5}, step=1)
trains_logger.log_metrics({"metrics_example/val_loss": 1.25}, step=2)
trains_logger.log_metrics({"metrics_example/val_loss": 1}, step=3)

In the Trains Web-App (UI), view the metrics in the experiment results SCALARS tab.
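
In practice, the repeated log_metrics calls usually come from a loop. The sketch below shows that pattern under a few assumptions: log_loss_curve is a hypothetical helper, and RecordingLogger is a stand-in exposing the same log_metrics(metrics, step) call shape so the example runs without a Trains Server.

```python
class RecordingLogger:
    """Stand-in with the same log_metrics(metrics, step) call shape, for dry runs."""

    def __init__(self):
        self.records = []

    def log_metrics(self, metrics, step=None):
        self.records.append((step, dict(metrics)))


def log_loss_curve(logger, losses, name="metrics_example/val_loss"):
    """Log one scalar per step under a single metric name."""
    for step, loss in enumerate(losses):
        logger.log_metrics({name: float(loss)}, step=step)


dry_run = RecordingLogger()
log_loss_curve(dry_run, [2, 1.5, 1.25, 1])
print(dry_run.records)
```

To send the same values to Trains, pass trains_logger in place of dry_run.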

Log text message

To print text messages to the log and console, use TrainsLogger.log_text.

trains_logger.log_text("sample test")

In the Trains Web-App (UI), view the logged text messages in the experiment results LOGS tab.

Log images

To report an image and upload its contents to a preconfigured bucket (see output_uri), use TrainsLogger.log_image.

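import numpy as np

# Report an image from a local file
trains_logger.log_image('title', 'series', 'samples/dancing.jpg')
# Report an image from a NumPy array (height x width x RGB channels);
# uint8 is the usual dtype for RGB image data
m = np.random.randint(0, 255, (200, 150, 3), dtype=np.uint8)
trains_logger.log_image('title', 'series', m)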

In the Trains Web-App (UI), view the uploaded images in the experiment results DEBUG SAMPLES tab.

Debug images are clickable and open an image viewer.

Artifacts

To store artifacts with experiments, use TrainsLogger.log_artifact.

In the Trains Web-App (UI), view the artifacts in the experiment results ARTIFACTS tab. For each artifact, the tab shows the file path, file size, and hash, and provides a clickable link that copies the file URL to the clipboard or opens an HTML link in a new browser tab.

Pandas DataFrames

import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

# Register Pandas object as artifact to watch
# (it will be monitored in the background and automatically synced and uploaded)
trains_logger.log_artifact('train', df, metadata={'counting': 'legs', 'max legs': 69})

Local files

# add and upload a local file artifact
trains_logger.log_artifact('local file', 'samples/dancing.jpg')
# add and upload a link
trains_logger.log_artifact('HTML link', 'https://allegro.ai/')

Dictionaries

# add and upload dictionary stored as JSON
trains_logger.log_artifact('dictionary', df.to_dict())

Numpy arrays

import numpy as np

# add and upload a NumPy object (stored as .npz file)
trains_logger.log_artifact('Numpy Eye', np.eye(100, 100))

Images

from PIL import Image

# add and upload a Pillow image (stored as .png file)
im = Image.open('samples/dancing.jpg')
trains_logger.log_artifact('pillow_image', im)

Folders

# add and upload a folder, artifact_object should be the folder path
trains_logger.log_artifact('local folder', 'samples/')

Use wildcards to select files in a folder to store as artifacts.

# add and upload a wildcard
trains_logger.log_artifact('local folder', 'samples/*.jpg')
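
Before logging a wildcard, it can help to check which files the pattern selects. The sketch below uses only the Python standard library and assumes the wildcard follows shell-style glob matching, which this page does not state explicitly. preview_wildcard is a hypothetical helper, and the files are created in a temporary folder only for the demonstration.

```python
import glob
import os
import tempfile


def preview_wildcard(pattern):
    """Return the sorted file paths that a glob-style pattern matches."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


# Demonstrate on a throwaway folder so the example does not depend on samples/.
with tempfile.TemporaryDirectory() as samples:
    for filename in ("a.jpg", "b.jpg", "notes.txt"):
        open(os.path.join(samples, filename), "w").close()
    pattern = os.path.join(samples, "*.jpg")
    matched = [os.path.basename(p) for p in preview_wildcard(pattern)]

print(matched)  # ['a.jpg', 'b.jpg']
```

Calling preview_wildcard('samples/*.jpg') in your project lists the candidate files for the log_artifact call above.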

Track hyperparameters

Use TrainsLogger.log_hyperparams to track hyperparameters, passing a parameter dictionary as the argument.

# Create a dictionary of parameters
parameters_dict = { 'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
    'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256 }

trains_logger.log_hyperparams(parameters_dict)
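
Hyperparameters often come from the command line instead of a hand-written dictionary. The following sketch builds the parameter dictionary with argparse. The parse_hyperparameters helper and the argument names are illustrative choices that mirror some of the keys above.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse command-line hyperparameters and return them as a plain dict."""
    parser = argparse.ArgumentParser(description="Training hyperparameters")
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=100)
    parser.add_argument("--n_hidden_1", type=int, default=256)
    # vars() turns the argparse Namespace into a dictionary.
    return vars(parser.parse_args(argv))


# An explicit argument list keeps the example independent of sys.argv.
parameters_dict = parse_hyperparameters(["--batch_size", "64"])
print(parameters_dict)
```

The resulting dictionary can be passed to trains_logger.log_hyperparams in the same way as the hand-written one.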

Trains methods and TrainsLogger

To use any Trains methods, chain the PyTorch Lightning experiment property to the TrainsLogger object, and then chain a Trains method or property.

# Chaining the trains method 
trains_logger.experiment.<trains_method>

For example, call the Trains explicit reporting methods (see the Explicit Reporting tutorial) or the Task execution scheduling methods.

Example: Explicit Reporting

For example, report a confusion matrix.

import numpy as np

# Example data: a random 10x10 matrix standing in for real confusion counts
confusion = np.random.randint(10, size=(10, 10))

trains_logger.experiment.get_logger().report_surface(title="example confusion matrix",
    series="series", iteration=1, matrix=confusion)

In the Trains Web-App (UI), view the confusion matrices in the experiment results PLOTS tab.
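
The example above reports random values. With real data, the matrix would be counted from true and predicted class labels. The sketch below shows one way to do that in plain Python. confusion_counts is a hypothetical helper, and the labels are made up for illustration.

```python
def confusion_counts(true_labels, predicted_labels, num_classes):
    """Count (true, predicted) pairs: rows are true classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


# Made-up labels for three classes.
confusion = confusion_counts([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], num_classes=3)
print(confusion)  # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
```

Assuming the reporting method expects a NumPy array, as in the example above, convert the result with np.array(confusion) before passing it as the matrix argument.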

Example: Task scheduling

Use the Trains methods for Task cloning and enqueuing to run a Task on a worker using Trains Agent.

For example, if a trains-agent worker daemon is running and listening to the MyQueue queue, clone the Task (experiment) whose ID is 098d39a7508d4d9faa2b1b7fbb6e9ee3 and enqueue the cloned Task to run.

cloned_task = trains_logger.experiment.clone(source_task='098d39a7508d4d9faa2b1b7fbb6e9ee3',
    name='cloned_task')

trains_logger.experiment.enqueue(task=cloned_task, queue_name='MyQueue')

When this code runs, a Task (experiment) named cloned_task is created and enqueued. The trains-agent daemon listening to the MyQueue queue fetches the cloned Task and executes it.