Explicit Reporting

In this tutorial, you learn how to extend Trains' automagical capturing of inputs and outputs with explicit reporting. We add the following to pytorch_mnist.py, one of the example scripts in our Trains repository:

  • Setting an output destination for model checkpoints (snapshots).
  • Explicitly logging scalars, other (not scalar) data, and text.
  • Registering an artifact, which is uploaded to Trains Server, and whose changes Trains logs.
  • Uploading an artifact, which is stored on Trains Server, but whose changes are not logged.

Prerequisites

  • The trains repository is cloned.
  • The trains package is installed.
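
If you still need to set these up, the following commands are one way to do it (assuming a pip environment; the URL is the Trains GitHub repository):

git clone https://github.com/allegroai/trains.git
pip install trains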

Before you begin

Make a copy of pytorch_mnist.py so that you can add explicit reporting to it.

  • In your local Trains repository, examples directory, run:
    cp pytorch_mnist.py pytorch_mnist_tutorial.py
    

Step 1. Setting an output destination for model checkpoints

A default output location lets you specify where model checkpoints (snapshots) and artifacts are stored when the experiment runs. You can use a local destination, a shared folder, or cloud storage, such as Amazon S3, Google Cloud Storage, and Azure Storage. Specify the output location in the output_uri parameter of the Task.init method. In this tutorial, we specify a local folder destination.

In pytorch_mnist_tutorial.py, change the code from:

task = Task.init(project_name='examples', task_name='pytorch mnist train')

to:

import os  # add to the script's imports, if not already there

model_snapshots_path = '/mnt/trains'
if not os.path.exists(model_snapshots_path):
    os.makedirs(model_snapshots_path)

task = Task.init(project_name='examples', 
    task_name='extending automagical Trains example', 
    output_uri=model_snapshots_path)
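
output_uri also accepts cloud storage destinations. For example, with an S3 bucket (the bucket name here is a hypothetical placeholder, and the matching credentials must be configured in your Trains configuration):

task = Task.init(project_name='examples',
    task_name='extending automagical Trains example',
    output_uri='s3://my-bucket/model_snapshots')  # hypothetical bucket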

When the script runs, Trains creates the following directory structure:

+-- <output destination name>
|   +-- <project name>
|       +-- <task name>.<Task ID>
|           +-- models
|           +-- artifacts

and puts the model checkpoints (snapshots) and artifacts in that folder.

For example, with model_snapshots_path set to /mnt/trains and a Task ID of 9ed78536b91a44fbb3cc7a006128c1b0, the directory structure will be:

+-- /mnt/trains
|   +-- examples
|       +-- extending automagical Trains example.9ed78536b91a44fbb3cc7a006128c1b0
|           +-- models
|           +-- artifacts

Step 2. Logger class reporting methods

In addition to Trains automagical logging, the Trains Python package contains methods for explicit reporting of plots, text, media, and tables. In this step, we add several of these methods to the script.

Get a logger

First, create a logger for the Task using the Task.get_logger method.

logger = task.get_logger()
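
If a later part of the script does not have the task object in scope, you can fetch the same logger from the current Task:

logger = Task.current_task().get_logger()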

Plot scalar metrics

In the train function, use the Logger.report_scalar method to explicitly report the training loss.

def train(args, model, device, train_loader, optimizer, epoch):

    save_loss = []

    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        save_loss.append(loss)

        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))
            # Add manual scalar reporting for loss metrics
            logger.report_scalar(title='Scalar example {} - epoch'.format(epoch), 
                series='Loss', value=loss.item(), iteration=batch_idx)
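
In report_scalar, title identifies the plot and series identifies a line on that plot, so reporting several series under the same title draws them on a single plot. A minimal sketch (the series names and the epoch_loss and val_loss variables are our own, not part of the example script):

logger.report_scalar(title='Loss', series='train', value=epoch_loss, iteration=epoch)
logger.report_scalar(title='Loss', series='validation', value=val_loss, iteration=epoch)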

Plot other (not scalar) data

Our script contains a function named test, which computes the test loss and the number of correct predictions for the trained model. We log these as a histogram and a confusion matrix (this requires import numpy as np in the script's imports).

def test(args, model, device, test_loader):

    save_test_loss = []
    save_correct = []

    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

            save_test_loss.append(test_loss)
            save_correct.append(correct)

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

    logger.report_histogram(title='Histogram example', series='correct',
        iteration=1, values=save_correct, xaxis='Test', yaxis='Correct')

    # Manually report test loss and correct as a confusion matrix
    matrix = np.array([save_test_loss, save_correct])
    logger.report_confusion_matrix(title='Confusion matrix example', 
        series='Test loss / correct', matrix=matrix, iteration=1)

Log text

You can extend Trains by explicitly logging text, including errors, warnings, and debugging statements. We use the Logger.report_text method and its level argument to report a debugging message (this requires import logging in the script's imports).

logger.report_text('The default output destination for model snapshots and artifacts is: {}'.format(model_snapshots_path), level=logging.DEBUG)

Your script is ready!

You can now run pytorch_mnist_tutorial.py. To view the results, see the instructions in Additional information.

Step 3. Registering artifacts

Registering an artifact uploads it to Trains Server, and if the artifact later changes, Trains Server logs the change. Currently, Trains supports Pandas DataFrames as registered artifacts.

Register the artifact

In the test function of our tutorial script, we assign the test loss and correct data to a Pandas DataFrame and register it using the Task.register_artifact method (this requires import pandas as pd in the script's imports).

# Create the Pandas DataFrame
test_loss_correct = {
        'test loss': save_test_loss,
        'correct': save_correct
}
df = pd.DataFrame(test_loss_correct, columns=['test loss', 'correct'])

# Register the test loss and correct as a Pandas DataFrame artifact
task.register_artifact('Test_Loss_Correct', df, metadata={'metadata string': 'apple', 
    'metadata int': 100, 'metadata dict': {'dict string': 'pear', 'dict int': 200}})

Reference the registered artifact

Once an artifact is registered, you can reference it in your Python experiment script and work with it.

In our tutorial script, we use the Task.current_task and Task.get_registered_artifacts methods to take a sample of the registered DataFrame.

# Once the artifact is registered, we can get it and work with it. Here, we sample it.
sample = Task.current_task().get_registered_artifacts()['Test_Loss_Correct'].sample(frac=0.5, 
    replace=True, random_state=1)
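
To make the sample visible in the experiment log, you can report it as text using the logger from Step 2 (a sketch; the message wording is our own):

logger.report_text('Sample of the registered artifact:\n{}'.format(sample), level=logging.DEBUG)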

Step 4. Uploading artifacts

Uploading an artifact stores it on Trains Server, but changes to it are not logged. You can upload artifacts of the following types (see the sketch after the list):

  • Pandas DataFrames
  • Files of any type, including image files
  • Folders - stored as ZIP files
  • Images - stored as PNG files
  • Dictionaries - stored as JSON files
  • NumPy arrays - stored as NPZ files
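
For instance, a dictionary and a local file could be uploaded as follows (a sketch; the artifact names and the file path are our own examples):

task.upload_artifact('run parameters', artifact_object={'learning rate': 0.01, 'epochs': 10})
task.upload_artifact('data sample', artifact_object=os.path.join('data', 'sample.json'))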

In our tutorial script, we upload the loss data as an artifact using the Task.upload_artifact method with metadata specified in the metadata parameter.

# Upload the test loss as an artifact. Here, the artifact is a NumPy array
task.upload_artifact('Predictions', artifact_object=np.array(save_test_loss),
    metadata={'metadata string': 'banana', 'metadata integer': 300,
    'metadata dictionary': {'dict string': 'orange', 'dict int': 400}})

Additional information

After extending the Python experiment script, we can run it and view the results in the Trains Web-App (UI).

Run the script

python pytorch_mnist_tutorial.py

To view the experiment results, do the following:

  1. In the Trains Web-App (UI), on the Projects page, click the examples project.

    If you are using the demo Trains Server, the Projects page is https://demoapp.trains.allegro.ai/projects. If you deployed a self-hosted Trains Server, open its Web-App (UI) in your browser.

  2. In the experiments table, click the extending automagical Trains example experiment.

  3. In the ARTIFACTS tab, DATA AUDIT section, click Test_Loss_Correct. The registered Pandas DataFrame appears, including the file path, size, hash, metadata, and a preview.
  4. In the OTHER section, click Predictions. The uploaded NumPy array appears, including its related information.
  5. Click the RESULTS tab.
  6. Click the LOG sub-tab. You can see the debugging messages, including the default output destination and the Pandas DataFrame sample.
  7. Click the SCALARS sub-tab. You can see the scalar plots of the training loss for each epoch.
  8. Click the PLOTS sub-tab. You can see the confusion matrix and histogram.

Next Steps