Explicit Reporting

In this tutorial, you learn how to extend the Trains automagical capturing of inputs and outputs with explicit reporting.

To demonstrate explicit reporting, we add the following to one of the example scripts from our trains repository, pytorch_mnist.py:

  • A default output destination which is a storage location for models and other artifacts.
  • Explicit logging of a scalar plot, a plot of other (not scalar) data, and log messages.
  • Registration of dynamically synchronized (updated, read-write) artifacts in Trains.
  • Uploading static (one-time, read-only) artifacts.

Prerequisites

Before you begin, make a copy of pytorch_mnist.py so that you can add explicit reporting to it.

  • In your local trains repository, examples directory, run:
    cp pytorch_mnist.py pytorch_mnist_tutorial.py

Step 1. A default output location for model snapshots and artifacts

A default output location lets you specify where model snapshots and artifacts are stored when the experiment runs. You can use a local destination, a shared folder, or cloud storage, such as Amazon S3, Google Cloud Storage, and Azure Storage. Specify the output location in the output_uri parameter of the Task.init() method. In this tutorial, we specify a local folder destination; a cloud storage sketch appears after the directory listing below.

In pytorch_mnist_tutorial.py, change the code from:

task = Task.init(project_name='examples', task_name='pytorch mnist train')

to:

import os  # make sure os is imported at the top of the script

model_snapshots_path = '/mnt/trains'
if not os.path.exists(model_snapshots_path):
    os.makedirs(model_snapshots_path)

task = Task.init(project_name='examples', 
    task_name='extending automagical Trains example', 
    output_uri=model_snapshots_path)

When the script runs, Trains creates the following directory structure:

+-- <output destination name>
|   +-- <project name>
|       +-- <task name>.<Task Id>
|           +-- models
|           +-- artifacts

and puts the model snapshots and artifacts in that folder.

For example, if the Task ID is 9ed78536b91a44fbb3cc7a006128c1b0, then the directory structure under /mnt will be:

.
+-- trains
|   +-- examples
|       +-- extending automagical Trains example.9ed78536b91a44fbb3cc7a006128c1b0
|           +-- models
|           +-- artifacts
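
If you prefer cloud storage, pass a bucket URI instead of a local path. A minimal sketch, assuming a hypothetical S3 bucket and that your cloud credentials are configured in your trains.conf:

from trains import Task

# 's3://my-bucket/trains-output' is a hypothetical destination; model snapshots
# and artifacts will be uploaded there instead of to a local folder
task = Task.init(project_name='examples',
    task_name='extending automagical Trains example',
    output_uri='s3://my-bucket/trains-output')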

Step 2. Explicit logging of scalars, other plots, and messages

In addition to Trains automagical logging, the Trains Python package contains methods for explicit reporting of plots, log messages, media, and tables. In this step, we use several of them.

Get a logger

First, create a logger for the Task using the Task.get_logger() method.

logger = task.get_logger()
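
If another part of your script does not have a reference to the task object, you can fetch the same logger through the Logger class instead. A minimal sketch:

from trains import Logger

# Retrieve the logger of the currently running task from anywhere in the script
logger = Logger.current_logger()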

Plot scalar metrics

Add scalar metrics using the Logger.report_scalar() method to report loss metrics.

The code we added to the train function for the scalar metrics appears at the end of the listing below.

def train(args, model, device, train_loader, optimizer, epoch):

    save_loss = []

    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        save_loss.append(loss)

        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))

            # Add manual scalar reporting for loss metrics
            logger.report_scalar(title='Scalar example {} - epoch'.format(epoch),
                series='Loss', value=loss.item(), iteration=batch_idx)

Plot other (not scalar) data

Our script contains a function named test, which computes the loss and the number of correct predictions for the trained model. We add a histogram and a confusion matrix to log them.

The code added to the test function for reporting loss and correct appears at the end of the listing below.

def test(args, model, device, test_loader):

    save_test_loss = []
    save_correct = []

    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

            save_test_loss.append(test_loss)
            save_correct.append(correct)

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

    # Manually report the number of correct predictions as a histogram
    logger.report_histogram(title='Histogram example', series='correct', iteration=1,
        values=save_correct, xaxis='Test', yaxis='Correct')

    # Manually report test loss and correct as a confusion matrix
    matrix = np.array([save_test_loss, save_correct])
    logger.report_confusion_matrix(title='Confusion matrix example',
        series='Test loss / correct', matrix=matrix, iteration=1)

Log messages

You can extend Trains by logging explicit messages, including text, errors, warnings, and debugging statements. We use the Logger.report_text() method and its level argument to report a debugging message.

import logging  # needed for the logging level constants

logger.report_text('The default output destination for model snapshots and artifacts is: {}'
    .format(model_snapshots_path), level=logging.DEBUG)
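
The same method reports at other severity levels. A brief sketch (the message text here is purely illustrative):

logger.report_text('Starting the training loop', level=logging.INFO)
logger.report_text('This batch size may exhaust GPU memory', level=logging.WARNING)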

Your script is ready!

You can now run pytorch_mnist_tutorial.py. To view the results, see the instructions in Additional information.

Step 3. Dynamic artifacts

Dynamic artifacts (as compared with static artifacts, see the next step) are registered in Trains and, once registered, synchronized with the Trains backend. If a registered artifact changes in an experiment, the change is updated on the backend.

Currently, we support Pandas DataFrames as registered artifacts.

Register the artifact

In our tutorial script's test function, we assign the test loss and correct data to a Pandas DataFrame object and register that DataFrame using the Task.register_artifact() method.

# Create the Pandas DataFrame
test_loss_correct = {
        'test loss': save_test_loss,
        'correct': save_correct
}
df = pd.DataFrame(test_loss_correct, columns=['test loss', 'correct'])

# Register the test loss and correct as a Pandas DataFrame artifact
task.register_artifact('Test_Loss_Correct', df, metadata={'metadata string': 'apple', 
    'metadata int': 100, 'metadata dict': {'dict string': 'pear', 'dict int': 200}})

Reference the registered artifact

Once an artifact is registered, you can reference it in your Python experiment script and work with it.

In our tutorial script, we use the Task.current_task() and Task.get_registered_artifacts() methods to take a sample.

# Once the artifact is registered, we can get it and work with it. Here, we sample it.
sample = Task.current_task().get_registered_artifacts()['Test_Loss_Correct'].sample(frac=0.5, 
    replace=True, random_state=1)
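
To make the sample visible in the experiment log (see Additional information below), we can also report it as a debugging message. A minimal sketch, reusing the logger created in Step 2 (the message text is illustrative):

# Report the sampled DataFrame as a debug message so it appears in the LOG tab
logger.report_text('Sampled dataframe:\n{}'.format(sample), level=logging.DEBUG)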

Step 4. Static artifacts

Static artifacts are one-time uploads to Trains, including:

  • Pandas DataFrames
  • Files of any type, including image files
  • Folders - stored as ZIP files
  • Images - stored as PNG files
  • Dictionaries - stored as JSONs
  • Numpy arrays - stored as NPZ files

In our tutorial script, we upload the loss data as a static artifact using the Task.upload_artifact() method with metadata specified in the metadata parameter.

# Upload test loss as a static artifact. Here, the static artifact is a numpy array
task.upload_artifact('Predictions', artifact_object=np.array(save_test_loss),
    metadata={'metadata string': 'banana', 'metadata integer': 300,
    'metadata dictionary': {'dict string': 'orange', 'dict int': 400}})
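
The other object types from the list above upload the same way. A brief sketch (the artifact name and dictionary contents are illustrative):

# Upload a dictionary - stored as a JSON file
task.upload_artifact('run summary', artifact_object={'dataset': 'MNIST', 'classes': 10})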

Additional information

After extending the Python experiment script, we can run it and view the results in the Trains Web-App.

Run the script

python pytorch_mnist_tutorial.py

To view the experiment results, do the following:

  1. In the Trains Web-App, on the Projects page, click the examples project.

    Note: If you are using the demo Trains Server, the Projects page is https://demoapp.trains.allegro.ai/projects. If you deployed a self-hosted Trains Server, open its Web-App in your browser.

  2. In the experiments table, click the extending automagical Trains example experiment.
  3. In the ARTIFACTS tab, DATA AUDIT section, click Test_Loss_Correct. The registered Pandas DataFrame appears, including the file path, size, hash, metadata, and a preview.
  4. In the OTHER section, click Predictions. The uploaded numpy array appears, including its related information.
  5. Click the RESULTS tab.
  6. Click the LOG sub-tab. You can see the debugging message showing the Pandas DataFrame sample.
  7. Click the SCALARS sub-tab. You can see the scalar plots of the training loss for each epoch.
  8. Click the PLOTS sub-tab. You can see the confusion matrix and histogram.

Next Steps