Explicit Reporting

In this tutorial, you learn how to extend Trains' automagical capturing of inputs, outputs, logging, source code control, resource monitoring, and other deep learning solution features by adding explicit reporting.

To demonstrate explicit reporting, we extend one of the example scripts from our trains repository, pytorch_mnist.py, by adding the following:

  • A default output destination for the experiment.
  • Explicit logging of a scalar metric, plots of other (not scalar) data, and debug messages.
  • Registered artifacts which are dynamically synchronized with Trains.
  • Uploaded artifacts which are static, one-time uploads.
  • Model snapshots stored in the default output destination we specify.

Prerequisites

Before you begin

Make a copy of the example script we use in this tutorial, pytorch_mnist.py, so that we can extend it.

  1. In your local trains repository, change to the examples directory.
  2. Make a copy of pytorch_mnist.py and name the copy pytorch_mnist_tutorial.py.

Step 1. Set a default output location

A default output location allows you to specify where model snapshots and artifacts will be stored when the experiment runs. You can use a local destination, a shared folder, or cloud storage, such as AWS S3, Google Cloud Storage, and Azure Storage. In this tutorial, we use a local destination, which is a directory in your local repository's examples directory.

In pytorch_mnist_tutorial.py, add the output_uri parameter to the Task.init() method specifying a default destination for the experiment's output. Also, change the task_name parameter's value so that this tutorial example is easier to find in the Trains Web-App.

Change the code from:

task = Task.init(project_name='examples', task_name='pytorch mnist train')

to:

# Requires "import os" at the top of the script
model_snapshots_path = 'model_snapshots'
if not os.path.exists(model_snapshots_path):
    os.makedirs(model_snapshots_path)

task = Task.init(project_name='examples',
    task_name='extending automagical Trains example',
    output_uri=model_snapshots_path)

When the script runs, Trains creates the following directory structure in the output_uri location you specify:

.
+-- <output destination name>
|   +-- <project name>
|       +-- <task name>.<Task Id>

For our tutorial script, the directory structure will contain models and artifacts. For example, if the Task Id is 9ed78536b91a44fbb3cc7a006128c1b0, then the directory structure will be:

.
+-- model_snapshots
|   +-- examples
|       +-- extending automagical Trains example.9ed78536b91a44fbb3cc7a006128c1b0
|           +-- models
|           +-- artifacts
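
The layout above can be sketched as a small path helper. This is an illustration only: expected_output_dir is a hypothetical function, not part of Trains, and it assumes the <task name>.<Task Id> naming shown in the tree. Trains creates the folders itself when the script runs.

```python
import os


def expected_output_dir(output_uri, project_name, task_name, task_id):
    """Build the folder this tutorial expects for one experiment.

    Hypothetical helper for illustration; it only joins path components.
    """
    return os.path.join(output_uri, project_name,
                        '{}.{}'.format(task_name, task_id))


run_dir = expected_output_dir(
    'model_snapshots',
    'examples',
    'extending automagical Trains example',
    '9ed78536b91a44fbb3cc7a006128c1b0',
)

# Sub-folders shown in the example tree above
models_dir = os.path.join(run_dir, 'models')
artifacts_dir = os.path.join(run_dir, 'artifacts')
```

A helper like this can be handy for locating snapshots from another script once you know the Task Id.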

Step 2. Get a logger

In later steps, we will log scalar metrics, plot data, and log messages. For this, we must first create a logger for the Task so that the logging is reported to the Trains Web-App. Get a logger using the Task.get_logger() method.

logger = task.get_logger()

Step 3. Plot scalar metrics

Add scalar metrics to your Python experiment scripts, in addition to those automagically captured by Trains, to improve your experiment analysis and comparison.

Our script contains a function named train that performs training for each epoch. We use the Logger.report_scalar() method to report loss metrics.

The code added to train for the scalar metrics is the save_loss list and the call to logger.report_scalar().

def train(args, model, device, train_loader, optimizer, epoch):

    save_loss = []

    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        save_loss.append(loss)

        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))

            # Add manual scalar reporting for loss metrics
            logger.report_scalar(title='Scalar example {} - epoch'.format(epoch),
                series='Loss', value=loss.item(), iteration=batch_idx)
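
To try the reporting logic without a Trains server, one option is a stand-in object that records calls in memory. The sketch below is an illustration under stated assumptions: RecordingLogger and report_interval_losses are hypothetical names, the loss values are made up, and only the report_scalar keyword arguments (title, series, value, iteration) are taken from the tutorial code.

```python
class RecordingLogger:
    """Hypothetical stand-in that records report_scalar calls in memory."""

    def __init__(self):
        self.scalars = []

    def report_scalar(self, title, series, value, iteration):
        self.scalars.append((title, series, value, iteration))


def report_interval_losses(logger, losses, epoch, log_interval):
    """Report the loss of every log_interval-th batch."""
    for batch_idx, loss in enumerate(losses):
        if batch_idx % log_interval == 0:
            logger.report_scalar(
                title='Scalar example {} - epoch'.format(epoch),
                series='Loss',
                value=loss,
                iteration=batch_idx,
            )


stub = RecordingLogger()
# Made-up loss values for five batches
report_interval_losses(stub, [2.3, 1.9, 1.5, 1.2, 1.0], epoch=1, log_interval=2)
reported_iterations = [entry[3] for entry in stub.scalars]
```

Because the title includes the epoch number, each epoch produces its own scalar plot, with the batch index on the x-axis.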

Step 4. Plot any other data

In addition to adding explicitly logged scalar metrics (see the previous step), the Logger class contains methods to plot any data in a variety of chart types, such as the histogram and confusion matrix used in this step.

Our script contains a function named test, which determines loss and correct for the trained model. We add a histogram and a confusion matrix to log them. The code added to test is the save_test_loss and save_correct lists and the calls to logger.report_histogram() and logger.report_confusion_matrix().

def test(args, model, device, test_loader):

    save_test_loss = []
    save_correct = []

    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

            save_test_loss.append(test_loss)
            save_correct.append(correct)

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

    # Manually report correct as a histogram
    logger.report_histogram(title='Histogram example', series='correct',
        iteration=1, values=save_correct, xaxis='Test', yaxis='Correct')

    # Manually report test loss and correct as a confusion matrix
    matrix = np.array([save_test_loss, save_correct])
    logger.report_confusion_matrix(title='Confusion matrix example',
        series='Test loss / correct', matrix=matrix, iteration=1)
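
Note that test appends the running totals of test_loss and correct after each batch, so save_test_loss and save_correct hold cumulative values. If per-batch values suit a plot better, they can be recovered by differencing. A minimal sketch, with made-up numbers and a hypothetical helper name:

```python
def per_batch_values(running_totals):
    """Convert cumulative totals into per-batch increments."""
    previous = 0
    increments = []
    for total in running_totals:
        increments.append(total - previous)
        previous = total
    return increments


# Made-up cumulative counts of correct predictions after each of four batches
save_correct = [95, 190, 287, 381]
correct_per_batch = per_batch_values(save_correct)
```

The tutorial itself plots the cumulative lists as they are; this conversion is optional.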

Step 5. Register a dynamically synchronized artifact

Store artifacts with your experiments in Trains. In addition to models, which are also artifacts, you can store the following:

  • Pandas DataFrames - dynamically updated DataFrames and one-time snapshots of DataFrames
  • Files of any type, including image files
  • Folders - stored as ZIP files
  • Images - stored as PNG files
  • Dictionaries - stored as JSONs
  • Numpy arrays - stored as NPZ files
  • Objects of any other type that you require

In our script's test function, we can assign the test loss and correct data to a Pandas DataFrame object and register that Pandas DataFrame using the Task.register_artifact() method so that changes to it are dynamically synchronized with Trains. The metadata parameter allows you to store a dictionary of metadata key-value pairs with the artifact.

# Create the Pandas DataFrame
test_loss_correct = {
        'test loss': save_test_loss,
        'correct': save_correct
}
df = pd.DataFrame(test_loss_correct, columns=['test loss','correct'])

# Register the test loss and correct as a Pandas DataFrame artifact
task.register_artifact('Test_Loss_Correct', df, metadata={'metadata string': 'apple', 
    'metadata int': 100, 'metadata dict': {'dict string': 'pear', 'dict int': 200}})

Step 6. Reference the registered artifact

Once an artifact is registered, you can reference it in your Python experiment script and work with it.

The Task.current_task() and Task.get_registered_artifacts() methods allow you to create a reference to the registered Pandas DataFrame. We then create a sample from it.

# Once the artifact is registered, we can get it and work with it. Here, we sample it.
sample = Task.current_task().get_registered_artifacts()['Test_Loss_Correct'].sample(frac=0.5, 
    replace=True, random_state=1)
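
The sample call draws about half as many rows as the DataFrame holds, with replacement, so the same row can appear more than once. The standard-library sketch below illustrates that idea only: it does not reproduce the rows pandas would pick for random_state=1, sample_with_replacement is a hypothetical helper, and the rows are made up.

```python
import random


def sample_with_replacement(rows, frac, seed):
    """Draw round(frac * len(rows)) rows, allowing repeats."""
    rng = random.Random(seed)  # fixed seed makes the draw repeatable
    count = round(frac * len(rows))
    return rng.choices(rows, k=count)


# Made-up (test loss, correct) rows
rows = [(210.4, 95), (418.9, 190), (630.2, 287), (841.7, 381)]
sample_rows = sample_with_replacement(rows, frac=0.5, seed=1)
```

Fixing the seed, like fixing random_state in pandas, means repeated runs draw the same rows.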

Step 7. Upload a static artifact

We can also upload the loss data as a static artifact using the Task.upload_artifact() method and its metadata parameter.

# Upload test loss as a static artifact. Here, the static artifact is a NumPy array
task.upload_artifact('Predictions',artifact_object=np.array(save_test_loss),
    metadata={'metadata string': 'banana', 'metadata integer': 300,
    'metadata dictionary': {'dict string': 'orange', 'dict int': 400}})

Step 8. Log an explicit message

You can extend Trains by logging explicit messages, which include text, errors, warnings, and debugging statements. We use the Logger.report_text() method and its level argument to report the Pandas DataFrame sample created earlier.

# Use explicit reporting to log the Pandas DataFrame sample as a debugging message
logger.report_text('Sample\n{}'.format(sample), level=logging.DEBUG)
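
The level argument takes the standard constants from Python's logging module, so the script needs import logging. The sketch below shows the message string that is built and the integer values behind the level names. The sample text is a made-up placeholder, and the report_text call itself is left out so that the snippet runs without Trains.

```python
import logging

# Placeholder standing in for the printed DataFrame sample
sample = '   test loss  correct\n1      418.9      190'
message = 'Sample\n{}'.format(sample)

# Standard logging levels are plain integers, ordered by severity
levels = {
    'DEBUG': logging.DEBUG,
    'INFO': logging.INFO,
    'WARNING': logging.WARNING,
    'ERROR': logging.ERROR,
}
```

Passing a higher level, such as logging.WARNING, marks the message accordingly instead of as a debugging statement.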

Your script is ready!

You can now run pytorch_mnist_tutorial.py. To view the results, see the instructions in Additional information.

Additional information

After extending the Python experiment script, we can run it and view the results in the Trains Web-App.

Run the script

python pytorch_mnist_tutorial.py

To view the experiment results:

  1. In the Trains Web-App, on the Projects page, click the examples project.

    The Trains demo Web-App

    If you are using our Trains demo server, the Projects page is https://demoapp.trains.allegro.ai/projects. If you deployed your own locally-hosted Trains server, open it in your browser.

  2. In the experiments table, click the extending automagical Trains example experiment.
  3. In the ARTIFACTS tab, DATA AUDIT section, click Test_Loss_Correct. The registered Pandas DataFrame appears, including the file path, size, hash, metadata, and a preview.
  4. In the OTHER section, click Predictions. The uploaded NumPy array appears, including its related information.
  5. Click the RESULTS tab.
  6. Click the LOG sub-tab. You can see the debugging message showing the Pandas DataFrame sample.
  7. Click the SCALARS sub-tab. You can see the scalar plots of the loss logged for each epoch.
  8. Click the PLOTS sub-tab. You can see the confusion matrix and histogram.

Next Steps