Artifacts Reporting

The artifacts.py example demonstrates uploading objects (other than models) to storage as experiment artifacts. These artifacts include Pandas DataFrames, local files, dictionaries, folders, Numpy objects, and image files. Artifacts can be uploaded and dynamically tracked, or uploaded without tracking.

Configure Trains for uploading artifacts to any of the supported types of storage, which include local and shared folders, S3 buckets, Google Cloud Storage, and Azure Storage (debug sample storage is different). Configure Trains in any of the following ways:

When the script runs, it creates an experiment named artifacts example, which is associated with the examples project.

Trains reports artifacts in the ARTIFACTS tab of the experiment details in the Trains Web (UI).


Dynamically tracked artifacts

Currently, Trains supports uploading and dynamically tracking Pandas DataFrames. Use the Task.register_artifact method. If the Pandas DataFrame changes, Trains uploads the changes. The updated artifact is associated with your experiment.

For example:

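import pandas as pd
from trains import Task

# create the experiment that the artifacts are attached to
task = Task.init(project_name='examples', task_name='artifacts example')

df = pd.DataFrame(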
    {
        'num_legs': [2, 4, 8, 0],
        'num_wings': [2, 0, 0, 0],
        'num_specimen_seen': [10, 2, 1, 8]
    },
    index=['falcon', 'dog', 'spider', 'fish']
)

# Register Pandas object as artifact to watch
# (it will be monitored in the background and automatically synced and uploaded)
task.register_artifact('train', df, metadata={'counting': 'legs', 'max legs': 69})

After changing the artifact, call the Task.get_registered_artifacts method to retrieve it and confirm that Trains tracked the change.

# sample the artifact object
# (note: DataFrame.sample returns a new DataFrame; it does not modify df in place)
df.sample(frac=0.5, replace=True, random_state=1)
# or access it from anywhere using the Task's get_registered_artifacts()
Task.current_task().get_registered_artifacts()['train'].sample(frac=0.5, replace=True, random_state=1)
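
One detail worth checking in the snippet above: DataFrame.sample returns a new DataFrame and leaves the original untouched, so a change that tracking can pick up has to modify the registered object itself. The following sketch uses only pandas (no Trains calls) to show the difference. The cell being edited and the variable name sampled are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'num_legs': [2, 4, 8, 0],
        'num_wings': [2, 0, 0, 0],
        'num_specimen_seen': [10, 2, 1, 8]
    },
    index=['falcon', 'dog', 'spider', 'fish']
)

# sample() builds and returns a new DataFrame; df keeps all four rows
sampled = df.sample(frac=0.5, replace=True, random_state=1)
print(len(sampled), len(df))

# An in-place assignment modifies df itself (the cell chosen here is arbitrary)
df.loc['dog', 'num_specimen_seen'] = 3
print(df.loc['dog', 'num_specimen_seen'])
```

If df were registered with Task.register_artifact, the in-place assignment is the kind of edit that changes the registered object, whereas the sampled copy is a separate object.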

Artifacts without tracking

Trains supports uploading several types of objects as one-time snapshots that are not tracked. Use the Task.upload_artifact method.

Artifacts without tracking include:

  • Pandas DataFrames
  • Local files
  • Dictionaries (stored as JSON files)
  • Numpy objects (stored as NPZ files)
  • Image files (stored as PNG files)
  • Folders (stored as ZIP files)
  • Wildcards (stored as ZIP files)

Pandas DataFrames

# add and upload pandas.DataFrame (onetime snapshot of the object)
task.upload_artifact('Pandas', artifact_object=df)

Local files

import os

# add and upload local file artifact
task.upload_artifact('local file', artifact_object=os.path.join('data_samples', 'dancing.jpg'))

Dictionaries

# add and upload dictionary (stored as JSON)
task.upload_artifact('dictionary', df.to_dict())
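
To make the "stored as JSON" behavior concrete, here is a minimal sketch using only the standard json module. It is not Trains' serialization code. The literal dictionary animal_counts is an assumption standing in for the nested shape that df.to_dict() produces (column name mapped to index label and value), trimmed to two columns.

```python
import json

# Stand-in for df.to_dict(): column name -> {index label -> value}
animal_counts = {
    'num_legs': {'falcon': 2, 'dog': 4, 'spider': 8, 'fish': 0},
    'num_wings': {'falcon': 2, 'dog': 0, 'spider': 0, 'fish': 0},
}

# Serialize to JSON text and read it back
json_text = json.dumps(animal_counts, indent=2)
restored = json.loads(json_text)

# String keys and integer values survive the round trip unchanged
print(restored == animal_counts)
```

Note that JSON object keys are always strings, so a dictionary with non-string keys (for example, an integer index) would not compare equal after a round trip like this one.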

Numpy objects

import numpy as np

# add and upload Numpy Object (stored as .npz file)
task.upload_artifact('Numpy Eye', np.eye(100, 100))
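
For readers unfamiliar with the NPZ format, the sketch below writes the same identity matrix to an NPZ archive and reads it back using NumPy alone. It assumes nothing about which NumPy call Trains makes internally. The in-memory buffer and the names matrix and restored are illustrative only.

```python
import io

import numpy as np

matrix = np.eye(100, 100)

# Write the array to an in-memory NPZ archive (a file path works the same way)
buffer = io.BytesIO()
np.savez(buffer, matrix)
buffer.seek(0)

# Arrays passed positionally are stored under default names: arr_0, arr_1, ...
with np.load(buffer) as archive:
    restored = archive['arr_0']

print(restored.shape, np.array_equal(restored, matrix))
```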

Image files

from PIL import Image

# add and upload Image (stored as .png file)
im = Image.open(os.path.join('data_samples', 'dancing.jpg'))
task.upload_artifact('pillow_image', im)

Folders

# add and upload a folder, artifact_object should be the folder path
task.upload_artifact('local folder', artifact_object=os.path.join('data_samples'))

Wildcards

# add and upload a wildcard
task.upload_artifact('wildcard jpegs', artifact_object=os.path.join('data_samples', '*.jpg'))
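
As a rough illustration of what "wildcards stored as ZIP files" amounts to, the following sketch expands a glob pattern and packs the matching files into a ZIP archive using only the standard library. This is an assumption about the general technique, not Trains' implementation. The helper zip_matching_files and the placeholder file names are hypothetical.

```python
import glob
import os
import tempfile
import zipfile


def zip_matching_files(pattern, zip_path):
    """Pack every file matching a glob pattern into a ZIP archive; return the names stored."""
    matches = sorted(glob.glob(pattern))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for path in matches:
            # Store only the file name, not the full directory path
            archive.write(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in matches]


# Demo in a throwaway directory with placeholder files (the names are made up)
with tempfile.TemporaryDirectory() as workdir:
    samples_dir = os.path.join(workdir, 'data_samples')
    os.makedirs(samples_dir)
    for name in ('a.jpg', 'b.jpg', 'notes.txt'):
        with open(os.path.join(samples_dir, name), 'wb') as handle:
            handle.write(b'placeholder')

    zip_path = os.path.join(workdir, 'wildcard_jpegs.zip')
    stored = zip_matching_files(os.path.join(samples_dir, '*.jpg'), zip_path)

    # Read the archive back to confirm which entries it contains
    with zipfile.ZipFile(zip_path) as archive:
        archived_names = sorted(archive.namelist())

print(stored)
```

Only the files matching the pattern end up in the archive; notes.txt is left out because it does not match *.jpg.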