Artifacts Examples

The example script artifacts_toy.py in the trains repository demonstrates working with artifacts in Trains. It covers two kinds of artifacts: dynamic artifacts, which are registered and synchronized with the Trains platform as the script runs, and static artifacts, which are one-time uploads.

To specify a centralized storage location for artifacts, use the output_uri parameter when calling the Task.init method, and set it to a shared folder or a cloud storage location.

In the Trains Web-App (UI), view artifacts in the experiment results ARTIFACTS tab, which shows each artifact's file path, file size, and hash, along with a clickable link that copies the file URL to the clipboard or opens an HTML link in a new browser tab.

Pre-populated examples ready to enqueue

A self-hosted Trains Server installs with the example experiments from the examples folder of the trains repository.

Dynamic artifacts

To upload artifacts that are dynamically synchronized with the Trains backend, use the Task.register_artifact method. Data auditing is the primary use of registered artifacts.

Pandas DataFrames

Currently, Trains supports registering Pandas DataFrames as dynamically synchronized artifacts.

For example, create a Pandas DataFrame and register it.

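import pandas as pd
from trains import Task

# `task` is the Task object returned by Task.init
df = pd.DataFrame({'num_legs': [2, 4, 8, 0], 'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

# register the DataFrame as a dynamically synchronized artifact
task.register_artifact('train', df, metadata={'counting': 'legs', 'max legs': 69})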

When the script runs, Trains uploads the Pandas DataFrame. The ARTIFACTS tab in the Trains Web-App (UI) shows the artifact, including the URL where it is stored. Links for HTML artifacts are clickable and open in a new browser tab.

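After registering the DataFrame, you can keep working with it directly, or retrieve it from anywhere in the code by calling the Task.get_registered_artifacts method, which returns the registered artifacts keyed by name.

# work with the registered object directly (note that sample returns a new DataFrame)
df.sample(frac=0.5, replace=True, random_state=1)

# or access it from anywhere through the current Task
Task.current_task().get_registered_artifacts()['train'].sample(frac=0.5, replace=True, random_state=1)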
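
The idea behind a dynamically synchronized artifact can be illustrated without Trains. The sketch below is a toy under stated assumptions, not Trains's implementation: a hypothetical ToyRegistry class keeps a reference to each registered object and records a simulated upload only when a hash of the object's serialized content changes. A plain dictionary stands in for the DataFrame so that the example needs only the standard library.

```python
import hashlib
import json


class ToyRegistry:
    """Hypothetical stand-in for a registry of dynamically synchronized objects."""

    def __init__(self):
        self._objects = {}      # name -> registered object (held by reference)
        self._last_hash = {}    # name -> hash of the content at the last upload
        self.uploads = []       # (name, hash) for every simulated upload

    def register(self, name, obj):
        self._objects[name] = obj

    def sync(self):
        """Simulate an upload for every object whose content changed."""
        for name, obj in self._objects.items():
            payload = json.dumps(obj, sort_keys=True).encode("utf-8")
            digest = hashlib.sha256(payload).hexdigest()
            if self._last_hash.get(name) != digest:
                self._last_hash[name] = digest
                self.uploads.append((name, digest))


registry = ToyRegistry()
table = {"num_legs": [2, 4, 8, 0]}
registry.register("train", table)

registry.sync()               # first sync: content is new, so it is uploaded
registry.sync()               # nothing changed: no upload
table["num_legs"][0] = 3      # in-place change to the registered object
registry.sync()               # content changed: uploaded again
table = {"num_legs": [0]}     # rebinding the name creates a different object
registry.sync()               # the registered object is untouched: no upload

print(len(registry.uploads))  # 2
```

The last two lines show why in-place changes matter in a scheme like this one: an operation that returns a new object (as DataFrame.sample does) leaves the registered object as it was.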

Static artifacts

To upload static artifacts, use the Task.upload_artifact method.

Pandas DataFrames

Pandas DataFrames can be stored as static artifacts as well as dynamic artifacts.

# add and upload pandas.DataFrame (one-time snapshot of the object)
task.upload_artifact('Pandas', artifact_object=df)

Dictionaries

Store dictionaries. Trains stores them as JSON files.

# add and upload dictionary (stored as JSON)
task.upload_artifact('dictionary', df.to_dict())
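
To get a feel for what a dictionary looks like once it is serialized to JSON, the following standard-library sketch round-trips a dictionary with the same shape as the DataFrame example ({column: {index label: value}}). It does not call Trains, and the exact file that Trains writes may differ. It also shows one detail worth knowing about JSON in general: non-string keys come back as strings.

```python
import json

# Same shape as the DataFrame example: {column: {index label: value}}
animals = {
    "num_legs": {"falcon": 2, "dog": 4, "spider": 8, "fish": 0},
    "num_wings": {"falcon": 2, "dog": 0, "spider": 0, "fish": 0},
    "num_specimen_seen": {"falcon": 10, "dog": 2, "spider": 1, "fish": 8},
}

# Serialize to JSON text and read it back
text = json.dumps(animals, indent=2)
restored = json.loads(text)
print(restored == animals)   # True

# JSON object keys are always strings, so integer keys do not survive a round trip
int_keyed = {0: "falcon", 1: "dog"}
round_tripped = json.loads(json.dumps(int_keyed))
print(round_tripped)         # {'0': 'falcon', '1': 'dog'}
```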

NumPy objects

Store NumPy objects. Trains stores them as NPZ files.

# add and upload NumPy object (stored as .npz file)
task.upload_artifact('Numpy Eye', np.eye(100, 100))

Image files

Store images loaded as Pillow Image objects. Trains stores them as PNG files.

# add and upload Image (stored as .png file)
from PIL import Image

im = Image.open('samples/dancing.jpg')
task.upload_artifact('pillow_image', im)

Local files

Store local files.

# add and upload local file artifact
task.upload_artifact('local file', artifact_object='samples/dancing.jpg')

Store HTML links.

# add and upload a link
task.upload_artifact('HTML link', artifact_object='https://allegro.ai/')

Folders

Store a folder and the files it contains.

# add and upload a folder, artifact_object should be the folder path
task.upload_artifact('local folder', artifact_object='samples/')

Use wildcards to store selected files in a folder.

# add and upload the files matching a wildcard (use a distinct artifact name)
task.upload_artifact('wildcard jpegs', artifact_object='samples/*.jpg')
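
Before uploading with a wildcard, it can help to preview which files the pattern selects. The sketch below uses Python's standard glob module. The preview_wildcard helper is hypothetical, and Trains's own matching rules may differ, so treat the result as an approximation. The file names are placeholders created in a temporary folder so that the example runs anywhere.

```python
import glob
import os
import tempfile


def preview_wildcard(pattern):
    """Return the sorted file paths matching a wildcard pattern (hypothetical helper)."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


# Build a throwaway folder with placeholder files
with tempfile.TemporaryDirectory() as samples:
    for name in ("dancing.jpg", "picasso.jpg", "notes.txt"):
        with open(os.path.join(samples, name), "w") as handle:
            handle.write("placeholder")

    # Only the .jpg files match; notes.txt is left out
    pattern = os.path.join(samples, "*.jpg")
    matched = [os.path.basename(p) for p in preview_wildcard(pattern)]

print(matched)  # ['dancing.jpg', 'picasso.jpg']
```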

Centralized storage

To store artifacts in centralized storage, use the output_uri parameter in the Task.init method.

The following are examples of output_uri values for the supported locations:

  • A shared folder: /mnt/share/folder
  • S3: s3://bucket/folder
  • Google Cloud Storage: gs://bucket-name/folder
  • Azure Storage: azure://company.blob.core.windows.net/folder/
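
The values listed above differ only in their URI scheme, so a small check can catch a mistyped destination before a script runs. The following is a minimal sketch using the standard library. The describe_output_uri helper and its labels are hypothetical and are not part of Trains. It assumes that a value without a scheme is a shared folder path.

```python
from urllib.parse import urlparse

# Schemes taken from the list of supported locations; labels are illustrative
KNOWN_SCHEMES = {
    "s3": "S3",
    "gs": "Google Cloud Storage",
    "azure": "Azure Storage",
}


def describe_output_uri(output_uri):
    """Return a label for an output_uri value, or raise ValueError (hypothetical helper)."""
    scheme = urlparse(output_uri).scheme
    if not scheme:
        # No scheme: assume a local or shared folder path
        return "shared folder"
    if scheme in KNOWN_SCHEMES:
        return KNOWN_SCHEMES[scheme]
    raise ValueError("unrecognized output_uri scheme: " + scheme)


for uri in ("/mnt/share/folder", "s3://bucket/folder", "gs://bucket-name/folder"):
    print(uri, "->", describe_output_uri(uri))
```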

Credentials for the storage location are in the Trains configuration file (for example, on Linux, ~/trains.conf).

An upload creates a folder structure in the destination location, organized by project, then by experiment, with the artifacts stored underneath.

For example:

task = Task.init('examples', 'store artifacts in centralized storage', output_uri='/mnt/data')

When the script runs, Trains creates the following in /mnt/data:

/mnt/data
+-- <project-name>
    +-- <experiment-name>.<Task-Id>
        +-- artifacts
            +-- <artifact name>
            +-- <artifact name>
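
Given the layout above, the folder that holds an experiment's artifacts can be worked out from the output_uri, the project name, the experiment name, and the Task ID. The sketch below assembles that path with the standard library. The expected_artifacts_folder helper is hypothetical, and the Task ID shown is a made-up placeholder, since real IDs are assigned by Trains.

```python
import posixpath


def expected_artifacts_folder(output_uri, project, experiment, task_id):
    """Build <output_uri>/<project>/<experiment>.<task_id>/artifacts (hypothetical helper)."""
    experiment_folder = "{}.{}".format(experiment, task_id)
    return posixpath.join(output_uri, project, experiment_folder, "artifacts")


folder = expected_artifacts_folder(
    "/mnt/data",
    "examples",
    "store artifacts in centralized storage",
    "0123abcd",  # placeholder Task ID, for illustration only
)
print(folder)
# /mnt/data/examples/store artifacts in centralized storage.0123abcd/artifacts
```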