Executing Experiments Remotely

Remote execution in Trains allows you to automatically execute experiments on a single remote machine, or on multiple remote machines. Add any experiment whose status is Draft to a queue, and a worker listening to that queue will execute it.

On this page, we explain how to remotely execute experiments using the Trains Web-App (UI). You can also remotely execute experiments programmatically; see the Hyperparameter optimization and Task pipelining examples, and refer to the Trains Python Client reference page.

Trains provides several ways to use remote execution, which can be used for multiple workflows, including:

  • Rerun an existing experiment, with or without changes.
    • For example, to rerun an experiment for more iterations on a machine with greater resources.
  • Reproduce an experiment, by creating an exact copy of it, and not modifying the copy.
    • For example, to replicate an experiment that previously ran on a different machine.
  • Tune an experiment, by creating an exact copy of it, and then modifying parts of the copy.
    • For example, to tweak an experiment, and then compare it to other experiments.

Requirements

Before executing experiments remotely, you need a running worker listening to a queue. You can set up Trains Agent yourself, or a DevOps or other IT group may set it up for you.

To set up Trains Agent and for information about Trains Agent commands, see the Installing and Configuring Trains Agent page, Trains Agent Use Case Examples, and Trains Agent Reference.
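If you are setting up the agent yourself, a minimal sketch looks like the following (the queue name `default` is an assumption; use a queue that exists on your Trains Server):

```shell
# Install the agent and create its configuration file
# (trains-agent init prompts for your Trains Server credentials)
pip install trains-agent
trains-agent init

# Start a worker daemon that listens to the "default" queue
trains-agent daemon --queue default
```

The daemon keeps running and pulls enqueued experiments one at a time; run it on each machine you want to use as a worker.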

Rerunning experiments

Rerun an existing experiment, with or without changes. This does not create a new experiment in Trains; however, it does overwrite the existing Task object in Trains Server. To avoid overwriting the existing object, see Reproducing experiments and Tuning experiments.

To rerun an experiment:

  1. On the Projects page, click the project card or the All projects card.

    The project page appears showing the experiments table which contains all active experiments in the project (some inactive experiments may be in the archive).

  2. In the experiment table, right click the experiment > Reset > RESET. The experiment's state (status) becomes Draft.

  3. If you want to make changes to the experiment (for example, select different source code, or tune the hyperparameters), see Modifying experiments.

  4. Right click the experiment > Enqueue > Select a queue > ENQUEUE.

You can track the rerun experiment and compare it to other experiments while it is running and after it completes.

Reproducing experiments

Reproduce an existing experiment, by creating an exact copy of it, and not modifying the copy. This creates a new experiment in Trains, and a new Task object in Trains Server. In Trains, we call this cloning. To modify a cloned experiment, see Tuning experiments.

  1. On the Projects page, click the project card or the All projects card.

    The project page appears showing the experiments table which contains all active experiments in the project (some inactive experiments may be in the archive).

  2. In the experiment table, right click the experiment > Clone.

  3. Select a project, type a name for the newly cloned experiment, and optionally type a description.

  4. Click CLONE. The newly cloned experiment's detail pane appears. The experiment's status is Draft.

  5. Right click the experiment > Enqueue > Select a queue > ENQUEUE.

Tuning experiments

Tune an existing experiment, by creating an exact copy of it, and then modifying parts of the copy. This creates a new experiment, and a new Task object in Trains.

  1. On the Projects page, click the project card or the All projects card.

    The project page appears showing the experiments table which contains all active experiments in the project (some inactive experiments may be in the archive).

  2. In the experiment table, right click the experiment > Clone.

  3. Select a project, type a name for the newly cloned experiment, and optionally type a description.

  4. Click CLONE. The newly cloned experiment's detail pane appears. The experiment's status is Draft.

  5. Make changes to the experiment (for example, select different source code, change the hyperparameters, select a new initial weights input model, or edit other editable experiment components); see Modifying experiments.

  6. Right click the experiment > Enqueue > Select a queue > ENQUEUE.

Modifying experiments

Experiments whose status is Draft are editable (see the user properties exception). In the Trains Web-App (UI), edit experiments in the info panel, including:

Execution details

Source code

Select source code by changing any of the following:

  • Repository, commit (select by ID, tag name, or choose the last commit in the branch), script, and/or working directory.
  • Installed Python packages and/or versions - Edit or clear (remove) them all.
  • Uncommitted changes - Edit or clear (remove) them all.

To select different source code:

  • In the EXECUTION tab, hover over a section > EDIT (or DISCARD DIFFS for UNCOMMITTED CHANGES) > edit > SAVE.

Base Docker image

Select a pre-configured Docker image that Trains Agent will use to remotely execute this experiment (see Building Docker containers).

To add, change, or delete a base Docker image:

  • In EXECUTION > AGENT CONFIGURATION > BASE DOCKER IMAGE > hover > EDIT > Enter the base Docker image.

Output destination

Set an output destination for model checkpoints (snapshots) and other artifacts. Examples of supported types of destinations and formats for specifying locations include:

  • A shared folder: /mnt/share/folder
  • S3: s3://bucket/folder
  • Google Cloud Storage: gs://bucket-name/folder
  • Azure Storage: azure://company.blob.core.windows.net/folder/

To add, change, or delete an artifact output destination:

  • In EXECUTION > OUTPUT > DESTINATION > hover > EDIT > edit > SAVE.

You can also set the output destination for artifacts in code (see the output_uri parameter of the Task.init method), and in the Trains configuration file for all experiments (see default_output_uri on the Trains Configuration Reference page).

Log level

Set a logging level for the experiment (see the standard Python logging levels).

To add, change, or delete a log level:

  • In EXECUTION > OUTPUT > LOG LEVEL > hover > EDIT > Enter the log level.

Configuration

Hyperparameters

In older versions of Trains Server, the CONFIGURATION tab was named HYPER PARAMETERS, and it contained all parameters. The renamed tab contains a HYPER PARAMETERS section, with subsections for hyperparameter groups.

Add, change, or delete hyperparameters, which in the Trains Web-App (UI) are organized in the following groups:

  • Command line arguments, and all parameters from experiments logged by older versions of Trains, except TensorFlow definitions - In the Args section (from code: automatic logging of argparse arguments).

  • TensorFlow definitions - In the TF_DEFINE section (from code: automatic logging of TF_DEFINEs).

  • Parameter dictionaries - In the General section (from code: dictionaries connected to the Task by calling the Task.connect method; see connecting a dict object).

  • Environment variables - In the Environment section (tracked only if you set the TRAINS_LOG_ENVIRONMENT environment variable; see this FAQ).

  • Custom named parameter groups (see the name parameter in Task.connect).

To add, change, or delete hyperparameters:

  • In the CONFIGURATIONS tab > HYPER PARAMETERS > General > hover > EDIT > add, change, or delete keys and/or values > SAVE.

User properties

User properties allow you to store any descriptive information in key-value pair format. They are editable in any experiment, except experiments whose status is Published (read-only).

To add, change, or delete user properties:

  • In CONFIGURATIONS > USER PROPERTIES > Properties > hover > EDIT > add, change, or delete keys and/or values > SAVE.

Configuration objects

In older versions of Trains Server, the Task model configuration appeared in the ARTIFACTS tab, MODEL CONFIGURATION section. Task model configurations now appear in CONFIGURATION > Configuration Objects.

Edit the experiment (Task) model configurations.

To add, change, or delete the Task model configurations:

  • In CONFIGURATIONS > CONFIGURATION OBJECTS > GENERAL > hover > EDIT or CLEAR (if the configuration is not empty).

Artifacts

Initial weights input model

Choose an initial weights input model from the same project or any other project. Alternatively, if the experiment has an input model associated with it, you can remove it from the experiment (this does not delete the model from Trains).

To add, change, or delete the initial weights input model:

  • In ARTIFACTS > Input Model > hover over the model area on the right > EDIT:

    • If the experiment has an input model associated with it, click to select a different model, or delete the model.
    • If the experiment does not have an input model, select a model in the dialog that appears.

The model design is not editable in the experiment. It is editable in the model details.

Terminating running experiments

To terminate a running experiment (for example, if the experiment requires changes), abort it.

To terminate an experiment:

  • In the experiment table, right click the experiment > Abort > ABORT. The experiment status changes to Aborted. The Trains Web-App (UI) shows the results to that point, including the last iteration performed.

Read-only experiments (Publishing)

When you want to prevent changes to an experiment, make it read-only by publishing it.

  • In the experiments table, right click the experiment > Publish. The experiment's status changes to Published.