Trains Configuration Reference

This page details the configurable options for Trains and Trains Agent. Both Trains and Trains Agent use the same configuration file, trains.conf, which, depending upon your operating system, is located in one of the following locations:

  • Linux - ~/trains.conf
  • Mac - $HOME/trains.conf
  • Windows - \Users\<username>\trains.conf
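To locate the file from a script, the following minimal Python sketch assumes that each path above is the current user's home directory followed by trains.conf. The function name trains_conf_path is hypothetical and is not part of the Trains API.

```python
from pathlib import Path


def trains_conf_path() -> Path:
    """Return the expected trains.conf location (hypothetical helper).

    Assumption: on Linux, Mac, and Windows the file sits directly in the
    current user's home directory, as listed above.
    """
    return Path.home() / "trains.conf"


if __name__ == "__main__":
    conf = trains_conf_path()
    print(conf, "(found)" if conf.is_file() else "(not found)")
```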

Configuration options are organized by section (dictionary) in the configuration file. The sections are:

  • agent section - Trains Agent configuration.
  • api section - Trains Server configuration and credentials.
  • sdk section - Trains configuration for Trains Python Client Package and related options, including storage, metrics, network, AWS S3 buckets and credentials, Google Cloud Storage, Azure Storage, log, and development.

Example configuration files are in the following repositories:

  • The trains repository - This trains.conf example does not contain an agent section, because it is for Trains which can run without Trains Agent.
  • The trains-agent repository - This trains.conf example does contain an agent section, because it is for Trains Agent.

Why is the same configuration file used for Trains and Trains Agent?

Trains and Trains Agent both use the api and sdk sections, as well as options for Trains credentials. A single configuration file avoids duplication and makes it easier for you to find the options you require.

Editing the configuration file

To add, change, or delete options, edit the configuration file.

To edit the Trains configuration file

  1. Open your configuration file for editing, depending upon your operating system:
    • Linux - ~/trains.conf
    • Mac - $HOME/trains.conf
    • Windows - \Users\<username>\trains.conf
  2. In the relevant section (the sections are described on this page), add, modify, or remove options.
  3. Save the configuration file.

agent section

The agent section contains options to configure Trains Agent for Git credentials, package managers, cache management, workers, and Docker for workers.

Option Description
agent
dict
Dictionary of top-level Trains Agent options.
agent.cuda_version
float
The CUDA version to use.
  • If specified, this is the CUDA version used.
  • If not specified, the CUDA version is automatically detected.
Alternatively, override this option with the environment variable CUDA_VERSION.
agent.cudnn_version
float
The cuDNN version to use.
  • If specified, this is the cuDNN version used.
  • If not specified, the cuDNN version is automatically detected.
Alternatively, override this option with the environment variable CUDNN_VERSION.
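The resolution order described for agent.cuda_version and agent.cudnn_version (environment variable first, then the configured value, then automatic detection) can be sketched in Python as follows. The function resolve_version and the detect callback are hypothetical stand-ins; Trains Agent's own detection logic is not reproduced here.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_version(
    env_var: str,
    configured: Optional[float],
    detect: Callable[[], Optional[float]],
    environ: Mapping[str, str] = os.environ,
) -> Optional[float]:
    """Hypothetical resolution of agent.cuda_version or agent.cudnn_version."""
    # 1. The environment variable (CUDA_VERSION or CUDNN_VERSION) overrides the file.
    value = environ.get(env_var)
    if value:
        return float(value)
    # 2. A version specified in trains.conf is used as-is.
    if configured is not None:
        return configured
    # 3. Otherwise the version is detected automatically.
    return detect()
```

For example, resolve_version("CUDA_VERSION", 10.1, lambda: 11.0, environ={}) returns the configured 10.1, while setting CUDA_VERSION in the environment takes precedence over it.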
agent.docker_apt_cache
string
The apt (Linux package tool) cache folder for mapping Ubuntu package caching into Docker.
agent.docker_pip_cache
string
The pip (Python package tool) cache folder for mapping Python package caching into Docker.
agent.git_pass
string
Git repository password.
  • If using Git SSH credentials, do not specify this option.
  • If not using Git SSH credentials, use this option to specify a Git password for cloning your repositories.
agent.git_user
string
Git repository username.
  • If using Git SSH credentials, do not specify this option.
  • If not using Git SSH credentials, use this option to specify a Git username for cloning your repositories.
agent.reload_config
bool
Indicates whether to reload the configuration each time the worker daemon is executed. The values are:
  • true
  • false
agent.translate_ssh
bool
agent.venvs_dir
string
The target folder for the virtual environment builds created when executing experiments.
agent.worker_id
string
The name assigned to the worker when it is created.
  • If specified, a unique name for the worker. For example, trains-agent-machine1:gpu0.
  • If not specified, the name <hostname>:<process_id> is used. For example, MyHost:12345.
Alternatively, specify the environment variable TRAINS_WORKER_ID to override this worker name.
agent.worker_name
string
Replaces the hostname in the worker name if agent.worker_id is not specified. For example, if worker_name is MyMachine and the process_id is 12345, then the worker is named MyMachine:12345. Alternatively, specify the environment variable TRAINS_WORKER_NAME to override this worker name.
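A minimal sketch of how a worker name could be derived from these options, assuming the <hostname>:<process_id> pattern described for agent.worker_id and that each environment variable takes precedence over its configuration option. The function default_worker_id is hypothetical.

```python
import os
import socket
from typing import Mapping, Optional


def default_worker_id(
    worker_id: Optional[str] = None,
    worker_name: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    pid: Optional[int] = None,
) -> str:
    """Derive a worker name from agent.worker_id / agent.worker_name (hypothetical)."""
    # TRAINS_WORKER_ID overrides agent.worker_id; an explicit ID is used unchanged.
    explicit_id = environ.get("TRAINS_WORKER_ID") or worker_id
    if explicit_id:
        return explicit_id
    # TRAINS_WORKER_NAME overrides agent.worker_name, which replaces the hostname.
    name = environ.get("TRAINS_WORKER_NAME") or worker_name or socket.gethostname()
    return "{}:{}".format(name, os.getpid() if pid is None else pid)
```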
agent.default_docker
dict
Dictionary containing the default options for workers in Docker mode.
agent.default_docker.arguments
string
If running a worker in Docker mode, this option specifies the options to pass to the Docker container.
agent.default_docker.image
string
If running a worker in Docker mode, this option specifies the default Docker image to use.
agent.package_manager
dict
Dictionary containing the options for the Python package manager. The currently supported package managers are pip, conda, and poetry (used if the repository contains a poetry.lock file).
agent.package_manager.conda_channels
list of string
If conda is used, the list of conda channels to use when installing Python packages.
agent.package_manager.extra_index_url
list of string
A list of URLs for additional artifact repositories when installing Python packages.
agent.package_manager.force_upgrade
bool
Indicates whether to force an upgrade of Python packages. The values are:
  • true - Force
  • false - Do not force
agent.package_manager.system_site_packages
bool
Indicates whether Python packages for virtual environments are inherited from the system when building a virtual environment for an experiment. The values are:
  • true - Inherit
  • false - Do not inherit (install the Python packages in the virtual environment)
agent.package_manager.type
string
Indicates the type of Python package manager to use. The values are:
  • pip - use pip as the package manager or, if a poetry.lock file exists in the repository, use poetry as the package manager
  • conda - use conda as the package manager
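The selection rule above can be sketched as follows. The helper select_package_manager is hypothetical; it only mirrors the documented behavior of choosing poetry when pip is configured and the repository contains a poetry.lock file, assuming that file is looked up in the repository root.

```python
from pathlib import Path


def select_package_manager(manager_type: str, repo_root: Path) -> str:
    """Pick the package manager per agent.package_manager.type (hypothetical helper).

    Assumption: poetry.lock is looked up in the repository root only.
    """
    if manager_type == "conda":
        return "conda"
    if manager_type == "pip":
        # With pip configured, a poetry.lock file switches the build to poetry.
        return "poetry" if (repo_root / "poetry.lock").is_file() else "pip"
    raise ValueError("unsupported agent.package_manager.type: {!r}".format(manager_type))
```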
agent.pip_download_cache
dict
Dictionary containing pip download cache options.
agent.pip_download_cache.enabled
bool
Indicates whether to use a specific cache folder for Python package downloads. The values are:
  • true - Use a specific folder which is specified in the option agent.pip_download_cache.path
  • false - Do not use a specific folder.
agent.pip_download_cache.path
string
If agent.pip_download_cache.enabled is true, then this specifies the cache folder.
agent.vcs_cache
dict
Dictionary of version control system clone cache options.
agent.vcs_cache.enabled
bool
Indicates whether the version control system cache is used. The values are:
  • true - Use cache
  • false - Do not use cache
agent.vcs_cache.path
string
The folder for caching version control system clones when executing experiments.
agent.venv_update
dict
Dictionary containing virtual environment update options.
agent.venv_update.enabled
bool
Indicates whether to use accelerated Python virtual environment building (this is a beta feature). The values are:
  • true - Accelerate
  • false - Do not accelerate (default value)

api section

The api section contains configuration options for the Trains Server API, web, and file servers and credentials.

Option Description
api.api_server
string
The URL of your Trains API server. For example, https://api.MyDomain.com.
api.web_server
string
The URL of your Trains web server. For example, https://app.MyDomain.com.
api.files_server
string
The URL of your Trains file server. For example, https://files.MyDomain.com.

You must use a secure protocol.

For api.api_server, api.web_server, and api.files_server, you must use a secure protocol, "https". Do not use "http".
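To catch this before starting Trains, a hypothetical pre-flight check such as the following Python sketch could flag any server URL in the api section that does not use https. It is not a Trains feature; the dictionary passed in simply mirrors the api section keys.

```python
from typing import List, Mapping
from urllib.parse import urlparse


def insecure_servers(api_section: Mapping[str, str]) -> List[str]:
    """Return the api.* server options whose URL does not use https (hypothetical check)."""
    insecure = []
    for key in ("api_server", "web_server", "files_server"):
        url = api_section.get(key)
        # Options that are not set are skipped; set options must use https.
        if url is not None and urlparse(url).scheme != "https":
            insecure.append("api." + key)
    return insecure


if __name__ == "__main__":
    print(insecure_servers({
        "api_server": "https://api.MyDomain.com",
        "web_server": "http://app.MyDomain.com",
        "files_server": "https://files.MyDomain.com",
    }))
```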

api.credentials
dict
Dictionary of API credentials.
api.credentials.access_key
string
Your Trains access key.
api.credentials.secret_key
string
Your Trains secret key.

sdk section

The sdk section contains configuration options for the Trains Python Client Package and related options, including storage, metrics, network, AWS S3 buckets and credentials, Google Cloud Storage, Azure Storage, log, and development.

Option Description
sdk.aws.boto3
dict
Dictionary of AWS Storage, Boto3 options.
sdk.aws.boto3.pool_connections
integer
For AWS Boto3, the maximum number of Boto3 pool connections.
sdk.aws.boto3.max_multipart_concurrency
integer
For AWS Boto3, the maximum number of threads making requests for a transfer.
sdk.aws.s3
dict
Dictionary of AWS Storage, S3 options.
sdk.aws.s3.key
string
For AWS S3, the default access key for any bucket that is not specified in the sdk.aws.s3.credentials section.
sdk.aws.s3.region
string
For AWS S3, the default region name for any bucket that is not specified in the sdk.aws.s3.credentials section.
sdk.aws.s3.secret
string
For AWS S3, the default secret access key for any bucket that is not specified in the sdk.aws.s3.credentials section.
sdk.aws.s3.credentials
list of dict
For AWS S3, a list of dictionaries, each containing the credentials for an individual S3 bucket or for all buckets on a host.
sdk.aws.s3.credentials.bucket
string
For AWS S3, if specifying credentials for individual buckets, then this is the bucket name for an individual bucket.
sdk.aws.s3.credentials.host
string
For AWS S3, if specifying credentials for individual buckets by host, then this option is the host URL and optionally the port number.
sdk.aws.s3.credentials.key
string
For AWS S3:
  • If specifying credentials for an individual bucket, then this is the access key for the bucket.
  • If specifying credentials for individual buckets by host, then this is the access key for all buckets on the host.
sdk.aws.s3.credentials.multipart
bool
For AWS S3, if specifying credentials for individual buckets by host, then this indicates whether to allow multipart upload of a single object (uploading the object as a set of parts). The values are:
  • true - Enabled
  • false - Disabled
sdk.aws.s3.credentials.secret
string
For AWS S3:
  • If specifying credentials for an individual bucket, then this is the secret key for the bucket.
  • If specifying credentials for individual buckets by host, then this is the secret key for all buckets on the host.
sdk.aws.s3.credentials.secure
bool
For AWS S3, if specifying credentials for individual buckets by host, then this indicates whether the host is secure. The values are:
  • true - Secure
  • false - Not secure
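The fallback between per-bucket or per-host credentials and the default sdk.aws.s3 key and secret can be sketched as below. This is a hypothetical illustration that assumes entries are checked in list order and the first match wins; the bucket names, host, and key values are placeholders.

```python
from typing import Any, Dict, Optional

# Placeholder values that mirror the sdk.aws.s3 options on this page.
EXAMPLE_S3_CONFIG: Dict[str, Any] = {
    "key": "DEFAULT_KEY",
    "secret": "DEFAULT_SECRET",
    "region": "us-east-1",
    "credentials": [
        {"bucket": "my-bucket", "key": "BUCKET_KEY", "secret": "BUCKET_SECRET"},
        {"host": "my-minio-host:9000", "key": "HOST_KEY", "secret": "HOST_SECRET"},
    ],
}


def resolve_s3_credentials(
    s3_config: Dict[str, Any], bucket: str, host: Optional[str] = None
) -> Dict[str, str]:
    """Return the key/secret pair to use for a bucket (hypothetical lookup)."""
    for entry in s3_config.get("credentials", []):
        # An entry matches either by bucket name or by host (first match wins).
        by_bucket = entry.get("bucket") == bucket
        by_host = host is not None and entry.get("host") == host
        if by_bucket or by_host:
            return {"key": entry["key"], "secret": entry["secret"]}
    # Buckets not listed in credentials fall back to the defaults.
    return {"key": s3_config.get("key", ""), "secret": s3_config.get("secret", "")}
```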
sdk.azure.storage.containers
list of dict
List of dictionaries, each dictionary contains credentials for an Azure Storage container.
sdk.azure.storage.containers.account_key
string
For Azure Storage, the account key (credentials).
sdk.azure.storage.containers.account_name
string
For Azure Storage, the account name.
sdk.azure.storage.containers.container_name
string
For Azure Storage, the container name.
sdk.development
dict
Dictionary of development mode options.
sdk.development.store_uncommitted_code_diff_on_train
bool
For development mode, indicates whether to store the uncommitted git diff or hg diff in the experiment manifest. The values are:
  • true - Store the diff in the script.requirements.diff section
  • false - Do not store the diff.
sdk.development.support_stopping
bool
For development mode, indicates whether to allow stopping an experiment if the experiment was aborted externally, its status was changed, or it was reset. The values are:
  • true - Allow
  • false - Do not allow
sdk.development.task_reuse_time_window_in_hours
float
For development mode, the number of hours after which an experiment with the same project name and experiment name is reused. This setting allows you to control reuse of old experiments.
sdk.development.vcs_repo_detect_async
bool
For development mode, indicates whether to run version control repository detection asynchronously. The values are:
  • true - Run asynchronously
  • false - Do not run asynchronously
sdk.development.worker
dict
Dictionary of development mode options for workers.
sdk.development.worker.log_stdout
bool
For development mode workers, indicates whether all stdout and stderr messages are logged. The values are:
  • true - Log all
  • false - Do not log all
sdk.development.worker.ping_period_sec
integer
For development mode workers, the interval in seconds at which a worker pings the server to test connectivity.
sdk.development.worker.report_period_sec
integer
For development mode workers, the interval in seconds at which a development mode Trains worker reports.
sdk.google.storage
dict
Dictionary of Google Cloud Storage credentials.
sdk.google.storage.project
string
For Google Cloud Storage, the project name.
sdk.google.storage.credentials_json
string
For Google Cloud Storage, the file path for the default Google storage credentials JSON file.
sdk.google.storage.credentials.bucket
string
For Google Cloud Storage, if specifying credentials by the individual bucket, the name of the bucket.
sdk.google.storage.credentials.credentials_json
string
For Google Cloud Storage, if specifying credentials by the individual bucket, the file path for the Google storage credentials JSON file for that bucket.
sdk.google.storage.credentials.project
string
For Google Cloud Storage, if specifying credentials by the individual bucket, the name of the project.
sdk.google.storage.credentials.subdir
string
For Google Cloud Storage, if specifying credentials by the individual bucket, a subdirectory within the bucket.
sdk.log
dict
Dictionary of log options.
sdk.log.disable_urllib3_info
bool
Indicates whether to disable urllib3 info messages. The values are:
  • true - Disable
  • false - Do not disable
sdk.log.null_log_propagate
bool
As a debugging feature, indicates whether to allow null log messages to propagate to the root logger (so they appear as stdout). The values are:
  • true - Allow
  • false - Do not allow
sdk.log.task_log_buffer_capacity
integer
The maximum capacity of the log buffer.
sdk.metrics
dict
Dictionary of metrics options.
sdk.metrics.file_history_size
integer
The history size for debug files per metric / variant combination. For each metric / variant combination, file_history_size indicates the number of files stored in the upload destination. Files are recycled so that file_history_size is the maximum number of files at any time.
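To illustrate the recycling, the hypothetical naming scheme below reuses a fixed set of file_history_size slots per metric / variant combination by taking the upload counter modulo file_history_size. The actual file names Trains generates may differ.

```python
def debug_file_name(metric: str, variant: str, upload_count: int, file_history_size: int) -> str:
    """Hypothetical file name for the upload_count-th debug file of a metric / variant."""
    if file_history_size <= 0:
        raise ValueError("file_history_size must be a positive integer")
    # Names repeat every file_history_size uploads, so older files are overwritten.
    slot = upload_count % file_history_size
    return "{}_{}_{:08d}.jpg".format(metric, variant, slot)
```

With file_history_size set to 5, one hundred uploads for the same metric / variant produce only five distinct file names.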
sdk.metrics.images
dict
Dictionary of metrics images options.
sdk.metrics.images.format
string
The image file format for generated debug images (e.g., JPEG).
sdk.metrics.images.quality
integer
The image quality for generated debug images.
sdk.metrics.images.subsampling
integer
The image subsampling for generated debug images.
sdk.network.iteration
dict
Dictionary of network iteration options.
sdk.network.iteration.max_retries_on_server_error
integer
For retries when getting frames from the server, the maximum number of retries if the server returns an error (HTTP code 500).
sdk.network.iteration.retry_backoff_factor_sec
For retries when getting frames from the server, the backoff factor for consecutive retry attempts. It determines the number of seconds between retries, which is calculated as {backoff factor} * (2 ^ ({number of total retries} - 1)).
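Applying the formula literally, a backoff factor of 0.5 seconds yields delays of 0.5, 1, 2, and 4 seconds for the first four retries. The following Python sketch computes those delays; retry_delays is a hypothetical helper, not part of Trains.

```python
from typing import List


def retry_delays(backoff_factor_sec: float, max_retries: int) -> List[float]:
    """Seconds to wait before each retry: backoff_factor * 2 ** (retry_number - 1)."""
    return [backoff_factor_sec * 2 ** (n - 1) for n in range(1, max_retries + 1)]


print(retry_delays(0.5, 4))  # [0.5, 1.0, 2.0, 4.0]
```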
sdk.network.metrics
dict
Dictionary of network metrics options.
sdk.network.metrics.file_upload_starvation_warning_sec
integer
The number of seconds after which a warning is issued if file-bearing events were sent for upload but no uploads occurred.
sdk.network.metrics.file_upload_threads
integer
The number of threads allocated to uploading files when transmitting metrics for a specific iteration.
sdk.storage.cache
dict
Dictionary of storage cache options.
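sdk.storage.cache.path_substitution.local_prefix
string
The local path prefix that replaces registered_prefix in a matching path, so that the path matches the local directory structure.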
sdk.storage.cache.path_substitution.registered_prefix
string
Use to replace the prefix of a registered local path with the prefix matching the local directory structure. path_substitution is a list of dictionaries; during a lookup, the first matching rule is applied. The Windows path separator must be escaped ("\\").

The replacement is text-based: it ignores the logical parts of a path, such as directory boundaries.

For example, the rule:

{
   registered_prefix: "/opt/mydir"
   local_prefix: "/tmp/data"
}
maps the path /opt/mydirnew/hello.txt to /tmp/datanew/hello.txt, even though /opt/mydirnew is not inside /opt/mydir.
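The following Python sketch reproduces this text-based, first-match-wins behavior. substitute_path is a hypothetical helper written for illustration; it applies the rule above to the example path.

```python
from typing import Dict, List


def substitute_path(path: str, rules: List[Dict[str, str]]) -> str:
    """Apply the first path_substitution rule whose registered_prefix matches (hypothetical)."""
    for rule in rules:
        prefix = rule["registered_prefix"]
        # Plain text comparison: "/opt/mydir" also matches "/opt/mydirnew/...".
        if path.startswith(prefix):
            return rule["local_prefix"] + path[len(prefix):]
    return path


RULES = [{"registered_prefix": "/opt/mydir", "local_prefix": "/tmp/data"}]
print(substitute_path("/opt/mydirnew/hello.txt", RULES))  # /tmp/datanew/hello.txt
```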

sdk.storage.cache.path_substitution.replace_linux_sep
bool
Indicates whether to enable path separator replacement for Linux. The values are:
  • true - Enable
  • false - Disable
sdk.storage.cache.path_substitution.replace_windows_sep
bool
Indicates whether to enable path separator replacement for Windows. The values are:
  • true - Enable
  • false - Disable

You cannot set both replace_linux_sep and replace_windows_sep to true.

If both are set to true, an exception is raised.
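A hypothetical sketch of the separator options and the exception described above. It assumes that replace_windows_sep converts backslashes to forward slashes and replace_linux_sep converts forward slashes to backslashes; this page does not state the direction, so treat that part as an assumption. Only the mutual-exclusion check follows directly from the text.

```python
def replace_separators(path: str, replace_linux_sep: bool, replace_windows_sep: bool) -> str:
    """Hypothetical separator handling for a path_substitution rule."""
    # Documented behavior: enabling both options is an error.
    if replace_linux_sep and replace_windows_sep:
        raise ValueError("replace_linux_sep and replace_windows_sep cannot both be true")
    if replace_windows_sep:
        # Assumption: Windows separators are converted to "/".
        return path.replace("\\", "/")
    if replace_linux_sep:
        # Assumption: Linux separators are converted to a backslash.
        return path.replace("/", "\\")
    return path
```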

sdk.storage.cache.default_base_dir
string
The default base directory for caching. Defaults to the system temp folder.
sdk.storage.cache.size.cleanup_margin_percent
integer
The percentage of the cache to clean up during a cleanup pass. For example, if the cache size is 30 GB and cleanup_margin_percent is 10, then the cache contains at most 27 GB after the cleanup.
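The example above as a calculation: the sketch below keeps sizes in bytes and uses integer arithmetic so the result is exact. size_after_cleanup is a hypothetical helper, and 1 GB is assumed to be 1024 ** 3 bytes.

```python
GB = 1024 ** 3  # assumption: 1 GB = 1024 ** 3 bytes


def size_after_cleanup(cache_size_bytes: int, cleanup_margin_percent: int) -> int:
    """Maximum cache size left after one cleanup pass (hypothetical helper)."""
    if not 0 <= cleanup_margin_percent <= 100:
        raise ValueError("cleanup_margin_percent must be between 0 and 100")
    return cache_size_bytes * (100 - cleanup_margin_percent) // 100


print(size_after_cleanup(30 * GB, 10) // GB)  # 27
```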
sdk.storage.cache.size.min_free_bytes
integer
The minimum free space (GB) to keep on the cache drive. For no minimum, use 0 or a negative number.
sdk.storage.cache.size.max_used_bytes
integer
The maximum size (GB) of a file to cache. For no limit, use 0 or a negative number.
sdk.storage.direct_access
dict
Dictionary of storage direct access options.
sdk.storage.direct_access.url
string
A list of direct access objects, specified as glob patterns that match sets of files using wildcards. Direct access objects are not downloaded or cached.
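A minimal sketch of glob-based direct access matching using Python's fnmatch module. The pattern file://* is only an example, and is_direct_access is a hypothetical helper; Trains' own matching may differ in details such as case sensitivity.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_direct_access(url: str, patterns: Iterable[str]) -> bool:
    """True if url matches any direct_access glob; such objects are not downloaded or cached."""
    # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
    return any(fnmatchcase(url, pattern) for pattern in patterns)


PATTERNS = ["file://*"]  # example pattern only
print(is_direct_access("file:///tmp/data.csv", PATTERNS))  # True
print(is_direct_access("s3://my-bucket/data.csv", PATTERNS))  # False
```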