Installing and Configuring Trains Agent
Trains is now ClearML
This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.
Install Trains Agent and then configure it. You can skip the configuration only if you use the demo Trains Server (https://demoapp.trains.allegro.ai/dashboard) and do not want to set any of the Trains Agent options (e.g., Git, package manager, worker, or Docker settings). If you do not use the demo server, the Trains Agent configuration includes the Trains Server web, API, and file store host URLs and your Trains credentials, as well as Git credentials.
We provide a Trains Agent configuration wizard. If you previously configured a self-hosted Trains Server, then a configuration file already exists; add Trains Agent settings to it.
Once installed, Trains Agent uses a cache folder to cache pip packages, apt packages, and cloned repositories. The default cache folder in Linux is ~/.trains.
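To see how much disk space the cache folder is using, a short script along the following lines may help. This is a minimal sketch and not part of Trains Agent: the helper name folder_size_bytes is hypothetical, and the script assumes the default Linux cache folder ~/.trains mentioned above.

```python
import os
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Return the total size in bytes of all regular files under `folder`.

    Returns 0 if the folder does not exist. Symbolic links are skipped.
    """
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = Path(root) / name
            if path.is_symlink():
                continue
            try:
                total += path.stat().st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


if __name__ == "__main__":
    # Assumption: the default cache folder on Linux.
    cache = Path.home() / ".trains"
    print(f"{cache}: {folder_size_bytes(cache) / 1024 ** 2:.1f} MiB")
```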
Installing Trains Agent
To install Trains Agent:
Execute the following command:
pip install trains-agent
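To confirm that the package landed in the Python environment you expect, you can query the installed distribution metadata. The following is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("trains-agent")
    if version is None:
        print("trains-agent is not installed in this environment")
    else:
        print(f"trains-agent {version} is installed")
```

Run the script with the same interpreter you used for pip, since each Python environment has its own set of installed packages.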
Configuring Trains Agent
Initializing a new Trains configuration file
To initialize a new Trains configuration file:
- Open a terminal session in Linux or a command prompt session in Microsoft Windows.
- In your terminal session, run the setup wizard.
    trains-agent init
  If the setup wizard's response indicates you already have a configuration file, you must add Trains Agent settings to it. The wizard does not edit or overwrite existing configuration files.
  The setup wizard prompts for your Trains credentials.
    TRAINS-AGENT setup process
    Please create new trains credentials through the profile page in your trains web app (e.g. https://demoapp.trains.allegro.ai/profile)
    In the profile page, press "Create new credentials", then press "Copy to clipboard".
    Paste copied configuration here:
- At the "Paste copied configuration here:" prompt, paste the Trains credentials, which you create as follows:
  - Open the Trains Web-App (UI) in your browser.
  - Click the PROFILE page.
  - Click Create new credentials.
  - Click Copy to clipboard.
  - In your terminal session, paste your credentials and press Enter. The setup wizard confirms the credentials.
      Detected credentials key="********************" secret="*******"
- Enter the Trains Server web server URL, or press Enter to accept the default, which is detected from your credentials.
  You must use a secure protocol, https. Do not use http.
    WEB Host configured to: [https://demoapp.trains.allegro.ai]
- Enter the Trains Server API server URL, or press Enter to accept the default value, which is based on your previous response:
    API Host configured to: [https://demoapi.trains.allegro.ai]
- Enter the Trains Server file server URL, or press Enter to accept the default value, which is based on your previous response:
    File Store Host configured to: [https://demofiles.trains.allegro.ai]
  The wizard responds with your configuration and verifies your credentials with your Trains Server.
    TRAINS Hosts configuration:
    Web App: https://demoapp.trains.allegro.ai
    API: https://demoapi.trains.allegro.ai
    File Store: https://demofiles.trains.allegro.ai
    Verifying credentials ...
    Credentials verified!
- Enter your Git user name. Leave blank for SSH key authentication.
    Enter git username for repository cloning (leave blank for SSH key authentication): []
- Enter your Git password.
    Enter password for user '<username>':
  The setup wizard confirms your Git credentials.
    Git repository cloning will be using user=<username> password=<password>
- Enter an additional artifact repository, or press Enter if not required.
    Enter additional artifact repository (extra-index-url) to use when installing python packages (leave blank if not required):
  The setup wizard completes.
    New configuration stored in /home/<username>/trains.conf
    TRAINS-AGENT setup completed successfully.
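The wizard's default API and file store URLs in the steps above follow the demo server's naming pattern (demoapp, demoapi, demofiles). The following sketch shows one way to reproduce that pattern and to reject non-https URLs before you answer the prompts. It is an illustration only: the helper name derive_hosts is hypothetical, the substitution rule is an assumption based on the demo URLs, and the wizard's own detection logic may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_hosts(web_host: str) -> dict:
    """Guess API and file store URLs from a web server URL.

    Assumption (illustrative only): the first host label contains "app",
    and the API and file servers use "api" and "files" in its place, as in
    the demo server URLs. Verify the result against your own deployment.
    """
    parts = urlsplit(web_host)
    if parts.scheme != "https":
        raise ValueError("use a secure protocol (https), not " + repr(parts.scheme))

    # Split "demoapp.trains.allegro.ai" into "demoapp", ".", "trains.allegro.ai".
    label, sep, rest = parts.netloc.partition(".")
    if "app" not in label:
        raise ValueError("cannot guess API and file hosts from " + repr(parts.netloc))

    def swap(replacement: str) -> str:
        # Replace the first "app" in the leading label; drop any path or query.
        netloc = label.replace("app", replacement, 1) + sep + rest
        return urlunsplit((parts.scheme, netloc, "", "", ""))

    return {"web": swap("app"), "api": swap("api"), "files": swap("files")}


if __name__ == "__main__":
    for role, url in derive_hosts("https://demoapp.trains.allegro.ai").items():
        print(f"{role}: {url}")
```

If your servers do not follow this naming pattern, enter the API and file server URLs explicitly at the prompts.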
Your configuration file is saved. Its location depends upon your operating system:
- Linux - ~/trains.conf
- Mac - $HOME/trains.conf
- Windows - \User\<username>\trains.conf
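All three locations listed above are the user's home directory, so a script can locate the file without branching on the operating system. The following is a minimal sketch; the helper name default_config_path is hypothetical, and it assumes that Path.home() resolves to the same folder the wizard writes to.

```python
from pathlib import Path
from typing import Optional


def default_config_path(home: Optional[Path] = None) -> Path:
    """Return the default location of trains.conf: the user's home directory."""
    return (home or Path.home()) / "trains.conf"


if __name__ == "__main__":
    path = default_config_path()
    state = "exists" if path.is_file() else "does not exist yet"
    print(f"{path} {state}")
```

Checking for the file before you run the wizard tells you in advance whether you need to add the Trains Agent settings by hand, because the wizard does not edit existing configuration files.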
Adding Trains Agent settings to a Trains configuration file
To add Trains Agent settings to a Trains configuration file:
If a Trains configuration file already exists, edit it and add the Trains Agent settings, including Git credentials and the agent section.
An example configuration file is in the trains repository.
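Before editing, it can be useful to check whether an existing file already contains any agent settings. The following is a rough sketch, not a configuration parser: the helper name has_agent_settings is hypothetical, and it only scans the text for a line that starts with an agent block or a dotted agent.<key> assignment, so treat its answer as a hint.

```python
import re

# Matches a line that starts with "agent {", "agent: {", "agent = {", or a
# dotted "agent.<key>" assignment. Lines starting with "#" do not match.
_AGENT_PATTERN = re.compile(r"^\s*agent\s*(\.|[:=]?\s*\{)", re.MULTILINE)


def has_agent_settings(config_text: str) -> bool:
    """Return True if the configuration text appears to define agent settings."""
    return _AGENT_PATTERN.search(config_text) is not None


if __name__ == "__main__":
    sample = 'api {\n}\nagent.git_user=""\n'
    print(has_agent_settings(sample))
```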
To edit a Trains configuration file:
- Open your Trains configuration file for editing. Depending upon your operating system, it is:
  - Linux - ~/trains.conf
  - Mac - $HOME/trains.conf
  - Windows - \User\<username>\trains.conf
- After the api section, add the following for your Git credentials and an additional artifact repository.
    # Set GIT user/pass credentials
    # leave blank for GIT SSH credentials
    agent.git_user="<git_username>"
    agent.git_pass="<git_password>"

    # extra_index_url: ["https://allegroai.jfrog.io/trainsai/api/pypi/public/simple"]
    agent.package_manager.extra_index_url= [ ]
- After the Git credentials (see the previous step), add the following agent section:
    agent {
        # unique name of this worker, if None, created based on hostname:process_id
        # Override with os environment: TRAINS_WORKER_ID
        # worker_id: "trains-agent-machine1:gpu0"
        worker_id: ""

        # worker name, replaces the hostname when creating a unique name for this worker
        # Override with os environment: TRAINS_WORKER_NAME
        # worker_name: "trains-agent-machine1"
        worker_name: ""

        # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
        # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
        # git_user: ""
        # git_pass: ""

        # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
        force_git_ssh_protocol: false

        # Set the python version to use when creating the virtual environment and launching the experiment
        # Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
        # The default is the python executing the trains_agent
        python_binary: ""

        # select python package manager:
        # currently supported pip and conda
        # poetry is used if pip selected and repository contains poetry.lock file
        package_manager: {
            # supported options: pip, conda, poetry
            type: pip,

            # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
            pip_version: "<20.2",

            # virtual environment inherits packages from system
            system_site_packages: false,
            # install with --upgrade
            force_upgrade: false,

            # additional artifact repositories to use when installing python packages
            # extra_index_url: ["https://allegroai.jfrog.io/trainsai/api/pypi/public/simple"]

            # additional conda channels to use when installing with conda package manager
            conda_channels: ["defaults", "conda-forge", "pytorch", ]

            # set to True to support torch nightly build installation,
            # notice: torch nightly builds are ephemeral and are deleted from time to time
            torch_nightly: false,
        },

        # target folder for virtual environments builds, created when executing experiment
        venvs_dir = ~/.trains/venvs-builds

        # cached git clone folder
        vcs_cache: {
            enabled: true,
            path: ~/.trains/vcs-cache
        },

        # use venv-update in order to accelerate python virtual environment building
        # Still in beta, turned off by default
        venv_update: {
            enabled: false,
        },

        # cached folder for specific python package download (used for pytorch package caching)
        pip_download_cache {
            enabled: true,
            path: ~/.trains/pip-download-cache
        },

        translate_ssh: true,
        # reload configuration file every daemon execution
        reload_config: false,

        # pip cache folder mapped into docker, used for python package caching
        docker_pip_cache = ~/.trains/pip-cache
        # apt cache folder mapped into docker, used for ubuntu package caching
        docker_apt_cache = ~/.trains/apt-cache

        # optional arguments to pass to docker image
        # these are local for this agent and will not be updated in the experiment's docker_cmd section
        # extra_docker_arguments: ["--ipc=host", ]

        # optional shell script to run in docker when started before the experiment is started
        # extra_docker_shell_script: ["apt-get install -y bindfs", ]

        # set to true in order to force "docker pull" before running an experiment using a docker image.
        # This makes sure the docker image is updated.
        docker_force_pull: false

        default_docker: {
            # default docker image to use when running in docker mode
            image: "nvidia/cuda:10.1-runtime-ubuntu18.04"

            # optional arguments to pass to docker image
            # arguments: ["--ipc=host", ]
        }

        # set the initial bash script to execute at the startup of any docker.
        # all lines will be executed regardless of their exit code.
        # {python_single_digit} is translated to 'python3' or 'python2' according to requested python version
        # docker_init_bash_script = [
        #     "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
        #     "chown -R root /root/.cache/pip",
        #     "apt-get update",
        #     "apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0",
        #     "(which {python_single_digit} && {python_single_digit} -m pip --version) || apt-get install -y {python_single_digit}-pip",
        # ]

        # cuda versions used for solving pytorch wheel packages
        # should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
        # cuda_version: 10.1
        # cudnn_version: 7.6
    }
- In the sdk.storage.cache section, add the size section.
    size {
        # max_used_bytes = -1
        min_free_bytes = 10GB
        # cleanup_margin_percent = 5%
    }
  For example:
    sdk {
        # TRAINS - default SDK configuration
        storage {
            cache {
                # Defaults to system temp folder / cache
                default_base_dir: "~/.trains/cache"
                size {
                    # max_used_bytes = -1
                    min_free_bytes = 10GB
                    # cleanup_margin_percent = 5%
                }
            }
- Save your configuration.
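The comments on worker_id and worker_name in the agent section describe how a worker's unique name is formed. The following sketch restates that description in code. It is an interpretation of those comments, not Trains Agent's implementation: the function name resolve_worker_id is hypothetical, and the precedence between the environment variables and the configuration file (environment first) is an assumption drawn from the word "Override".

```python
import os
import socket
from typing import Mapping, Optional


def resolve_worker_id(
    configured_id: str = "",
    configured_name: str = "",
    env: Optional[Mapping[str, str]] = None,
    hostname: Optional[str] = None,
    pid: Optional[int] = None,
) -> str:
    """Return a worker ID following the rules in the agent section comments.

    Assumed precedence: TRAINS_WORKER_ID, then worker_id from the file,
    then "<name>:<process_id>", where <name> is TRAINS_WORKER_NAME,
    worker_name from the file, or the machine hostname, in that order.
    """
    env = os.environ if env is None else env

    worker_id = env.get("TRAINS_WORKER_ID") or configured_id
    if worker_id:
        return worker_id

    name = (
        env.get("TRAINS_WORKER_NAME")
        or configured_name
        or hostname
        or socket.gethostname()
    )
    return f"{name}:{os.getpid() if pid is None else pid}"


if __name__ == "__main__":
    # With both settings left empty, the ID falls back to hostname:process_id.
    print(resolve_worker_id())
```

Setting worker_name alone is usually enough when you only want a readable name in place of the hostname; set worker_id when you need full control over the ID.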
Next Steps
- See the Tuning Experiments tutorial to learn how to use Trains Agent and Task remote execution to manage your experimentation.
- See the Trains Agent Use Case Examples and Trains Agent Reference pages.