Installing and Configuring Trains Agent

Install Trains Agent and then configure it. You can skip the configuration only if you use the demo Trains Server (https://demoapp.trains.allegro.ai/dashboard) and do not want to set any Trains Agent options (e.g., Git, package manager, worker, or Docker settings). If you do not use the demo server, the Trains Agent configuration includes the Trains Server web, API, and file store host URLs, your Trains credentials, and your Git credentials.

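We provide a Trains Agent configuration wizard. If you previously configured a self-hosted Trains Server, a configuration file already exists; in that case, add the Trains Agent settings to that file yourself, because the wizard does not edit or overwrite existing configuration files.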

Once installed, Trains Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default cache folder in Linux is ~/.trains.
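
To see how much disk space the cache uses, a short script can walk the cache folder. The following is a minimal sketch that assumes the default ~/.trains location; the helper name folder_size_bytes is illustrative and is not part of Trains Agent.

```python
import os
from pathlib import Path


def folder_size_bytes(root: Path) -> int:
    """Return the total size of all regular files under root (0 if root is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # Skip symlinks so linked files are not counted twice.
                if not path.is_symlink():
                    total += path.stat().st_size
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


if __name__ == "__main__":
    cache_root = Path.home() / ".trains"  # default cache folder on Linux
    if not cache_root.is_dir():
        print(f"{cache_root} does not exist yet")
    else:
        for entry in sorted(cache_root.iterdir()):
            if entry.is_dir():
                size_mib = folder_size_bytes(entry) / (1024 * 1024)
                print(f"{entry.name}: {size_mib:.1f} MiB")
```

Running the script prints one line per cache subfolder, which helps when deciding whether a cache is worth clearing.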

Installing Trains Agent

To install Trains Agent, run the following command:

pip install trains-agent
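
To confirm that the installation succeeded, you can query the installed package from Python. This is a small sketch using only the standard library (Python 3.8 or later); the function name installed_version is illustrative.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("trains-agent")
    print("trains-agent version:", version or "not installed")
    # After a successful install, the console script is normally on PATH.
    print("trains-agent executable:", shutil.which("trains-agent") or "not found on PATH")
```

If the version prints but the executable is not found, the directory where pip installs scripts is probably missing from PATH.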

Configuring Trains Agent

Initializing a new Trains configuration file

To initialize a new Trains configuration file:

  1. Open a terminal session in Linux or a command prompt session in Microsoft Windows.
  2. In your terminal session, run the setup wizard.

    trains-agent init
    

    If the setup wizard's response indicates you already have a configuration file, you must add Trains Agent settings to it. The wizard does not edit or overwrite existing configuration files.

    The setup wizard prompts for your Trains credentials.

    TRAINS-AGENT setup process
    
    Please create new trains credentials through the profile page in your trains web app (e.g. https://demoapp.trains.allegro.ai/profile)
    In the profile page, press "Create new credentials", then press "Copy to clipboard".
    
    Paste copied configuration here:
    
  3. At the "Paste copied configuration here:" prompt, paste the Trains credentials, which you create as follows:

    1. Open the Trains Web-App (UI) in your browser.

    2. Click the PROFILE page.

    3. Click Create new credentials.

    4. Click Copy to clipboard.

    5. In your terminal session, paste your credentials and press Enter. The setup wizard confirms the credentials.

      Detected credentials key="********************" secret="*******"
      
  4. Enter the Trains Server web server URL, or press Enter to accept the default, which is detected from your credentials.

    You must use a secure protocol, https. Do not use http.

    WEB Host configured to: [https://demoapp.trains.allegro.ai]
    
  5. Enter the Trains Server API server URL, or press Enter to accept the default value, which is based on your previous response:

    API Host configured to: [https://demoapi.trains.allegro.ai]
    
  6. Enter the Trains Server file server URL, or press Enter to accept the default value, which is based on your previous response:

    File Store Host configured to: [https://demofiles.trains.allegro.ai]
    

    The wizard responds with your configuration and directs you to your Trains Server.

    TRAINS Hosts configuration:
    Web App: https://demoapp.trains.allegro.ai
    API: https://demoapi.trains.allegro.ai
    File Store: https://demofiles.trains.allegro.ai
    
    Verifying credentials ...
    Credentials verified!
    
  7. Enter your Git user name. Leave blank for SSH key authentication.

    Enter git username for repository cloning (leave blank for SSH key authentication): []
    
  8. Enter your Git password.

    Enter password for user '<username>':
    

    The setup wizard confirms your Git credentials.

    Git repository cloning will be using user=<username> password=<password>
    
  9. Enter an additional artifact repository, or press Enter if not required.

    Enter additional artifact repository (extra-index-url) to use when installing python packages (leave blank if not required):
    

    The setup wizard completes.

    New configuration stored in /home/<username>/trains.conf
    TRAINS-AGENT setup completed successfully.
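
The default hosts shown in the steps above follow a naming pattern (demoapp, demoapi, demofiles), and the web URL must use https. The sketch below illustrates that pattern and the https check so you can sanity-check values before typing them into the wizard. It is not the wizard's own logic; require_https and derive_hosts are hypothetical helpers, and the app/api/files pattern is an assumption taken from the demo server URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def require_https(url: str) -> str:
    """Return url unchanged if it uses https, otherwise raise ValueError."""
    if urlsplit(url).scheme != "https":
        raise ValueError(f"expected an https URL, got: {url}")
    return url


def derive_hosts(web_host: str) -> dict:
    """Guess the API and file store URLs from a web URL such as https://demoapp.example.com.

    Assumption: the first host label ends in "app" and the other services
    replace that suffix with "api" and "files".
    """
    parts = urlsplit(require_https(web_host))
    label, _, rest = parts.netloc.partition(".")
    if not label.endswith("app") or not rest:
        raise ValueError(f"cannot derive defaults from host: {parts.netloc}")
    stem = label[: -len("app")]  # "demo" for "demoapp", "" for "app"
    return {
        name: urlunsplit(("https", f"{stem}{suffix}.{rest}", "", "", ""))
        for name, suffix in (("web", "app"), ("api", "api"), ("files", "files"))
    }


if __name__ == "__main__":
    for name, url in derive_hosts("https://demoapp.trains.allegro.ai").items():
        print(f"{name}: {url}")
```

For a self-hosted server with a different naming scheme, enter the three URLs explicitly rather than relying on a derived guess.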
    

Your configuration file is saved. Its location depends upon your operating system:

  • Linux - ~/trains.conf
  • Mac - $HOME/trains.conf
  • Windows - \Users\<username>\trains.conf
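
All of the locations listed above are the file trains.conf in the current user's home directory, so a script can find it the same way on every operating system. A minimal sketch follows; default_config_path is an illustrative name, not a Trains function.

```python
from pathlib import Path


def default_config_path() -> Path:
    """Return the default configuration file path: trains.conf in the user's home directory."""
    return Path.home() / "trains.conf"


if __name__ == "__main__":
    path = default_config_path()
    if path.is_file():
        print(f"{path}: found")
    else:
        print(f"{path}: not found; run `trains-agent init` to create it")
```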

Adding Trains Agent settings to a Trains configuration file

If a Trains configuration file already exists, edit it to add the Trains Agent settings: your Git credentials and the agent section.

An example configuration file is in the trains repository.

To edit a Trains configuration file:

  1. Open your Trains configuration file for editing. Depending upon your operating system, it is:

    • Linux - ~/trains.conf
    • Mac - $HOME/trains.conf
    • Windows - \Users\<username>\trains.conf
  2. After the api section, add the following settings for your Git credentials and, optionally, an additional artifact repository.

    # Set GIT user/pass credentials
    # leave blank for GIT SSH credentials
    agent.git_user="<git_username>"
    agent.git_pass="<git_password>"
    
    # extra_index_url: ["https://allegroai.jfrog.io/trainsai/api/pypi/public/simple"]
    agent.package_manager.extra_index_url= [
    ]
    
  3. After the Git credentials (see the previous step), add the following agent section:

    agent {
        # unique name of this worker, if None, created based on hostname:process_id
        # Override with os environment: TRAINS_WORKER_ID
        # worker_id: "trains-agent-machine1:gpu0"
        worker_id: ""
    
        # worker name, replaces the hostname when creating a unique name for this worker
        # Override with os environment: TRAINS_WORKER_NAME
        # worker_name: "trains-agent-machine1"
        worker_name: ""
    
        # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
        # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
        # git_user: ""
        # git_pass: ""
    
        # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
        force_git_ssh_protocol: false
    
        # Set the python version to use when creating the virtual environment and launching the experiment
        # Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
        # The default is the python executing the trains_agent
        python_binary: ""
    
        # select python package manager:
        # currently supported pip and conda
        # poetry is used if pip selected and repository contains poetry.lock file
        package_manager: {
            # supported options: pip, conda, poetry
            type: pip,
    
            # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
            pip_version: "<20.2",
    
            # virtual environment inherits packages from system
            system_site_packages: false,
    
            # install with --upgrade
            force_upgrade: false,
    
            # additional artifact repositories to use when installing python packages
            # extra_index_url: ["https://allegroai.jfrog.io/trainsai/api/pypi/public/simple"]
    
            # additional conda channels to use when installing with conda package manager
            conda_channels: ["defaults", "conda-forge", "pytorch", ]
    
            # set to True to support torch nightly build installation,
            # notice: torch nightly builds are ephemeral and are deleted from time to time
            torch_nightly: false,
        },
    
        # target folder for virtual environments builds, created when executing experiment
        venvs_dir = ~/.trains/venvs-builds
    
        # cached git clone folder
        vcs_cache: {
            enabled: true,
            path: ~/.trains/vcs-cache
        },
    
        # use venv-update in order to accelerate python virtual environment building
        # Still in beta, turned off by default
        venv_update: {
            enabled: false,
        },
    
        # cached folder for specific python package download (used for pytorch package caching)
        pip_download_cache {
            enabled: true,
            path: ~/.trains/pip-download-cache
        },
    
        translate_ssh: true,
        # reload configuration file every daemon execution
        reload_config: false,
    
        # pip cache folder mapped into docker, used for python package caching
        docker_pip_cache = ~/.trains/pip-cache
        # apt cache folder mapped into docker, used for ubuntu package caching
        docker_apt_cache = ~/.trains/apt-cache
    
        # optional arguments to pass to docker image
        # these are local for this agent and will not be updated in the experiment's docker_cmd section
        # extra_docker_arguments: ["--ipc=host", ]
    
        # optional shell script to run in docker when started before the experiment is started
        # extra_docker_shell_script: ["apt-get install -y bindfs", ]
    
        # set to true in order to force "docker pull" before running an experiment using a docker image.
        # This makes sure the docker image is updated.
        docker_force_pull: false
    
        default_docker: {
            # default docker image to use when running in docker mode
            image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
    
            # optional arguments to pass to docker image
            # arguments: ["--ipc=host", ]
        }
    
        # set the initial bash script to execute at the startup of any docker.
        # all lines will be executed regardless of their exit code.
        # {python_single_digit} is translated to 'python3' or 'python2' according to requested python version
        # docker_init_bash_script = [
        #     "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
        #     "chown -R root /root/.cache/pip",
        #     "apt-get update",
        #     "apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0",
        #     "(which {python_single_digit} && {python_single_digit} -m pip --version) || apt-get install -y {python_single_digit}-pip",
        # ]
    
        # cuda versions used for solving pytorch wheel packages
        # should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
        # cuda_version: 10.1
        # cudnn_version: 7.6
    }
    
  4. In the sdk.storage.cache section, add the size section.

    size {
        # max_used_bytes = -1
        min_free_bytes = 10GB
        # cleanup_margin_percent = 5%
    }
    

    For example:

    sdk {
        # TRAINS - default SDK configuration
    
        storage {
            cache {
                # Defaults to system temp folder / cache
                default_base_dir: "~/.trains/cache"
                size {
                    # max_used_bytes = -1
                    min_free_bytes = 10GB
                    # cleanup_margin_percent = 5%
                }
            }
        }
    }
    
  5. Save your configuration.
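
Hand-editing nested sections makes it easy to leave a brace or bracket unclosed. The sketch below is a simplified sanity check you could run after saving. It is not a full parser for the configuration format: it only ignores double-quoted strings and # or // comments, then verifies that { } and [ ] are properly nested. The function name check_braces is illustrative.

```python
import re
from pathlib import Path

# Double-quoted strings (with escapes), then '#' or '//' comments up to end of line.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\\n])*"|#[^\n]*|//[^\n]*')


def check_braces(text: str) -> list:
    """Return a list of nesting problems for {} and []; an empty list means balanced."""
    stripped = _STRING_OR_COMMENT.sub("", text)  # newlines are kept, so line numbers match
    closers = {"}": "{", "]": "["}
    stack = []      # (opening character, line number)
    problems = []
    for lineno, line in enumerate(stripped.splitlines(), start=1):
        for ch in line:
            if ch in "{[":
                stack.append((ch, lineno))
            elif ch in closers:
                if stack and stack[-1][0] == closers[ch]:
                    stack.pop()
                else:
                    problems.append(f"line {lineno}: unexpected '{ch}'")
    problems.extend(f"line {n}: '{c}' is never closed" for c, n in stack)
    return problems


if __name__ == "__main__":
    config_path = Path.home() / "trains.conf"
    if config_path.is_file():
        issues = check_braces(config_path.read_text(encoding="utf-8"))
        print("\n".join(issues) if issues else "braces and brackets are balanced")
    else:
        print(f"{config_path} not found")
```

A clean result only means the nesting is consistent; it does not validate setting names or values.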

Next Steps