Configuring Trains Server

Trains is now ClearML

This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.

We recommend using the latest version of Trains Server.

Trains Server Configurations

Trains Server supports two configurations: single IP (domain) and sub-domains. To configure Trains Server for sub-domains, see Sub-domains and load balancers.

Single IP (domain) configuration

Single IP (domain) with the following open ports:

  • Web application on port 8080
  • API service on port 8008
  • File storage service on port 8081
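As a quick sanity check of the single IP layout, the sketch below tries a TCP connection to each of the three ports listed above. It is only an illustration: the helper names `port_is_open` and `check_trains_ports` are hypothetical and not part of Trains, and an open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Default single IP (domain) ports, as listed above.
TRAINS_PORTS = {"web": 8080, "api": 8008, "files": 8081}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


def check_trains_ports(host: str) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in TRAINS_PORTS.items()}
```

Calling `check_trains_ports("localhost")` on the machine running Trains Server returns a dictionary with one boolean per service.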

Sub-domain configuration

Sub-domain configuration with the default HTTP/S ports (80 or 443):

  • Web application on sub-domain: app.*.*
  • API service on sub-domain: api.*.*
  • File storage service on sub-domain: files.*.*

When you configure sub-domains for Trains Server, they map to the ports configured internally for the Trains Server Docker containers. As a result, the containers remain accessible if, for example, you implement some type of port forwarding.

You must use app, api, and files as the sub-domain labels.

For example, if your domain is, and you create a sub-domain named, use the following:

  • (web server)
  • (API server)
  • (file server)

Accessing the Trains Web (UI) with will automatically send API requests to
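The relationship between the three sub-domain labels can be pictured as a simple host rewrite. The sketch below is an assumption-laden illustration, not the web application's actual code: the function name `sibling_url` is hypothetical, and `trains.example.com` is a placeholder domain chosen for this example only.

```python
from urllib.parse import urlsplit, urlunsplit

# The three required sub-domain labels.
SERVICE_LABELS = ("app", "api", "files")


def sibling_url(web_url: str, service: str) -> str:
    """Swap the leading 'app' label of the web URL's host for another label."""
    if service not in SERVICE_LABELS:
        raise ValueError(f"unknown service label: {service!r}")
    parts = urlsplit(web_url)
    host = parts.hostname or ""
    first, _, rest = host.partition(".")
    if first != "app" or not rest:
        raise ValueError(f"expected a host starting with 'app.': {host!r}")
    netloc = f"{service}.{rest}"
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Keep only scheme and host; path, query, and fragment are dropped.
    return urlunsplit((parts.scheme, netloc, "", "", ""))
```

With the placeholder domain, `sibling_url("https://app.trains.example.com", "api")` yields the matching `api.` address, which is why the labels must be exactly app, api, and files.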

Configuration files

The Trains Server configuration uses the following configuration files:

  • hosts.conf
  • logging.conf
  • secure.conf
  • events.conf
  • tasks.conf
  • apiserver.conf

The default configuration files are in the trains-server repository.

Configuration procedures

Sub-domains and load balancers

To illustrate this configuration, we provide the following example based on AWS load balancing:

  1. In your Trains Server /opt/trains/config/apiserver.conf file, add the following auth.cookies section:

    auth {
      cookies {
        httponly: true
        secure: true
        domain: ""
        max_age: 99999999999
      }
    }
  2. Use the following load balancer configuration:

    • Listeners:

      • Optional: HTTP listener that redirects all traffic to HTTPS.
      • HTTPS listener for app. forwarded to AppTargetGroup
      • HTTPS listener for api. forwarded to ApiTargetGroup
      • HTTPS listener for files. forwarded to FilesTargetGroup
    • Target groups:

      • AppTargetGroup: HTTP based target group, port 8080
      • ApiTargetGroup: HTTP based target group, port 8008
      • FilesTargetGroup: HTTP based target group, port 8081
    • Security and routing:

      • Load balancer: make sure the load balancers are able to receive traffic from the relevant IP addresses (Security groups and Subnets definitions).
      • Instances: make sure the load balancers are able to access the instances, using the relevant ports (Security groups definitions).
  3. Restart Trains Server.
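The listener and target group rules above amount to a lookup from the first host label to a backend port. The following sketch models that lookup in plain Python so the mapping can be checked at a glance. It does not call any AWS API; the `Target` type and `route` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Target(NamedTuple):
    group: str
    port: int


# Sub-domain label -> target group, as listed in the load balancer configuration.
ROUTES = {
    "app": Target("AppTargetGroup", 8080),
    "api": Target("ApiTargetGroup", 8008),
    "files": Target("FilesTargetGroup", 8081),
}


def route(host: str) -> Target:
    """Pick the target group for a request based on the first host label."""
    label = host.split(".", 1)[0].lower()
    try:
        return ROUTES[label]
    except KeyError:
        raise ValueError(f"no listener rule for host {host!r}") from None
```

A host whose first label is not app, api, or files has no rule and raises an error, mirroring a load balancer with no matching listener.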

Network and security

To ensure you properly secure your deployment, follow these best practices:

  • If your deployment is in an open network that allows public access, only allow access to the specific ports used by Trains Server (see Trains Server configurations). If HTTPS access is configured for your instance, allow access to port 443.
  • Configure Trains Server to use fixed user names and passwords (see Web Login Authentication).

For improved security, the ports for Trains Server Elasticsearch, MongoDB, and Redis servers are not exposed by default; they are only open internally in the docker network. If you need external access and understand the security risks, you can open these ports.

Opening the ports for Elasticsearch, MongoDB, and Redis for external access may pose a security concern and is not recommended unless you know what you're doing. Network security measures, such as firewall configuration, should be considered when opening ports for external access.

To open external access to the Elasticsearch, MongoDB, and Redis ports:

  1. Shut down Trains Server by executing the following command (which assumes the configuration file is in the environment path):

    docker-compose down
  2. Edit your docker-compose.yml file as follows:

    • In the elasticsearch section, add the two lines:

      ports:
      - "9200:9200"
    • In the mongo section, add the two lines:

      ports:
      - "27017:27017"
    • In the redis section, add the two lines:

      ports:
      - "6379:6379"
  3. Start up Trains Server:

    docker-compose -f docker-compose.yml pull
    docker-compose -f docker-compose.yml up -d

Web Login Authentication

You can configure Trains Server for web login authentication, which permits only users who have been given credentials (a user name and password) to access your Trains system.

By default, web login authentication is not configured, and Trains Server does not restrict access.

To add web login authentication to your Trains Server:

  1. In your Trains Server /opt/trains/config/apiserver.conf, add the auth.fixed_users section and specify the users.

    For example:

    auth {
        # Fixed users login credentials
        # No other user will be able to log in
        fixed_users {
            enabled: true
            users: [
                {
                    username: "jane"
                    password: "12345678"
                    name: "Jane Doe"
                },
                {
                    username: "john"
                    password: "12345678"
                    name: "John Doe"
                }
            ]
        }
    }
  2. Restart Trains Server.
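The passwords in the example above are placeholders. One way to produce stronger values is sketched below using Python's standard `secrets` module. The helper names `make_password` and `user_entry` are hypothetical, the 12-character minimum is an arbitrary choice for this sketch, and the rendered entry simply mimics the layout of the example rather than any official format.

```python
import secrets
import string

# Letters and digits only, so the value needs no escaping inside quotes.
ALPHABET = string.ascii_letters + string.digits


def make_password(length: int = 20) -> str:
    """Return a random alphanumeric password from a secure random source."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def user_entry(username: str, name: str, password: str) -> str:
    """Render one element of the users list in the layout used above."""
    return (
        "{\n"
        f'    username: "{username}"\n'
        f'    password: "{password}"\n'
        f'    name: "{name}"\n'
        "}"
    )
```

For example, `print(user_entry("jane", "Jane Doe", make_password()))` prints an entry that can be pasted into the users list.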

Non-responsive Task watchdog

The non-responsive experiment watchdog monitors experiments and marks as aborted any experiment that has not been updated within a specified time interval. The watchdog is always active.

You can modify the following settings for the watchdog:

  • The time threshold (in seconds) of experiment inactivity. The default value is 7200 seconds (2 hours).
  • The time interval (in seconds) between watchdog cycles.

To configure the non-responsive watchdog for your Trains Server:

  1. In your Trains Server /opt/trains/config/services.conf file, add or edit the tasks.non_responsive_tasks_watchdog section and specify the watchdog settings.

    For example:

    tasks {
        non_responsive_tasks_watchdog {
            # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
            threshold_sec: 7200
            # Watchdog will sleep for this number of seconds after each cycle
            watch_interval_sec: 900
        }
    }
  2. Restart Trains Server.
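To make the effect of threshold_sec concrete, the sketch below simulates a single watchdog cycle over an in-memory list of tasks. It is a simplified model under stated assumptions, not the server's implementation: the `Task` class, the status strings, and the `watchdog_cycle` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    status: str         # hypothetical values: "in_progress", "completed", "aborted"
    last_update: float  # time of the last update, in seconds


def watchdog_cycle(tasks, now: float, threshold_sec: int = 7200) -> list:
    """Abort in-progress tasks idle for at least threshold_sec; return their ids."""
    aborted = []
    for task in tasks:
        idle = now - task.last_update
        if task.status == "in_progress" and idle >= threshold_sec:
            task.status = "aborted"
            aborted.append(task.task_id)
    return aborted
```

In this model, watch_interval_sec would be the pause between successive calls to `watchdog_cycle`, so a task can stay idle for up to roughly threshold_sec plus watch_interval_sec before it is marked as aborted.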