Configuring Trains Server
Trains is now ClearML
This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.
We recommend using the latest version of Trains Server.
Trains Server Configurations
Trains Server supports two configurations: single IP (domain) and sub-domains. To configure Trains Server for sub-domains, see Sub-domains and load balancers.
Single IP (domain) configuration
Single IP (domain) with the following open ports:
- Web application on port 8080
- API service on port 8008
- File storage service on port 8081
Sub-domain configuration
Sub-domain configuration with default http/s ports (80 or 443):
- Web application on sub-domain: app.*.*
- API service on sub-domain: api.*.*
- File storage service on sub-domain: files.*.*
When you configure sub-domains for Trains Server, they map to the ports that Trains Server configures internally for its Docker containers. As a result, the Trains Server containers remain accessible if, for example, you implement some type of port forwarding.
You must use app, api, and files as the sub-domain labels.
For example, if your domain is mydomain.com, and you create a sub-domain named trains.mydomain.com, use the following:
- app.trains.mydomain.com (web server)
- api.trains.mydomain.com (API server)
- files.trains.mydomain.com (file server)
Accessing the Trains Web (UI) with app.trains.mydomain.com will automatically send API requests to api.trains.mydomain.com.
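The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name service_hosts and the SERVICE_LABELS table are hypothetical helpers, not part of Trains Server. Only the three labels and the example domain come from this page.

```python
# The three sub-domain labels this page says are mandatory, with the role of each.
SERVICE_LABELS = {
    "app": "web server",
    "api": "API server",
    "files": "file server",
}


def service_hosts(base_domain: str) -> dict[str, str]:
    """Return the host name of each Trains Server service under base_domain.

    Hypothetical helper for illustration; Trains Server does not ship it.
    """
    base = base_domain.strip().strip(".").lower()
    if not base:
        raise ValueError("base_domain must not be empty")
    return {label: f"{label}.{base}" for label in SERVICE_LABELS}


if __name__ == "__main__":
    for label, host in service_hosts("trains.mydomain.com").items():
        print(f"{host} ({SERVICE_LABELS[label]})")
```

Running the script lists the three host names you would need to create DNS records for, assuming trains.mydomain.com is your sub-domain.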
Configuration files
The Trains Server configuration uses the following configuration files:
- hosts.conf
- logging.conf
- secure.conf
- events.conf
- tasks.conf
- apiserver.conf
The default configuration files are in the trains-server repository.
Configuration procedures
Sub-domains and load balancers
To illustrate this configuration, we provide the following example based on AWS load balancing:
- In your Trains Server /opt/trains/config/apiserver.conf file, add the following auth.cookies section:

      auth {
        cookies {
          httponly: true
          secure: true
          domain: ".trains.mydomain.com"
          max_age: 99999999999
        }
      }

- Use the following load balancer configuration:
  - Listeners:
    - Optional: HTTP listener that redirects all traffic to HTTPS.
    - HTTPS listener for app. forwarded to AppTargetGroup
    - HTTPS listener for api. forwarded to ApiTargetGroup
    - HTTPS listener for files. forwarded to FilesTargetGroup
  - Target groups:
    - AppTargetGroup: HTTP based target group, port 8080
    - ApiTargetGroup: HTTP based target group, port 8008
    - FilesTargetGroup: HTTP based target group, port 8081
  - Security and routing:
    - Load balancer: make sure the load balancers are able to receive traffic from the relevant IP addresses (Security groups and Subnets definitions).
    - Instances: make sure the load balancers are able to access the instances, using the relevant ports (Security groups definitions).
- Restart Trains Server.
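To make the listener and target group mapping concrete, here is a minimal Python sketch of the routing decision the load balancer makes. It is not AWS or Trains Server code. The names TARGET_PORTS and target_port are hypothetical; only the labels and ports come from the configuration above.

```python
# Sub-domain label -> port of the HTTP target group behind it (values from this page).
TARGET_PORTS = {
    "app": 8080,    # AppTargetGroup
    "api": 8008,    # ApiTargetGroup
    "files": 8081,  # FilesTargetGroup
}


def target_port(host: str) -> int:
    """Return the instance port that a request for `host` is forwarded to.

    Hypothetical helper: it only models the host-based rule, taking the
    first DNS label of the requested host name.
    """
    label = host.split(".", 1)[0].lower()
    try:
        return TARGET_PORTS[label]
    except KeyError:
        raise ValueError(f"no listener configured for host {host!r}") from None


if __name__ == "__main__":
    for host in ("app.trains.mydomain.com", "api.trains.mydomain.com", "files.trains.mydomain.com"):
        print(f"{host} -> port {target_port(host)}")
```

A host name with any other first label raises ValueError, which mirrors a request that matches no listener rule.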
Network and security
To ensure that you properly secure your deployment, follow these best practices:
- If your deployment is in an open network that allows public access, only allow access to the specific ports used by Trains Server (see Trains Server configurations). If HTTPS access is configured for your instance, allow access to port 443.
- Configure Trains Server to use fixed user names and passwords (see Web Login Authentication).
For improved security, the ports for Trains Server Elasticsearch, MongoDB, and Redis servers are not exposed by default; they are only open internally in the docker network. If you need external access and understand the security risks, you can open these ports.
Opening the ports for Elasticsearch, MongoDB, and Redis for external access may pose a security concern and is not recommended unless you know what you're doing. Network security measures, such as firewall configuration, should be considered when opening ports for external access.
To open external access to the Elasticsearch, MongoDB, and Redis ports:
- Shut down Trains Server by executing the following command (which assumes the configuration file is in the environment path):

      docker-compose down

- Edit your docker-compose.yml file as follows:
  - In the elasticsearch section, add the two lines:

        ports:
          - "9200:9200"

  - In the mongo section, add the two lines:

        ports:
          - "27017:27017"

  - In the redis section, add the two lines:

        ports:
          - "6379:6379"

- Start up Trains Server:

      docker-compose -f docker-compose.yml pull
      docker-compose -f docker-compose.yml up -d
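After restarting, you may want to confirm which of these ports actually accept connections. The following Python sketch uses only the standard library. The helper name port_is_open and the host localhost are assumptions; substitute the address of your own Trains Server. The port numbers are the ones listed in the procedure above.

```python
import socket

# Ports from the procedure above, keyed by docker-compose service name.
EXTERNAL_PORTS = {"elasticsearch": 9200, "mongo": 27017, "redis": 6379}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is an assumption; use your server's address instead.
    for service, port in EXTERNAL_PORTS.items():
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{service}: port {port} is {state}")
```

The same check, run from a machine outside your network, is a quick way to verify that a firewall still blocks these ports when you did not intend to expose them.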
Web Login Authentication
You can configure Trains Server for web login authentication, which permits access to your Trains system only to users who have been given credentials (a user name and password).
By default, without web login authentication, Trains Server does not restrict access.
To add web login authentication to your Trains Server:
- In your Trains Server /opt/trains/config/apiserver.conf file, add the auth.fixed_users section and specify the users. For example:

      auth {
          # Fixed users login credentials
          # No other user will be able to login
          fixed_users {
              enabled: true
              users: [
                  {
                      username: "jane"
                      password: "12345678"
                      name: "Jane Doe"
                  },
                  {
                      username: "john"
                      password: "12345678"
                      name: "John Doe"
                  },
              ]
          }
      }

- Restart Trains Server.
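If you maintain more than a few users, you might generate the auth.fixed_users section instead of typing it by hand. The sketch below is one possible approach, not a Trains Server tool. The function render_fixed_users is hypothetical, and the passwords shown are placeholders. It relies on JSON string literals also being valid in the HOCON format these files use.

```python
import json

# Keys each user entry needs, matching the example section above.
FIELDS = ("username", "password", "name")


def render_fixed_users(users):
    """Render an auth.fixed_users section as text (hypothetical helper)."""
    entries = []
    for user in users:
        missing = [field for field in FIELDS if not user.get(field)]
        if missing:
            raise ValueError(f"user entry is missing {missing}")
        # json.dumps quotes and escapes each value as a JSON string literal.
        body = ", ".join(f"{field}: {json.dumps(user[field])}" for field in FIELDS)
        entries.append(f"            {{ {body} }}")
    return "\n".join([
        "auth {",
        "    fixed_users {",
        "        enabled: true",
        "        users: [",
        ",\n".join(entries),
        "        ]",
        "    }",
        "}",
    ])


if __name__ == "__main__":
    print(render_fixed_users([
        {"username": "jane", "password": "replace-me-1", "name": "Jane Doe"},
        {"username": "john", "password": "replace-me-2", "name": "John Doe"},
    ]))
```

You would then paste the output into apiserver.conf and restart Trains Server as described in the steps above.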
Non-responsive Task watchdog
The non-responsive experiment watchdog monitors experiments that have not been updated for a specified time interval and marks them as aborted. The watchdog is always active.
You can modify the following settings for the watchdog:
- The time threshold (in seconds) of experiment inactivity. The default value is 7200 seconds (2 hours).
- The time interval (in seconds) between watchdog cycles.
To configure the non-responsive watchdog for your Trains Server:
- In your Trains Server /opt/trains/config/services.conf file, add or edit the tasks.non_responsive_tasks_watchdog section and specify the watchdog settings. For example:

      tasks {
          non_responsive_tasks_watchdog {
              # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
              threshold_sec: 7200
              # Watchdog will sleep for this number of seconds after each cycle
              watch_interval_sec: 900
          }
      }

- Restart Trains Server.
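The effect of threshold_sec can be illustrated with a small Python sketch of a single watchdog cycle. This is not the Trains Server implementation. The Task class, the "in_progress" status string, and find_non_responsive are hypothetical names chosen for the example. Only the 7200-second default comes from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

THRESHOLD_SEC = 7200  # default threshold_sec from this page (2 hours)


@dataclass
class Task:
    """Hypothetical stand-in for an experiment record."""
    task_id: str
    status: str
    last_update: datetime


def find_non_responsive(tasks, now, threshold_sec=THRESHOLD_SEC):
    """Return in-progress tasks not updated for at least threshold_sec seconds."""
    cutoff = now - timedelta(seconds=threshold_sec)
    return [t for t in tasks if t.status == "in_progress" and t.last_update <= cutoff]


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    tasks = [
        Task("stale", "in_progress", now - timedelta(hours=3)),
        Task("fresh", "in_progress", now - timedelta(minutes=10)),
    ]
    print([t.task_id for t in find_non_responsive(tasks, now)])  # prints ['stale']
```

In this model, a real watchdog would mark the returned tasks as aborted and then sleep for watch_interval_sec seconds before the next cycle.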