Monitoring Service Posting Slack Alerts

Trains is now ClearML

This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.

The Slack alerts example runs as a Trains service, monitors the completion and failure of Tasks, and posts alert messages to the Slack channel you specify. Configure it with the Slack details you obtain when creating a Slack bot, and with the monitoring parameters you choose. Its Task name is Slack Alerts, and it is associated with the project Monitoring.

Slack Alerts executes in the Trains Agent services container and is configurable. It is pre-loaded in Trains Server with the status Draft (editable). You can set the parameter values in the Trains Web (UI) and then enqueue the Task to the services queue. Alternatively, run the script, which has options to run locally or to enqueue the Task to the services queue.


Creating a Slack Bot

Before configuring and running the Slack alert service, create a new Slack Bot (Allegro Trains Bot).

The Slack API token and channel you create are required to configure the Slack alert service.

  1. Log in to your Slack account.
  2. Go to
  3. In App Name, enter your app name; for example, "Allegro Trains Bot".
  4. In Development Slack Workspace, select your workspace.
  5. Click Create App.
  6. In Basic Information, under Display Information, complete the following:
    • In Short description, enter "Allegro Trains Bot".
    • In Background color, enter "#202432".
  7. Click Save Changes.
  8. In OAuth & Permissions, under Scopes, click Add an OAuth Scope, and then select the following permissions on the list:
    • channels:join
    • channels:read
    • chat:write
  9. In OAuth Tokens & Redirect URLs:
    1. Click Install App to Workspace.
    2. In the confirmation dialog, click Allow.
    3. Click Copy to copy the Bot User OAuth Access Token.
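
Once the token is copied, one way to confirm that the bot can post is to send a single test message. The following is a minimal sketch and not part of the Slack Alerts example: the helper name post_test_message and the default message text are hypothetical, and the helper only assumes that the client object has a chat_postMessage(channel=..., text=...) method, as slack.WebClient does.

```python
def post_test_message(client, channel, text="Allegro Trains Bot is connected"):
    """Post one message to `channel` and return the API response.

    `client` is any object with a `chat_postMessage(channel=..., text=...)`
    method, for example a `slack.WebClient` created with the bot token.
    (Hypothetical helper, for illustration only.)
    """
    if not channel:
        raise ValueError("a Slack channel name is required")
    return client.chat_postMessage(channel=channel, text=text)
```

With the real client, and assuming the token is exported as SLACK_API_TOKEN, the call would look like post_test_message(WebClient(token=os.environ["SLACK_API_TOKEN"]), "<Slack-channel-name>"). The bot may need to be a member of the channel before it can post.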

Running the service

Running using the Trains Web (UI)

Step 1. Configuring the service

  1. In the Trains Web (UI) Projects page, click the Monitoring project > click the Slack Alerts Task.
  2. In the info panel, click the CONFIGURATION tab.
  3. In the GENERAL section, hover over the parameter area > EDIT.
  4. Configure the service parameters:

    • channel - The name of your Slack channel. (MANDATORY)
    • include_completed_experiments - (bool) Include completed experiments?

      • True - Include
      • False - Do not include (default)
    • include_manual_experiments - Include experiments that are running locally?

      • True - Monitor local experiments, and remote experiments executed by Trains Agent. (default)
      • False - Remote experiments, only.
    • local - Run the monitor locally, instead of as a service. The default is False.

    • message_prefix - A message prefix. For example, to alert all channel members use: "Hey <!here>,"
    • min_num_iterations - The minimum number of iterations a failed or completed experiment must have to trigger an alert. The default is 0, which sends alerts for all experiments.
    • project - The name (or partial name) of the project to monitor. Leave empty to monitor all projects.
    • refresh_rate - How often to run the monitoring service (seconds). The default value is 10.0.
    • service_queue - The queue that trains-agent is listening to for Tasks to execute as a service. The default is services.
    • slack_api - The Slack API key. The default value can be set in the environment variable, SLACK_API_TOKEN. (MANDATORY)
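
The parameter list above can be summarized as a set of defaults plus two mandatory values. The sketch below is an illustration only, not code from the example: the names DEFAULTS and resolve_parameters are hypothetical, the empty-string defaults for channel, project, message_prefix, and slack_api are assumptions, and the other defaults are copied from the list above.

```python
import os

# Defaults as listed above for the Trains Web (UI) parameters.
# Empty strings stand for "not set" (an assumption of this sketch).
DEFAULTS = {
    "channel": "",
    "include_completed_experiments": False,
    "include_manual_experiments": True,
    "local": False,
    "message_prefix": "",
    "min_num_iterations": 0,
    "project": "",
    "refresh_rate": 10.0,
    "service_queue": "services",
    "slack_api": "",
}


def resolve_parameters(overrides, environ=None):
    """Merge overrides into the defaults and check the mandatory values."""
    environ = os.environ if environ is None else environ
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown parameters: %s" % sorted(unknown))
    params = dict(DEFAULTS, **overrides)
    # slack_api falls back to the SLACK_API_TOKEN environment variable.
    if not params["slack_api"]:
        params["slack_api"] = environ.get("SLACK_API_TOKEN", "")
    missing = [name for name in ("channel", "slack_api") if not params[name]]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    return params
```

Checking the mandatory values before enqueuing avoids starting a service that cannot post anywhere.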

Step 2. Enqueuing the service

  • Right-click the Slack Alerts Task > Enqueue > Select services > ENQUEUE.

Running using the script

The script allows you to configure the monitoring service, and then either:

  • Run locally
  • Run in Trains Agent services mode

To run the monitoring service locally:

python --channel <Slack-channel-name> --slack-api <Slack-API-token> --local True [...]


  • channel - The Slack channel where alerts will be posted.
  • slack_api - Slack API key.
  • local - Whether to run the monitoring service locally only. If True, then run locally only. If False, then run locally and enqueue the Task to run in Trains Agent services mode.

The script also supports the following additional command line options:
  • message_prefix - Message prefix. The default value is an empty string.
  • min_num_iterations - Minimum number of iterations of a failed/completed experiment required to send an alert. Use this option to filter out debug sessions that fail quickly. The default value is 0 (alerts for all experiments).
  • include_manual_experiments - Include experiments that are run manually (i.e., not by trains-agent). The default value is False.
  • include_completed_experiments - Include completed experiments. If False, then send alerts for failed Tasks only. If True, then send alerts for completed and failed Tasks. The default value is False.
  • refresh_rate - How often to check the experiments, in seconds. The default value is 10 (seconds).
  • service_queue - The queue to use when running as a service. The default value is services.
  • local - Run locally instead of as a service. If True, then run locally only. If False, then automatically enqueue the Task to run in Trains Agent services mode. The default value is False.
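
The options above map naturally onto a command line parser. The following is a rough sketch of how such a parser could look, not the script's actual argument handling: build_parser and str_to_bool are hypothetical names, the boolean parsing is an assumption based on the "--local True" form shown earlier, and both the --slack-api and --slack_api spellings are accepted because the command example and the option list spell the name differently.

```python
import argparse
import os


def str_to_bool(value):
    """Parse 'True'/'False' style values, as in `--local True`."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser with the options and defaults listed above."""
    parser = argparse.ArgumentParser(description="Slack alerts monitor (sketch)")
    parser.add_argument("--channel", required=True,
                        help="Slack channel where alerts are posted")
    parser.add_argument("--slack-api", "--slack_api", dest="slack_api",
                        default=os.environ.get("SLACK_API_TOKEN", ""),
                        help="Slack API key")
    parser.add_argument("--local", type=str_to_bool, default=False)
    parser.add_argument("--message_prefix", default="")
    parser.add_argument("--min_num_iterations", type=int, default=0)
    parser.add_argument("--include_manual_experiments",
                        type=str_to_bool, default=False)
    parser.add_argument("--include_completed_experiments",
                        type=str_to_bool, default=False)
    parser.add_argument("--refresh_rate", type=float, default=10.0)
    parser.add_argument("--service_queue", default="services")
    return parser
```

A call such as build_parser().parse_args(["--channel", "alerts", "--local", "True"]) yields a namespace whose attributes carry the same names as the options.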

Additional information about the script

In the script, the class SlackMonitor inherits from the Monitor class in trains.automation.monitor. SlackMonitor overrides the following Monitor class methods:

  • get_query_parameters - Get the query parameters for Task monitoring.
  • process_task - Get the information for a Task, post a Slack message, and output to console.

The example provides the option to run locally or execute remotely, by calling the Task.execute_remotely method.

To interface with Slack, the example uses slack.WebClient and slack.errors.SlackApiError.
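
The override pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the example's source: the Monitor class here is a local stand-in for trains.automation.monitor.Monitor, Tasks are represented as plain dicts with name, status, and iterations keys, the query dictionary and message format are invented for the sketch, and the client only needs a chat_postMessage(channel=..., text=...) method, as slack.WebClient has.

```python
class Monitor(object):
    """Stand-in for trains.automation.monitor.Monitor (assumed interface)."""

    def get_query_parameters(self):
        return {}

    def process_task(self, task):
        pass


class SlackMonitor(Monitor):
    def __init__(self, client, channel, message_prefix="",
                 min_num_iterations=0, include_completed_experiments=False):
        self.client = client
        self.channel = channel
        self.message_prefix = message_prefix
        self.min_num_iterations = min_num_iterations
        self.include_completed_experiments = include_completed_experiments

    def get_query_parameters(self):
        # Always watch failed Tasks; optionally watch completed ones too.
        statuses = ["failed"]
        if self.include_completed_experiments:
            statuses.append("completed")
        return {"status": statuses}

    def process_task(self, task):
        # Skip short runs, e.g. debug sessions that fail quickly.
        if task["iterations"] < self.min_num_iterations:
            return None
        prefix = self.message_prefix + " " if self.message_prefix else ""
        text = '{}Experiment "{}" ended with status: {}'.format(
            prefix, task["name"], task["status"])
        # Post to Slack and echo to the console.
        self.client.chat_postMessage(channel=self.channel, text=text)
        print(text)
        return text
```

Keeping the Slack client as a constructor argument makes it possible to try the filtering logic with a fake client before using a real token.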