Workers and Queues

Trains lets you monitor the resource utilization of your experiments so that you can optimize them.

Monitored resources include:

  • CPU and GPU
  • Memory
  • Video memory
  • Network usage

Trains provides queue management features, including:

  • Monitoring queue utilization
  • Reordering queues
  • Moving experiments between queues
  • Removing experiments from queues
  • Creating, renaming, and deleting empty queues

On the Workers & Queues page, you can also monitor worker utilization and queue utilization, including workers in use, total workers, queue wait time, and average experiments per queue. View these metrics in a chart for a period ranging from the last three hours to the last year.

Resource utilization

To monitor resource utilization:

  1. In the WORKERS tab, click a worker. The chart refreshes to show resource utilization for that worker. The worker details pane slides open, and the INFO tab appears, showing information about the worker: its name, current experiment, current runtime, last iteration, and last update time.
  2. Select a metric and time frame.
    1. In the list of resources (top left side), select CPU and GPU Usage, Memory Usage, Video Memory Usage, or Network Usage.
    2. In the period list (top right side), select 3 Hours, 6 Hours, 12 Hours, 1 Day, 1 Week, or 1 Month.
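
Selecting a period amounts to choosing a time window that ends at the present moment. The following is a minimal sketch of that idea in plain Python, not Trains code: the `PERIODS` table and `time_window` helper are hypothetical names, and treating 1 Month as 30 days is an assumption.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Hypothetical mapping from the period labels in the UI to durations.
# Assumption: "1 Month" is approximated as 30 days.
PERIODS = {
    "3 Hours": timedelta(hours=3),
    "6 Hours": timedelta(hours=6),
    "12 Hours": timedelta(hours=12),
    "1 Day": timedelta(days=1),
    "1 Week": timedelta(weeks=1),
    "1 Month": timedelta(days=30),
}


def time_window(period: str, now: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the window covered by a period label."""
    return now - PERIODS[period], now
```

For example, `time_window("6 Hours", datetime.now())` returns a window that starts six hours ago and ends now.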

Queue management

In the Queues tab, do any of the following:

  • Create a queue - Click + NEW QUEUE > Type a queue name > CREATE.
  • Do either of the following by clicking a queue in the queues list (lower right):
    • Rename a queue - Click RENAME > Type a queue name > RENAME.
    • Delete a queue - Click DELETE.
  • Do any of the following by right-clicking a queue in the queues list (lower right):
    • Reorder experiments in a queue - Drag an experiment to a new position in the queue, or click (menu) and then select Move to top or Move to bottom.
    • Move an experiment to a new queue - Click (menu) > Move to queue... > Select a queue > ENQUEUE.
    • Remove an experiment - Click (menu) and then select the option that removes the experiment from the queue.
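
The reorder, move, and remove operations above behave like edits to ordered lists. The following is a minimal sketch of that model in plain Python. It is not the Trains API: the queue names, experiment IDs, and helper functions are all hypothetical and exist only to illustrate what each menu action does to the order of a queue.

```python
from typing import Dict, List

# Hypothetical in-memory model: queue name -> ordered list of experiment IDs.
# The first item in each list is the next experiment to be pulled by a worker.
Queues = Dict[str, List[str]]


def move_to_top(queues: Queues, queue: str, experiment: str) -> None:
    """Place an experiment first in its queue (like Move to top)."""
    queues[queue].remove(experiment)
    queues[queue].insert(0, experiment)


def move_to_bottom(queues: Queues, queue: str, experiment: str) -> None:
    """Place an experiment last in its queue (like Move to bottom)."""
    queues[queue].remove(experiment)
    queues[queue].append(experiment)


def move_to_queue(queues: Queues, source: str, target: str, experiment: str) -> None:
    """Take an experiment out of one queue and add it to the end of another."""
    queues[source].remove(experiment)
    queues[target].append(experiment)


def remove_experiment(queues: Queues, queue: str, experiment: str) -> None:
    """Remove an experiment from a queue without enqueuing it elsewhere."""
    queues[queue].remove(experiment)


# Example with made-up queue names and experiment IDs.
queues: Queues = {"default": ["exp_a", "exp_b", "exp_c"], "gpu": []}
move_to_top(queues, "default", "exp_c")
move_to_queue(queues, "default", "gpu", "exp_b")
```

In this sketch, a moved experiment lands at the end of the target queue. That placement is an assumption of the model, not a documented behavior.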

Worker utilization

Optimize the use of your workers by monitoring worker utilization in the Workers tab.

To monitor worker utilization:

  • Open the Workers & Queues page, or switch from the Queues tab to the Workers tab. The worker utilization chart appears.

    Hover over any data point to see average workers and total workers.

Queue utilization

To monitor all queues:

  • Open the Workers & Queues page, or switch from the Workers tab to the Queues tab. The queue utilization chart appears. The chart shows the average wait time (in seconds) and the number of experiments queued for all queues.
  • Hover over any data point to see the average wait time and the number of experiments.
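
To make the two chart series concrete, here is a minimal sketch of how an average wait time and a queued-experiment count could be derived for a single data point. It is plain Python with hypothetical names (`QueuedExperiment`, `queue_metrics`) and made-up data. How Trains actually computes these values is not described here, so treat the calculation as an assumption.

```python
from statistics import mean
from typing import Dict, List, NamedTuple


class QueuedExperiment(NamedTuple):
    """Hypothetical record of an experiment waiting in a queue."""
    experiment_id: str
    enqueued_at: float  # seconds since an arbitrary reference time


def queue_metrics(entries: List[QueuedExperiment], now: float) -> Dict[str, float]:
    """Summarize a queue at time `now`: how many experiments, and how long they have waited."""
    waits = [now - entry.enqueued_at for entry in entries]
    return {
        "experiments": len(entries),
        # An empty queue has no wait time to average, so report 0.
        "avg_wait_seconds": mean(waits) if waits else 0.0,
    }


# Example with made-up data: two experiments enqueued 120 and 60 seconds ago.
sample = [QueuedExperiment("exp_a", 0.0), QueuedExperiment("exp_b", 60.0)]
metrics = queue_metrics(sample, now=120.0)
```

Here `metrics` reports 2 experiments with an average wait of 90 seconds.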

To monitor a queue:

  1. View and manage a queue. In the queues list (lower right), click a queue. The chart refreshes to show metrics for the selected queue. The details pane slides open, and the EXPERIMENTS tab appears, showing the enqueued experiments. You can manage the queue from the EXPERIMENTS tab of the details pane.
  2. View information about the workers listening to the queue. In the details pane, click WORKERS.