Trains Agent Use Case Examples

This page provides use case examples for Trains Agent, including:

  • Running workers
  • Explicit Task execution
  • Building Docker containers
  • Launching Trains Agent in services mode

Running workers

The default queue

  • Run a trains-agent daemon listening to the default queue to fetch Tasks and execute them.
    trains-agent daemon --queue default

User-created queues

  • Run a trains-agent daemon listening to a user-created queue to fetch Tasks and execute them.
    trains-agent daemon --queue <your_queue>

Prioritizing queues

  • Prioritize queues by specifying more than one queue after the --queue option. The order of the queues in the command determines their priority (the first queue has the highest priority).
    trains-agent daemon --queue important_jobs default
    The trains-agent daemon will first try to pull jobs from the important_jobs queue, and only if it is empty will it fetch a job from the default queue.

Docker mode

  • Run a trains-agent daemon in a Docker container:
    trains-agent daemon --queue default --docker
  • Use your current trains-agent version in the Docker container to execute your Task, instead of the latest trains-agent version that is automatically installed.
    trains-agent daemon --queue default --docker --force-current-version
  • Run a trains-agent daemon in a Docker container while ignoring the gpus flag (for example, when running with a Docker version earlier than 19.03).

    Set the environment variable TRAINS_DOCKER_SKIP_GPUS_FLAG=true (see the gpus-flag sketch following this list).

  • For Kubernetes, specify a host mount on your daemon host. Do not use the host mount inside the Docker container.

    Set the environment variable TRAINS_AGENT_K8S_HOST_MOUNT (see the host-mount sketch following this list).

    For example:

    TRAINS_AGENT_K8S_HOST_MOUNT=/mnt/host/data:/root/.trains
  • For debugging and experimentation, start a trains-agent daemon in foreground mode, where all output is printed to the screen.
    trains-agent daemon --queue default --docker --foreground
  • Run two trains-agent daemons, one per GPU, on the same machine, using the default nvidia/cuda Docker image.
    trains-agent daemon --gpus 0 --queue default --docker nvidia/cuda &
    trains-agent daemon --gpus 1 --queue default --docker nvidia/cuda &
  • Run two trains-agent daemons, pulling from a dedicated dual_gpu queue, with two GPUs per daemon, using the default nvidia/cuda Docker image.
    trains-agent daemon --gpus 0,1 --queue dual_gpu --docker nvidia/cuda &
    trains-agent daemon --gpus 2,3 --queue dual_gpu --docker nvidia/cuda &
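
The gpus-flag sketch below is a minimal example of the workaround mentioned in the list above: it exports TRAINS_DOCKER_SKIP_GPUS_FLAG before launching the daemon. The queue name (default) and the nvidia/cuda image are placeholders reused from the earlier examples, not requirements.

    # Skip passing the gpus flag to Docker (e.g., for Docker versions earlier than 19.03)
    export TRAINS_DOCKER_SKIP_GPUS_FLAG=true
    trains-agent daemon --queue default --docker nvidia/cuda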
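
The host-mount sketch below illustrates the Kubernetes setting in the same way, assuming the /mnt/host/data host path from the example above and a daemon running in Docker mode against the default queue.

    # Map the daemon's host path to the path the agent uses inside its container
    export TRAINS_AGENT_K8S_HOST_MOUNT=/mnt/host/data:/root/.trains
    trains-agent daemon --queue default --docker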

Specifying GPUs

  • Run two trains-agent daemons, one per GPU, on the same machine.
    trains-agent daemon --gpus 0 --queue default &
    trains-agent daemon --gpus 1 --queue default &
  • Run two trains-agent daemons, two GPUs per daemon, pulling from a dedicated dual_gpu queue.
    trains-agent daemon --gpus 0,1 --queue dual_gpu &
    trains-agent daemon --gpus 2,3 --queue dual_gpu &

Debugging

  • Run a trains-agent daemon in foreground mode, sending all output to the console.
    trains-agent daemon --queue default --foreground

Explicit Task execution

Execute a Task without a queue

  • Execute a Task in a trains-agent worker without a queue.
    trains-agent execute --id <task-id>

Clone a Task and execute the cloned Task

  • Clone the specified Task and execute the cloned Task in a trains-agent worker without a queue.
    trains-agent execute --id <task-id> --clone

Docker mode

  • Execute a Task in a trains-agent worker using a Docker container, without a queue.
    trains-agent execute --id <task-id> --docker

Building Docker containers

Containerized Tasks

Build a Docker container that, at launch, executes a specific experiment or a clone (copy) of that experiment.

  • Build a Docker container that at launch will execute the Task specified by its Task ID.
    trains-agent build --id <task-id> --docker --target <new-docker-name> --entry-point reuse_task
  • Build a Docker container that at launch will clone the Task specified by its Task ID, and execute the newly cloned Task.
    trains-agent build --id <task-id> --docker --target <new-docker-name> --entry-point clone_task

After building either Docker container, run it:

  • docker run <new-docker-name>
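
For example, a minimal end-to-end sketch of the build-and-run flow, using a hypothetical Task ID (aabbcc112233) and a hypothetical target image name (task-image):

    # Build an image that re-executes the Task at launch (the ID and image name here are hypothetical)
    trains-agent build --id aabbcc112233 --docker --target task-image --entry-point reuse_task
    # Launch the container; the Task runs inside it
    docker run task-image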

Base Docker image

Build a Docker container using the environment of an existing Task.

  1. Build a Docker container according to the execution environment of the Task specified by its Task ID.
    trains-agent build --id <task-id> --docker --target <new-docker-name>
  2. Add the Docker container as the Trains base Docker image for the Task (experiment).

Launching Trains Agent in services mode

Launch Trains Agent in services mode, listening to the services queue for enqueued Tasks to execute. If the services queue does not exist, it is created. GPU access is disabled for the Docker containers that are spun up for each Task's execution. For more information about how services work on Trains Server, see Trains Agent services container in the "Trains Server" section of the "Concepts and Architecture" page.

  • trains-agent daemon --services-mode --detached --queue services --create-queue --docker <docker_name> --cpu-only
  • For example:
    trains-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only

Training and inference Tasks

Do not enqueue training or inference Tasks into the services queue. They will put an unnecessary load on the server.