Version 0.16

Important

Trains is now ClearML.

Version 0.16.4

Trains

Features

  • Add Hydra support (GitHub trains Issue 219)

  • Add cifar ignite example (GitHub trains Issue 237)

  • Add auto extraction of tar.gz files when using StorageManager (GitHub trains Issue 237)

  • Add Task.init() argument auto_connect_streams controlling stdout/stderr/logging capture (GitHub trains Issue 181)

  • Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)

  • Add Task.create_function_task() to allow creating a new task using a function and arguments to be executed remotely (GitHub trains Issue 230)

  • Allow disabling SSL certificates verification using Task.setup_upload() argument verify or AWS S3 bucket configuration verify property (GitHub trains Issue 256)

  • Add StorageManager.get_files_server()

  • Add Task.get_project_id() using project name

  • Add project_name argument to Task.set_project()

  • Add Task.connect() support for class / instance objects

  • Add Task.get_configuration_object() and Task.set_configuration_object() for easier automation

  • Improve Auto-Scaler - allow extra configurations, key name and security group are now optional, defaults using empty strings

  • Use a built-in matplotlib converter

  • Add reporting text as debug sample example

Bug Fixes

  • Fix Optuna HPO parameter serializing (GitHub trains Issue 254)

  • Fix Task.connect() dictionary empty string ('') being cast to None (GitHub trains Issue 258)

  • Fix lightgbm binding keyword argument issue (GitHub trains Issue 251)

  • Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239)

  • Fix infinite recursion in StorageManager upload (GitHub trains Issue 253)

  • Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252)

  • Fix crash when running remotely with no configuration; a warning is now output instead (GitHub trains Issue 243)

  • matplotlib

    • Fix matplotlib 3.3.3 support

    • Fix global figure enumeration

    • Fix binding without a title reported a single plot (untitled 00) instead of increasing the counter

  • Fix Python 2.7/3.5 support

  • Fix quote issue when reporting debug images

  • Fix replace quote safe characters in upload file to include ;=@$

  • Fix at_exit called from another process should be ignored

  • Fix Task.set_tags() for completed / published tasks

  • Fix Task.add_tags() not working when running remotely

  • Fix Task.set_user_properties() docstring and interface

  • Fix preview with JSON (dict) artifacts did not store the artifact

  • Fix Logger.report_text() on task created using Task.create() was not supported

  • Fix initialization for torch: only call torch get_worker_info if torch was loaded

  • Fix flush (wait) on auxiliary task (obtained using Task.get_task()) should wait on all upload events

  • Fix server was not updated with the defaults from the code when running remotely and configuration section is missing

  • Fix connect dict containing None default values, blocked the remote execution from passing string instead of None

  • Fix Task.upload_artifact() argument delete_after_upload=True used in conjunction with wait_for_upload=True was not supported

Version 0.16.3

Trains

Features

  • Add LightGBM support

  • Add initial Hydra support (GitHub trains Issue 219)

  • Add synchronous support for Task.upload_artifact() (GitHub trains Issue 231)

  • Add sdk.development.store_code_diff_from_remote (default false) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222)

  • Add sdk.development.detect_with_conda_freeze (default true) for full conda freeze (requires trains-agent >= 0.16.2)

  • Add user properties support in Task object

  • Add Logger.report_table() support for table as list of lists

  • Add support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now round circles below the DAG graph.

  • Add Pipeline/Optimization can be attached to any Task (not just the current task)

  • Add force_download flag to StorageManager.get_local_copy()

  • Add control over the artifact preview using Task.upload_artifact() preview argument

  • Add Logger.report_matplotlib_figure() with examples

  • Add Task.set_task_type()

  • AWS auto-scaler

    • Add key pair and security groups support

    • Add multi-line support for both extra bash script and extra trains.conf data

  • Update examples

Bug Fixes

  • Fix Task.update_output_model() wrong argument order (GitHub trains Issue 220)

  • Fix initializing task on argparse parse in remote mode. Do not call Task.init() to avoid auto connect, use Task.get_task() instead

  • Fix detected task cwd outside of repository root folder

  • Fix Task.connect(dict) to place non-existing entries on the section name instead of General

  • Fix Task.clone() support for trains-server < 0.16

  • Fix StorageManager cache extraction of zipped artifacts. Use modified time instead of access time for cached files

  • Fix diff command output was stripped

  • Make sure local packages with multi-files are marked as package

  • Fix Task.set_base_docker() should be skipped when running remotely

  • Fix ArgParser binding handling of string argument with boolean default value (affects PyTorch Lightning integration)

  • When using detect_with_pip_freeze make sure that package @ file:// lines are replaced with package==x.y.z as local file will probably not be available

  • Fix git packages to new pip standard package @ git+

  • Improve conda package naming _ and - support

  • Do not add specific setuptools version to requirements (pip can’t install it anyway)

  • Fix image URL quoting when uploading from a file path
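
The detect_with_pip_freeze fix above replaces package @ file:// lines with pinned versions. A minimal sketch of that kind of rewrite follows; it is not the Trains implementation. The function name and the installed_versions mapping are hypothetical, introduced only for illustration.

```python
import re

# Matches pip freeze lines such as "mypkg @ file:///tmp/build/mypkg".
_LOCAL_FILE_LINE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*@\s*file://")


def pin_local_file_requirements(freeze_lines, installed_versions):
    """Replace 'package @ file://...' lines with 'package==x.y.z'.

    installed_versions is a hypothetical {name: version} mapping. Lines whose
    package has no known version are returned unchanged.
    """
    pinned = []
    for line in freeze_lines:
        match = _LOCAL_FILE_LINE.match(line)
        version = installed_versions.get(match.group(1)) if match else None
        pinned.append(f"{match.group(1)}=={version}" if version else line)
    return pinned


if __name__ == "__main__":
    lines = ["numpy==1.19.4", "mypkg @ file:///tmp/build/mypkg"]
    print(pin_local_file_requirements(lines, {"mypkg": "0.3.1"}))
```

Lines using other schemes (for example package @ git+https://...) are deliberately left alone, since a remote machine can still resolve them.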

Version 0.16.2

Trains

Features

  • Add Task.set_resource_monitor_iteration_timeout() to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).

  • Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).

  • Add git diff for repository submodule (requires git 2.14 or above).

  • Add TrainsJob.is_completed() and TrainsJob.is_aborted().

  • Add Task.logger property.

  • Add Pipeline Controller automation and example.

  • Add improved trace filtering capabilities in trains.debugging.trace.trace_trains().

  • Add default help per argument (if not provided) in ArgParser binding.

  • Deprecate Task.reporter.

  • Update PyTorch example.

  • Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).

  • Support Keras restructuring for Network, Model and Sequential.

  • Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.

Bug Fixes

  • Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).

  • Fix sending empty reports (GitHub trains Issue 205).

  • Fix scatter2d sub-sampling and rounding.

  • Fix plots reporting

    • NaN representation (matplotlib conversion).

    • Limit the number of digits in a plot to reduce plot size (using sdk.metrics.plot_max_num_digits configuration value).

  • Fix Task.wait_for_status() to reload after it ends.

  • Fix thread wait Ctrl-C interrupt did not exit process.

  • Improve Windows support for installed packages analysis.

  • Fix auto model logging using relative path.

  • Fix Hyper-parameter Optimization example.

  • Fix Task.clone() when working with TrainsServer < 0.16.0.

  • Fix pandas artifact handling.

    • Avoid adding unnamed:0 column.

    • Return original pandas object.

  • Fix TrainsJob hyper-params overriding order was not guaranteed.

  • Fix ArgParse auto-connect to support default function type.
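
The plot size fix above limits the number of digits stored per value. One way such a limit could work is sketched below, assuming the limit means significant digits (the release note does not say). The function name is hypothetical and this is not the Trains code.

```python
def limit_plot_digits(value, max_digits):
    """Recursively round floats in plot data to max_digits significant digits.

    Assumption: "digits" means significant digits. Non-float values
    (ints, strings, None) pass through unchanged.
    """
    if isinstance(value, float):
        return float(f"{value:.{max_digits}g}")
    if isinstance(value, dict):
        return {k: limit_plot_digits(v, max_digits) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [limit_plot_digits(v, max_digits) for v in value]
    return value


if __name__ == "__main__":
    plot = {"name": "loss", "x": [1, 2], "y": [0.123456789, 12345.6789]}
    print(limit_plot_digits(plot, 4))
```

Rounding before serialization shortens the text form of every float, which is where the size saving in a JSON-encoded plot would come from.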

Trains Agent

Features

  • conda

    • Add agent.package_manager.conda_env_as_base_docker allowing “docker_cmd” to contain link to a full pre-packaged conda environment (tar.gz created by conda-pack). Use TRAINS_CONDA_ENV_PACKAGE environment variable to specify conda tar.gz file.

    • Add conda support for read-only pre-built environment (pass conda folder as docker_cmd on Task)

    • Improve trying to find conda executable

  • k8s glue

    • Add support for limited number of services exposing ports

    • Add support for k8s pod custom user properties

    • Allow selecting external trains.conf file for the pod itself

    • Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress/ELB)

  • Allow specifying cudatoolkit version in the “installed packages” section when using conda as package manager (GitHub trains Issue 229)

  • Add agent.package_manager.force_repo_requirements_txt. If True, “Installed Packages” on Task are ignored, and only repository requirements.txt is used

  • Pass TRAINS_DOCKER_IMAGE into docker for interactive sessions

  • Add torchcsprng and torchtext to PyTorch resolving

Bug Fixes

  • When logging suppress “\r” when reading a current chunk of a file/stream. Add agent.suppress_carriage_return (default True) to support previous behavior

  • Make sure TRAINS_AGENT_K8S_HOST_MOUNT is used only once per mount

  • Fix k8s glue script to trains-agent default docker script

  • Fix apply git diff from submodule only

  • conda

    • Fix conda pip freeze to be consistent with trains 0.16.3

    • Fix conda environment support for trains 0.16.3 full env. Add agent.package_manager.conda_full_env_update to allow conda to update back the requirements (default False, to preserve previous behavior)

    • Fix running from conda environment - conda.sh not found in first conda PATH match

  • Fix docker mode ubuntu/debian support by making sure not to ask for input (fix tzdata install)

  • Fix repository detection - ignore environment SSH_AUTH_SOCK, only check if git user/pass are configured

  • git diff

    • Fix support for non-ascii diff

    • Fix diff with an empty line at the end causing a corrupt diff apply message

    • Allow zero context diffs (useful when blind patching repository)

  • Fix daemon --stop when agent UID cannot be located

  • Fix nvidia docker support on some linux distros (SUSE)

  • Fix nvidia pytorch dockers support

  • Fix torch CUDA 11.1 support

  • Fix requirements dict with a null pip entry: it is now treated as None, and packages are installed from the repository’s requirements.txt
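
For readers unfamiliar with the zero context diffs mentioned under git diff above, the sketch below builds one with Python's standard difflib module. It only illustrates the diff format and is not how the agent produces or applies diffs; the helper name and file name are hypothetical.

```python
import difflib


def zero_context_diff(old_text, new_text, filename="file.py"):
    """Return a unified diff with no context lines around each change."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + filename,
        tofile="b/" + filename,
        n=0,  # zero lines of context
    )
    return "".join(diff)


if __name__ == "__main__":
    print(zero_context_diff("a\nb\nc\n", "a\nB\nc\n"))
```

Because no surrounding lines are recorded, such a patch depends on line numbers alone, which is why it suits blind patching of a repository whose surrounding code may differ.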

Version 0.16.1

Trains

Features

  • Enhance HyperParameter optimizer.

Bug Fixes

  • Fix typing dependency for Python<3.5 (GitHub trains Issue 184).

  • Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).

  • Fix Task.get_reported_console_output() for new Trains Server API v2.9.

  • Fix cache handling for different partitions/drives/devices.

  • Disable offline mode when running remotely (i.e. executed by Trains Agent).

  • Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).

  • Fix double-escaped model design text when connecting OutputModel.

Trains Server

IMPORTANT: Upgrading to this version requires a manual data migration.

Bug Fixes

  • Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).

  • Raise experiments comparison limit (only 10 were allowed). Limit is now 100, configurable using services.tasks.multi_task_histogram_limit (Trains Slack channel thread).

  • Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).

  • Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).

  • Update Fixed User full-name on restart (Trains Slack channel thread).

  • Fix project ordering issue.

  • When loading plots, display a spinner and don’t show “no data”.

  • Improve logging to provide more coherent ElasticSearch connection status in server log.

Trains Agent

Features

  • Add sdk.metrics.plot_max_num_digits configuration option to reduce plot storage size.

  • Add agent.package_manager.post_packages and agent.package_manager.post_optional_packages configuration options to control packages install order (e.g. horovod).

  • Add agent.git_host configuration option for limiting git credential usage for a specific host (overridable using TRAINS_AGENT_GIT_HOST environment variable).

  • Add agent.force_git_ssh_port configuration option to control https to ssh link conversion for non standard ssh ports.

  • Add requirements detection features

    • Improve support for detecting new pip version (20+) supporting package @ scheme://link.
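
The agent.force_git_ssh_port option above concerns converting https repository links to ssh links when the ssh server uses a non-standard port. The sketch below shows what such a conversion might look like. The function name, the default git user, and the exact output formats are assumptions, not the agent's actual logic.

```python
from urllib.parse import urlsplit


def https_to_ssh(url, ssh_port=None, ssh_user="git"):
    """Convert an http(s) git URL to an ssh URL (illustrative only).

    With ssh_port set, the explicit ssh:// form is used so the port can be
    given; otherwise the short scp-like form is returned.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return url  # not an http(s) link, leave untouched
    if ssh_port:
        return f"ssh://{ssh_user}@{parts.hostname}:{ssh_port}{parts.path}"
    return f"{ssh_user}@{parts.hostname}:{parts.path.lstrip('/')}"


if __name__ == "__main__":
    print(https_to_ssh("https://git.example.com/team/repo.git", ssh_port=2222))
```

The scp-like form (user@host:path) has no place for a port, so a non-standard port forces the ssh:// form.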

Bug Fixes

  • Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a git+http link is enough to make sure all requirements are met/installed (GitHub Issue #196).

  • Fix incorrect check for spaces in current execution folder.

  • Fix requirements detection

    • Update torch version after using downloaded / system pre-installed version.

    • Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).

Version 0.16.0

Trains

Features

  • Add continuing of previously executed experiments. Add Task.init() argument continue_last_task to continue a previously used Task (GitHub Issue #160).

  • Allow Task editing/creation from code. Task.export_task/import_task/update_task() (GitHub Issue #128).

  • Add offline mode. Use Task.set_offline() and Task.import_offline_session()

    • Support setting offline mode via TRAINS_OFFLINE_MODE=1 environment variable.

    • Support setting offline API version via TRAINS_OFFLINE_MODE=2.9 environment variable.

  • Automatically pickle all objects when uploading as artifacts, task.upload_artifact() argument auto_pickle=True (GitHub Issue #153).

  • Add multiple sections/groups support for Task hyper-parameters using Task.connect().

  • Add multiple configurations (files) using Task.connect_configuration.

  • Allow enabling OS environment logging using the sdk.development.log_os_environments configuration parameter (complements the TRAINS_LOG_ENVIRONMENT environment variable).

  • Add Optuna support for hyper-parameter optimization controller. OptimizerOptuna is now the default optimizer.

  • Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).

  • Add automatic FastAI logging. It is disabled if tensorboard is loaded (assuming TensorBoardLogger will be used).

  • Support Tensorboard text logging (add_text()) as debug samples (.txt files), instead of as console output.

  • Allow for more standard confusion matrix reporting. Logger.report_confusion_matrix() argument yaxis_reversed (flips the confusion matrix if True, default False) (GitHub Issue #165).

  • Add support for Trains Server 0.16.0 (API v2.9 support).

  • Allow disabling Trains update message from the log using the TRAINS_SUPPRESS_UPDATE_MESSAGE environment variable (GitHub Issue #157).

  • Add AWS EC2 Auto-Scaler service wizard and Service.

  • Improved and updated examples

    • Add Keras Tuner CIFAR10 example.

    • Add FastAI example.

    • Update PyTorch Jupyter notebook examples (GitHub Issue #150).

  • Support global requirements detection using pip freeze (set sdk.development.detect_with_pip_freeze configuration in trains.conf).

  • Add Task.get_projects() to get all projects in the system, sorted by last update time.

Bug Fixes

  • Fix UTC to time stamp in comment (GitHub Issue #152).

  • Fix and enhance GPU monitoring

  • Fix filename too long bug (GitHub trains-server Issue #49).

  • Fix TensorFlow image logging to allow images with no width/height/color metadata (GitHub Issue #182).

  • Fix multiprocessing Pool hanging execution when an exception is thrown in the pool. Call original signal handler and re-flush stdout.

  • Fix plotly support for matplotlib 3.3.

  • Add Python 2.7 support for get_current_thread_id().

  • Update examples requirements.

  • Fix and improve signal handling.

  • Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.

  • Fix auto logging multiple argparse calls before Task.init().

  • Limit experiment Git diff logging to 500Kb. If larger than 500Kb, the diff section will contain a warning and the entire diff will be uploaded as an artifact named auxiliary_git_diff.

  • Fix requirements detection

    • Fix Trains installed from git+.

    • Fix when Trains is not directly imported.

    • Fix multiple -e packages were not detected (only the first one).

    • Fix running with Trains in PYTHONPATH resulted in double entry of trains.

  • Fix Task.set_base_docker() on main task to do nothing when running remotely.
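
The Git diff size limit described in the list above can be pictured as a simple size check. The sketch below assumes 500Kb means 500 * 1024 bytes of UTF-8 text. The function name and warning text are hypothetical, and this is not the Trains implementation.

```python
MAX_DIFF_BYTES = 500 * 1024  # assumed meaning of "500Kb"


def split_large_diff(diff_text, limit=MAX_DIFF_BYTES):
    """Return (diff_section_text, artifact_payload).

    Small diffs are stored inline and artifact_payload is None. Oversized
    diffs are replaced inline by a warning, and the full text is returned
    separately so the caller can upload it as an artifact.
    """
    size = len(diff_text.encode("utf-8"))
    if size <= limit:
        return diff_text, None
    warning = f"# diff too large ({size} bytes), full diff stored as an artifact"
    return warning, diff_text


if __name__ == "__main__":
    section, artifact = split_large_diff("x" * (MAX_DIFF_BYTES + 1))
    print(section, artifact is not None)
```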

Trains Server

IMPORTANT: Upgrading to this version requires a manual data migration.

Features

  • Add experiment hyperparameter grouping.

    • HYPER PARAMETERS tab renamed to CONFIGURATIONS.

    • CONFIGURATIONS tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS

    • Add user properties group. Key-value pairs always editable (USER PROPERTIES section).

    • Add command line options group for argparse and older experiments parameters (CONFIGURATIONS/HYPER PARAMETERS/Args).

    • Add TensorFlow definitions group (CONFIGURATIONS/HYPER PARAMETERS/TF_DEFINE).

    • Add environment variables group (CONFIGURATIONS/HYPER PARAMETERS/Environment).

  • Improve experiment model configuration.

    • Model design is in the ARTIFACTS tab.

    • Experiment model description is in the CONFIGURATION OBJECTS section in the CONFIGURATIONS tab.

  • Improve experiment comparison:

    • In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).

    • Remove fields providing no additional information from comparison.

  • Improve the model framework filter. Filter contains only frameworks used by models in the project.

  • Add configurable Trains services examples.

  • Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.

  • Add legend on/off toggle control for every plot.

  • Add clear button for text areas (GitHub trains-server Issue #42).

  • Reinstate the bottom bar Archive button.

  • Add Trains community links to left bar.

  • Add Hi-DPI display support.

  • Add debug.ping endpoint for simple health monitoring.

  • Add support for field exclusion in *.get_all endpoints.

  • Move to ElasticSearch 7. Requires manual data migration.

Bug Fixes

  • Auto-fit column width on column resize double click.

  • Allow top-bar search if fewer than three characters are entered, and Enter is pressed.

Trains Agent

Features

  • Add agent.docker_init_bash_script configuration section to allow finer control over docker startup script.

  • Changed default docker image from nvidia/cuda to nvidia/cuda:10.1-runtime-ubuntu18.04 to support cudnn frameworks (e.g. TF).

  • Improve support for dockers with preinstalled conda environment.

  • Improve trains-agent-docker spinning.

  • Add daemon --order-fairness for round-robin queue pulling.

  • Add daemon --stop to terminate a running agent (assuming other arguments are the same)

    • If no additional arguments, Agents are terminated in lexicographical order.

  • Support cleanup of all log files on termination unless executed with --debug.

  • Add error message when Trains API Server is not accessible on startup.
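
To show what round-robin queue pulling (daemon --order-fairness, above) means in practice, the sketch below compares it with strict priority order. It assumes that without the flag the agent always pulls from the first non-empty queue in the order given. The function and queue names are hypothetical and this is not agent code.

```python
from collections import deque


def pull_tasks(queues, order_fairness=False):
    """Return the order in which tasks would be pulled from several queues.

    queues is a list of (queue_name, [task, ...]) in priority order.
    Without order_fairness, every scan starts at the first queue (strict
    priority). With it, the next scan starts after the queue just served.
    """
    pending = [(name, deque(tasks)) for name, tasks in queues]
    pulled = []
    start = 0
    while any(tasks for _, tasks in pending):
        for offset in range(len(pending)):
            idx = (start + offset) % len(pending)
            name, tasks = pending[idx]
            if tasks:
                pulled.append((name, tasks.popleft()))
                if order_fairness:
                    start = (idx + 1) % len(pending)
                break
    return pulled


if __name__ == "__main__":
    queues = [("high", ["h1", "h2"]), ("low", ["l1", "l2"])]
    print(pull_tasks(queues))
    print(pull_tasks(queues, order_fairness=True))
```

With strict priority the low queue is only served once the high queue is empty; with round-robin the queues alternate, so lower queues are not starved.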

Bug Fixes

  • Fix GPU Windows monitoring support (GitHub Issue #177).

  • Fix .git-credentials and .gitconfig mapping into docker.

  • Fix non-root docker image usage.

  • Fix docker to use UTF-8 encoding, so prints won’t break it.

  • Fix --debug to set all loggers to DEBUG.

  • Fix task status change to queued should never happen during Task runtime.

  • Fix requirement_parser to support package @ git+http lines.

  • Fix GIT user/password in requirements and support for -e git+http lines.

  • Fix configuration wizard to generate trains.conf matching latest Trains definitions.