Version 0.16
Trains is now ClearML
This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.
Version 0.16.4
Trains
Features
- Add Hydra support (GitHub trains Issue 219)
- Add cifar ignite example (GitHub trains Issue 237)
- Add auto extraction of `tar.gz` files when using `StorageManager` (GitHub trains Issue 237)
- Add `Task.init()` argument `auto_connect_streams` controlling stdout/stderr/logging capture (GitHub trains Issue 181)
- Add carriage return flush support using the `sdk.development.worker.console_cr_flush_period` configuration setting (GitHub trains Issue 181)
- Add `Task.create_function_task()` to allow creating a new task using a function and arguments to be executed remotely (GitHub trains Issue 230)
- Allow disabling SSL certificates verification using `Task.setup_upload()` argument `verify` or AWS S3 bucket configuration `verify` property (GitHub trains Issue 256)
- Add `StorageManager.get_files_server()`
- Add `Task.get_project_id()` using project name
- Add `project_name` argument to `Task.set_project()`
- Add `Task.connect()` support for class / instance objects
- Add `Task.get_configuration_object()` and `Task.set_configuration_object()` for easier automation
- Improve Auto-Scaler: allow extra configurations; key name and security group are now optional and default to empty strings
- Use a built-in matplotlib convertor
- Add reporting text as debug sample example
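The `tar.gz` auto-extraction item can be pictured with the standard library alone. The sketch below is not the `StorageManager` implementation; `extract_if_archive` and its cache-folder layout are hypothetical names chosen for illustration.

```python
import tarfile
from pathlib import Path


def extract_if_archive(local_file, cache_dir):
    """Return an extracted folder for .tar.gz files, or the file path otherwise.

    Hypothetical helper: it mimics the idea of auto-extracting a downloaded
    archive into a cache folder and is not the Trains code.
    """
    local_file = Path(local_file)
    if not local_file.name.endswith(".tar.gz"):
        return local_file  # not an archive, hand back the file unchanged

    target = Path(cache_dir) / local_file.name[: -len(".tar.gz")]
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(local_file, mode="r:gz") as archive:
        root = target.resolve()
        for member in archive.getmembers():
            # Refuse members that would land outside the target folder.
            destination = (target / member.name).resolve()
            if root != destination and root not in destination.parents:
                raise ValueError("unsafe path in archive: %s" % member.name)
        archive.extractall(target)
    return target
```

A caller would then work with the returned folder in the same way it would with a plain local copy of a file.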
Bug Fixes
- Fix Optuna HPO parameter serializing (GitHub trains Issue 254)
- Fix connect dictionary `''` cast to `None` (GitHub trains Issue 258)
- Fix lightgbm binding keyword argument issue (GitHub trains Issue 251)
- Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239)
- Fix infinite recursion in `StorageManager` upload (GitHub trains Issue 253)
- Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252)
- Fix running remotely with no configuration should not crash but output a warning (GitHub trains Issue 243)
- matplotlib
  - Fix matplotlib 3.3.3 support
  - Fix global figure enumeration
  - Fix binding without a title reported a single plot (`untitled 00`) instead of increasing the counter
- Fix Python 2.7/3.5 support
- Fix quote issue when reporting debug images
- Fix replace quote safe characters in upload file to include `;=@$`
- Fix `at_exit` called from another process should be ignored
- Fix `Task.set_tags()` for completed / published tasks
- Fix `Task.add_tags()` not working when running remotely
- Fix `Task.set_user_properties()` docstring and interface
- Fix preview with JSON (dict) artifacts did not store the artifact
- Fix `Logger.report_text()` on task created using `Task.create()` was not supported
- Fix initialization for torch: only call torch `get_worker_info` if torch was loaded
- Fix flush (wait) on auxiliary task (obtained using `Task.get_task()`) should wait on all upload events
- Fix server was not updated with the defaults from the code when running remotely and configuration section is missing
- Fix connect dict containing `None` default values, blocked the remote execution from passing string instead of None
- Fix `Task.upload_artifact()` argument `delete_after_upload=True` used in conjunction with `wait_for_upload=True` was not supported
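The quote-safe-characters fix can be illustrated with `urllib.parse.quote`. This is only a sketch of the idea: `quote_upload_name` is a hypothetical helper, the `;=@$` set comes from the release note, and everything else (including which other characters Trains leaves unquoted) is an assumption.

```python
from urllib.parse import quote


def quote_upload_name(file_name, safe=";=@$"):
    """Percent-encode a file name, leaving the characters in `safe` untouched.

    Hypothetical helper for illustration, not the Trains implementation.
    """
    return quote(file_name, safe=safe)


# Spaces are encoded, while ; = @ $ pass through unchanged.
print(quote_upload_name("run 1;lr=0.1@step$5.png"))
```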
Version 0.16.3
Trains
Features
- Add LightGBM support
- Add initial Hydra support (GitHub trains Issue 219)
- Add synchronous support for `Task.upload_artifact()` (GitHub trains Issue 231)
- Add `sdk.development.store_code_diff_from_remote` (default `false`) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222)
- Add `sdk.development.detect_with_conda_freeze` (default `true`) for full conda freeze (requires trains-agent >= 16.2)
- Add user properties support in Task object
- Add `Logger.report_table()` support for table as list of lists
- Add support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now round circles below the DAG graph.
- Add Pipeline/Optimization can be attached to any Task (not just the current task)
- Add `force_download` flag to `StorageManager.get_local_copy()`
- Add control over the artifact preview using `Task.upload_artifact()` `preview` argument
- Add `Logger.report_matplotlib_figure()` with examples
- Add `Task.set_task_type()`
- AWS auto-scaler
  - Add key pair and security groups support
  - Add multi-line support for both extra bash script and extra `trains.conf` data
- Update examples
Bug Fixes
- Fix `Task.update_output_model()` wrong argument order (GitHub trains Issue 220)
- Fix initializing task on argparse parse in remote mode. Do not call `Task.init()` to avoid auto connect, use `Task.get_task()` instead
- Fix detected task cwd outside of repository root folder
- Fix `Task.connect(dict)` to place non-existing entries on the section name instead of General
- Fix `Task.clone()` support for trains-server < 0.16
- Fix `StorageManager` cache extract zipped artifacts. Use modified time instead of access time for cached files
- Fix diff command output was stripped
- Make sure local packages with multi-files are marked as `package`
- Fix `Task.set_base_docker()` should be skipped when running remotely
- Fix ArgParser binding handling of string argument with boolean default value (affects Pytorch Lightning integration)
- When using `detect_with_pip_freeze` make sure that `package @ file://` lines are replaced with `package==x.y.z` as local file will probably not be available
- Fix git packages to new pip standard `package @ git+`
- Improve conda package naming `_` and `-` support
- Do not add specific setuptools version to requirements (pip can't install it anyway)
- Fix image URL quoting when uploading from a file path
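The `package @ file://` replacement mentioned for `detect_with_pip_freeze` amounts to a small rewrite of the freeze output. A minimal sketch, assuming the installed versions are already known; `pin_local_references` and the `installed_versions` mapping are hypothetical and not part of the Trains API.

```python
import re

# Matches PEP 508 direct references that point at a local file,
# e.g. "mypkg @ file:///tmp/build/mypkg".
_LOCAL_REF = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*@\s*file://\S*\s*$")


def pin_local_references(freeze_lines, installed_versions):
    """Replace "package @ file://..." lines with "package==x.y.z".

    `installed_versions` maps package name -> version string. Lines whose
    package has no known version are kept unchanged.
    """
    pinned = []
    for line in freeze_lines:
        match = _LOCAL_REF.match(line)
        version = installed_versions.get(match.group(1)) if match else None
        pinned.append("%s==%s" % (match.group(1), version) if version else line)
    return pinned


print(pin_local_references(
    ["numpy==1.19.4", "mypkg @ file:///tmp/build/mypkg"],
    {"mypkg": "0.3.1"},
))
```

Lines using other schemes, such as `package @ git+`, are left as they are, since a remote worker can still resolve them.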
Trains Agent
Features
- Change k8s pod naming scheme in k8s glue to include queue name, conform queue name to k8s standard
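Conforming a queue name to Kubernetes naming rules could look roughly like the sketch below. It applies the DNS-1123 label rules (lowercase alphanumerics and `-`, alphanumeric at both ends, at most 63 characters); the `trains-` prefix and the `to_k8s_name` helper are assumptions, not the naming scheme the k8s glue actually uses.

```python
import re


def to_k8s_name(queue_name, prefix="trains-", max_length=63):
    """Build a pod-name fragment from a queue name using DNS-1123 label rules.

    The prefix and the exact scheme are assumptions made for illustration.
    """
    # Lowercase, then collapse every run of disallowed characters into "-".
    cleaned = re.sub(r"[^a-z0-9-]+", "-", queue_name.lower())
    # Truncate first, then trim dashes so both ends stay alphanumeric.
    name = (prefix + cleaned)[:max_length].strip("-")
    return name or "queue"


print(to_k8s_name("GPU Queue_1"))
```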
Bug Fixes
- Update PyJWT requirement (v2.0.0 breaks interface)
- Update other requirements constraints
Version 0.16.2
Trains
Features
- Add `Task.set_resource_monitor_iteration_timeout()` to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).
- Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).
- Add `git diff` for repository submodule (requires git 2.14 or above).
- Add `TrainsJob.is_completed()` and `TrainsJob.is_aborted()`.
- Add `Task.logger` property.
- Add Pipeline Controller automation and example.
- Add improved trace filtering capabilities in `trains.debugging.trace.trace_trains()`.
- Add default help per argument (if not provided) in ArgParser binding.
- Deprecate `Task.reporter`.
- Update PyTorch example.
- Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).
- Support Keras restructuring for Network, Model and Sequential.
- Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.
Bug Fixes
- Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).
- Fix sending empty reports (GitHub trains Issue 205).
- Fix scatter2d sub-sampling and rounding.
- Fix plots reporting `NaN` representation (matplotlib conversion).
  - Limit the number of digits in a plot to reduce plot size (using `sdk.metrics.plot_max_num_digits` configuration value).
- Fix `Task.wait_for_status()` to reload after it ends.
- Fix thread wait Ctrl-C interrupt did not exit process.
- Improve Windows support for installed packages analysis.
- Fix auto model logging using relative path.
- Fix Hyper-parameter Optimization example.
- Fix `Task.clone()` when working with TrainsServer < 0.16.0.
- Fix pandas artifact handling.
  - Avoid adding `unnamed:0` column.
  - Return original pandas object.
- Fix `TrainsJob` hyper-params overriding order was not guaranteed.
- Fix ArgParse auto-connect to support default function type.
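The effect of limiting plot digits (the `sdk.metrics.plot_max_num_digits` item above) can be sketched in a few lines. This is an illustration under stated assumptions: "digits" is read here as decimal places, NaN and infinity are mapped to `None` so they serialize as JSON `null`, and `limit_digits` is a hypothetical helper rather than the Trains code.

```python
import json
import math


def limit_digits(value, max_digits=5):
    """Recursively round floats in plot data and map NaN/inf to None."""
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # JSON has no NaN literal; None serializes as null
        return round(value, max_digits)
    if isinstance(value, dict):
        return {k: limit_digits(v, max_digits) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [limit_digits(v, max_digits) for v in value]
    return value


trace = {"x": [0, 1, 2], "y": [0.123456789, float("nan"), 2.000000001]}
print(json.dumps(limit_digits(trace, max_digits=3)))
```

Fewer digits per value means a shorter serialized plot, which is where the storage saving comes from.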
Trains Agent
Features
- conda
  - Add `agent.package_manager.conda_env_as_base_docker` allowing "docker_cmd" to contain link to a full pre-packaged conda environment (`tar.gz` created by `conda-pack`). Use `TRAINS_CONDA_ENV_PACKAGE` environment variable to specify `conda tar.gz` file.
  - Add conda support for read-only pre-built environment (pass conda folder as `docker_cmd` on Task)
  - Improve trying to find conda executable
- k8s glue
  - Add support for limited number of services exposing ports
  - Add support for k8s pod custom user properties
  - Allow selecting external `trains.conf` file for the pod itself
  - Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress/ELB)
- Allow specifying `cudatoolkit` version in the "installed packages" section when using conda as package manager (GitHub trains Issue 229)
- Add `agent.package_manager.force_repo_requirements_txt`. If True, "Installed Packages" on Task are ignored, and only repository `requirements.txt` is used
- Pass `TRAINS_DOCKER_IMAGE` into docker for interactive sessions
- Add `torchcsprng` and `torchtext` to PyTorch resolving
Bug Fixes
- When logging suppress "\r" when reading a current chunk of a file/stream. Add `agent.suppress_carriage_return` (default True) to support previous behavior
- Make sure `TRAINS_AGENT_K8S_HOST_MOUNT` is used only once per mount
- Fix k8s glue script to trains-agent default docker script
- Fix apply git diff from submodule only
- conda
  - Fix conda pip freeze to be consistent with trains 0.16.3
  - Fix conda environment support for trains 0.16.3 full env. Add `agent.package_manager.conda_full_env_update` to allow conda to update back the requirements (default False, to preserve previous behavior)
  - Fix running from conda environment - `conda.sh` not found in first conda PATH match
- Fix docker mode ubuntu/debian support by making sure not to ask for input (fix `tzdata` install)
- Fix repository detection - ignore environment `SSH_AUTH_SOCK`, only check if git user/pass are configured
- git diff
  - Fix support for non-ascii diff
  - Fix diff with empty line at the end will cause corrupt diff apply message
  - Allow zero context diffs (useful when blind patching repository)
- Fix `daemon --stop` when agent UID cannot be located
- Fix nvidia docker support on some linux distros (SUSE)
- Fix nvidia pytorch dockers support
- Fix torch CUDA 11.1 support
- Fix requirements dict with null entry in `pip` should be considered None install from repository's `requirements.txt`
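Carriage-return suppression in a log chunk can be pictured as keeping only what a terminal would still show on each line. The sketch below is one plausible reading of that item; `suppress_carriage_return` here is a hypothetical function, and the choice to keep the text after the last carriage return is an assumption about the agent's behavior.

```python
def suppress_carriage_return(chunk):
    """Keep only what a terminal would show after each carriage return.

    For every line in the chunk, text before the last carriage return is
    dropped, so a progress bar redrawn many times is logged once.
    """
    # Treat Windows line endings as plain newlines before splitting.
    lines = chunk.replace("\r\n", "\n").split("\n")
    # Ignore a trailing carriage return, then keep the last redrawn segment.
    return "\n".join(line.rstrip("\r").rsplit("\r", 1)[-1] for line in lines)


print(suppress_carriage_return("10%\r50%\r100%\ndone\n"))
```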
Version 0.16.1
Trains
Features
- Enhance HyperParameter optimizer.
Bug Fixes
- Fix typing dependency for Python<3.5 (GitHub trains Issue 184).
- Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).
- Fix `Task.get_reported_console_output()` for new Trains Server API v2.9.
- Fix cache handling for different partitions/drives/devices.
- Disable offline mode when running remotely (i.e. executed by Trains Agent).
- Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).
- Fix double-escaped model design text when connecting OutputModel.
Trains Server
IMPORTANT: Upgrading to this version requires a manual data migration.
Bug Fixes
- Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).
- Raise the experiment comparison limit from 10 to 100, configurable using `services.tasks.multi_task_histogram_limit` (Trains Slack channel thread).
- Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).
- Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).
- Update Fixed User full-name on restart (Trains Slack channel thread).
- Fix project ordering issue.
- When loading plots, display a spinner and don't show "no data".
- Improve logging to provide more coherent ElasticSearch connection status in server log.
Trains Agent
Features
- Add `sdk.metrics.plot_max_num_digits` configuration option to reduce plot storage size.
- Add `agent.package_manager.post_packages` and `agent.package_manager.post_optional_packages` configuration options to control packages install order (e.g. horovod).
- Add `agent.git_host` configuration option for limiting git credential usage for a specific host (overridable using `TRAINS_AGENT_GIT_HOST` environment variable).
- Add `agent.force_git_ssh_port` configuration option to control https to ssh link conversion for non standard ssh ports.
- Add requirements detection features
  - Improve support for detecting new pip version (20+) supporting `package @ scheme://link`.
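The https to ssh link conversion that `agent.force_git_ssh_port` controls can be sketched with `urllib.parse`. This is an illustration only: `https_to_ssh`, its `user="git"` default, and the `ssh://user@host:port/path` output form are assumptions, not the agent's actual conversion code.

```python
from urllib.parse import urlparse


def https_to_ssh(url, ssh_port=None, user="git"):
    """Rewrite an https git URL as an ssh:// URL, optionally forcing a port."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return url  # leave anything that is not an http(s) link untouched
    host = parsed.hostname if ssh_port is None else "%s:%d" % (parsed.hostname, ssh_port)
    return "ssh://%s@%s%s" % (user, host, parsed.path)


print(https_to_ssh("https://github.com/allegroai/trains.git", ssh_port=2222))
```

Without a forced port the host is left bare, so the SSH client falls back to its default port.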
Bug Fixes
- Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a `git+http` link is enough to make sure all requirements are met/installed (GitHub Issue #196).
- Fix incorrect check for spaces in current execution folder.
- Fix requirements detection
  - Update torch version after using downloaded / system pre-installed version.
  - Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).
Version 0.16.0
Trains
Features
- Add continuing of previously executed experiments. Add `Task.init()` argument `continue_last_task` to continue a previously used Task (GitHub Issue #160).
- Allow Task editing/creation from code. `Task.export_task/import_task/update_task()` (GitHub Issue #128).
- Add offline mode. Use `Task.set_offline()` and `Task.import_offline_session()`
  - Support setting offline mode via `TRAINS_OFFLINE_MODE=1` environment variable.
  - Support setting offline API version via `TRAINS_OFFLINE_MODE=2.9` environment variable.
- Automatically pickle all objects when uploading as artifacts, `task.upload_artifact()` argument `auto_pickle=True` (GitHub Issue #153).
- Add multiple sections/groups support for Task hyper-parameters using `Task.connect()`.
- Add multiple configurations (files) using `Task.connect_configuration`.
- Allow enabling OS environment logging using the `sdk.development.log_os_environments` configuration parameter (complements the `TRAINS_LOG_ENVIRONMENT` environment variable).
- Add Optuna support for hyper-parameter optimization controller. `OptimizerOptuna` is now the default optimizer.
- Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).
- Add automatic FastAI logging. It is disabled if tensorboard is loaded (assuming TensorBoardLogger will be used).
- Support Tensorboard text logging (`add_text()`) as debug samples (`.txt` files), instead of as console output.
- Allow for more standard confusion matrix reporting. `Logger.report_confusion_matrix()` argument `yaxis_reversed` (flips the confusion matrix if `True`, default `False`) (GitHub Issue #165).
- Add support for Trains Server 0.16.0 (API v2.9 support).
- Allow disabling Trains update message from the log using the `TRAINS_SUPPRESS_UPDATE_MESSAGE` environment variable (GitHub Issue #157).
- Add AWS EC2 Auto-Scaler service wizard and Service.
- Improved and updated examples
  - Add Keras Tuner CIFAR10 example.
  - Add FastAI example.
  - Update PyTorch Jupyter notebook examples (GitHub Issue #150).
- Support global requirements detection using `pip freeze` (set `sdk.development.detect_with_pip_freeze` configuration in `trains.conf`).
- Add `Task.get_projects()` to get all projects in the system, sorted by last update time.
Bug Fixes
- Fix UTC to time stamp in comment (GitHub Issue #152).
- Fix and enhance GPU monitoring
  - Fix GPU stats on Windows machines (GitHub Issue #177).
  - More robust GPU monitoring (GitHub Issue #170).
- Fix filename too long bug (GitHub trains-server Issue #49).
- Fix TensorFlow image logging to allow images with no width/height/color metadata (GitHub Issue #182).
- Fix multiprocessing Pool throw exception in pool hangs execution. Call original signal handler and re-flush `stdout`.
- Fix `plotly` support for `matplotlib` 3.3.
- Add Python 2.7 support for `get_current_thread_id()`.
- Update examples requirements.
- Fix and improve signal handling.
- Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.
- Fix auto logging multiple argparse calls before `Task.init()`.
- Limit experiment Git diff logging to 500Kb. If larger than 500Kb, diff section will contain a warning and entire diff will be uploaded as an artifact named `auxiliary_git_dif`.
- Fix requirements detection
  - Fix Trains installed from `git+`.
  - Fix when Trains is not directly imported.
  - Fix multiple `-e` packages were not detected (only the first one).
  - Fix running with Trains in `PYTHONPATH` resulted in double entry of trains.
- Fix `Task.set_base_docker()` on main task to do nothing when running remotely.
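The Git diff size limit described above boils down to a size check with a fallback. The following is a minimal sketch, assuming "500Kb" means 500 * 1024 bytes of UTF-8 text; `split_large_diff` and the wording of the warning are hypothetical, and only the artifact name is taken from the release note.

```python
MAX_DIFF_BYTES = 500 * 1024  # "500Kb" from the release note, read as 500 * 1024 bytes


def split_large_diff(diff_text, limit=MAX_DIFF_BYTES):
    """Return (text for the diff section, artifact payload or None)."""
    if len(diff_text.encode("utf-8")) <= limit:
        return diff_text, None  # small enough: log the diff itself
    # Hypothetical warning text; the full diff travels as an artifact instead.
    warning = (
        "# Git diff exceeds %d bytes; the full diff is stored as the "
        "'auxiliary_git_dif' artifact" % limit
    )
    return warning, diff_text


section, artifact = split_large_diff("diff --git a/train.py b/train.py\n")
print(section if artifact is None else "diff moved to artifact")
```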
Trains Server
IMPORTANT: Upgrading to this version requires a manual data migration.
Features
- Add experiment hyperparameter grouping.
  - HYPER PARAMETERS tab renamed to CONFIGURATIONS.
  - CONFIGURATIONS tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS
  - Add user properties group. Key-value pairs always editable (USER PROPERTIES section).
  - Add command line options group: argparse and older experiments parameters (CONFIGURATIONS/HYPER PARAMETERS/Args).
  - Add TensorFlow definitions group (CONFIGURATIONS/HYPER PARAMETERS/TF_DEFINE).
  - Add environment variables group (CONFIGURATIONS/HYPER PARAMETERS/Environment).
- Improve experiment model configuration.
  - Model design is in the ARTIFACTS tab.
  - Experiment model description is in the CONFIGURATION OBJECTS section in the CONFIGURATIONS tab.
- Improve experiment comparison:
  - In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).
  - Remove fields providing no additional information from comparison.
- Improve the model framework filter. Filter contains only frameworks used by models in the project.
- Add configurable Trains services examples.
- Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.
- Add legend on/off toggle control for every plot.
- Add clear button for text areas (GitHub trains-server Issue #42).
- Reinstate the bottom bar Archive button.
- Add Trains community links to left bar.
- Add Hi-DPI display support.
- Add `debug.ping` endpoint for simple health monitoring.
- Add support for field exclusion in `*.get_all` endpoints.
- Move to ElasticSearch 7. Requires manual data migration
Bug Fixes
- Auto-fit column width on column resize double click.
- Allow top-bar search if fewer than three characters are entered, and `Enter` is pressed.
Trains Agent
Features
- Add `agent.docker_init_bash_script` configuration section to allow finer control over docker startup script.
- Changed default docker image from `nvidia/cuda` to `nvidia/cuda:10.1-runtime-ubuntu18.04` to support `cudnn` frameworks (e.g. TF).
- Improve support for dockers with preinstalled `conda` environment.
- Improve trains-agent-docker spinning.
- Add `daemon --order-fairness` for round-robin queue pulling.
- Add `daemon --stop` to terminate a running agent (assuming other arguments are the same)
  - If no additional arguments, Agents are terminated in lexicographical order.
- Support cleanup of all log files on termination unless executed with `--debug`.
- Add error message when Trains API Server is not accessible on startup.
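The difference between priority-order pulling and the round-robin pulling enabled by `daemon --order-fairness` can be shown with a small simulation. This sketch is not the agent's scheduler; `pull_order` and the in-memory queue mapping are hypothetical stand-ins for queues served by the Trains Server.

```python
from collections import deque


def pull_order(queues, order_fairness=False):
    """Yield (queue name, task) pairs in the order a worker would pull them.

    `queues` maps queue name -> list of pending tasks, in priority order.
    """
    pending = {name: deque(tasks) for name, tasks in queues.items()}
    names = deque(pending)
    while any(pending.values()):
        for name in list(names):
            if pending[name]:
                yield name, pending[name].popleft()
                if order_fairness:
                    # Start the next scan after the queue that was just served.
                    names.rotate(-(list(names).index(name) + 1))
                break


queues = {"high": ["h1", "h2"], "low": ["l1", "l2"]}
print([task for _, task in pull_order(queues)])
print([task for _, task in pull_order(queues, order_fairness=True)])
```

Without fairness the first queue is drained before the second is touched; with fairness the worker alternates between queues that still have work.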
Bug Fixes
- Fix GPU Windows monitoring support (GitHub Issue #177).
- Fix `.git-credentials` and `.gitconfig` mapping into docker.
- Fix non-root docker image usage.
- Fix docker to use `UTF-8` encoding, so prints won't break it.
- Fix `--debug` to set all loggers to `DEBUG`.
- Fix task status change to `queued` should never happen during Task runtime.
- Fix `requirement_parser` to support `package @ git+http` lines.
- Fix GIT user/password in requirements and support for `-e git+http` lines.
- Fix configuration wizard to generate `trains.conf` matching latest Trains definitions.