Version 0.16¶
Important
Trains is now ClearML.
Version 0.16.4¶
Trains¶
Features
Add Hydra support (GitHub trains Issue 219)
Add cifar ignite example (GitHub trains Issue 237)
Add auto extraction of
tar.gz
files when usingStorageManager
(GitHub trains Issue 237)Add
Task.init()
argumentauto_connect_streams
controlling stdout/stderr/logging capture (GitHub trains Issue 181)Add carriage return flush support using the
sdk.development.worker.console_cr_flush_period
configuration setting (GitHub trains Issue 181)Add
Task.create_function_task()
to allow creating a new task using a function and arguments to be executed remotely (GitHub trains Issue 230)Allow disabling SSL certificates verification using
Task.setup_upload()
argumentverify
or AWS S3 bucket configurationverify
property (GitHub trains Issue 256)Add
StorageManager.get_files_server()
Add
Task.get_project_id()
using project nameAdd
project_name
argument toTask.set_project()
Add
Task.connect()
support for class / instance objectsAdd
Task get_configuration_object()
andTask.set_configuration_object()
for easier automationImprove Auto-Scaler - allow extra configurations, key name and security group are now optional, defaults using empty strings
Use a built-in matplotlib convertor
Add reporting text as debug sample example
Bug Fixes
Fix Optuna HPO parameter serializing (GitHub trains Issue 254)
Fix connect dictionary
''
cast toNone
(GitHub trains Issue 258)Fix lightgbm binding keyword argument issue (GitHub trains Issue 251)
Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239)
Fix infinite recursion in
StorageManager
upload (GitHub trains Issue 253)Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252)
Fix running remotely with no configuration should not crash but output a warning (GitHub trains Issue 243)
matplotlib
Fix matplotlib 3.3.3 support
Fix global figure enumeration
Fix binding without a title reported a single plot (
untitled 00
) instead of increasing the counter
Fix Python 2.7/3.5 support
Fix quote issue when reporting debug images
Fix replace quote safe characters in upload file to include
;=@$
Fix
at_exit
called from another process should be ignoredFix
Task.set_tags()
for completed / published tasksFix
Task.add_tags()
not working when running remotelyFix
Task.set_user_properties()
docstring and interfaceFix preview with JSON (dict) artifacts did not store the artifact
Fix
Logger.report_text()
on task created usingTask.create()
was not supportedFix initialization for torch: only call torch
get_worker_info
if torch was loadedFix flush (wait) on auxiliary task (obtained using
Task.get_task()
) should wait on all upload eventsFix server was not updated with the defaults from the code when running remotely and configuration section is missing
Fix connect dict containing
None
default values, blocked the remote execution from passing string instead of NoneFix
Task.upload_artifact()
argumentdelete_after_upload=True
used in conjunction withwait_for_upload=True
was not supported
Version 0.16.3¶
Trains¶
Features
Add LightGBM support
Add initial Hydra support (GitHub trains Issue 219)
Add synchronous support for
Task.upload_artifact()
(GitHub trains Issue 231)Add
sdk.development.store_code_diff_from_remote
(defaultfalse
) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222)Add
sdk.development.detect_with_conda_freeze
(defaulttrue
) for full conda freeze (requires trains-agent >= 16.2)Add user properties support in Task object
Add
Logger.report_table()
support for table as list of listsAdd support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now round circles below the DAG graph.
Add Pipeline/Optimization can be attached to any Task (not just the current task)
Add
force_download
flag toStorageManager.get_local_copy()
Add control over the artifact preview using
Task.upload_artifact()
preview
argumentAdd
Logger.report_matplotlib_figure()
with examplesAdd
Task.set_task_type()
AWS auto-scaler
Add key pair and security groups support
Add multi-line support for both extra bash script and extra
trains.conf
data
Update examples
Bug Fixes
Fix
Task.update_output_model()
wrong argument order (GitHub trains Issue 220)Fix initializing task on argparse parse in remote mode. Do not call
Task.init()
to avoid auto connect, useTask.get_task()
insteadFix detected task cwd outside of repository root folder
Fix
Task.connect(dict)
to place non-existing entries on the section name instead of GeneralFix
Task.clone()
support for trains-server < 0.16Fix
StorageManager
cache extract zipped artifacts. Use modified time instead of access time for cached filesFix diff command output was stripped
Make sure local packages with multi-files are marked as
package
Fix
Task.set_base_docker()
should be skipped when running remotelyFix ArgParser binding handling of string argument with boolean default value (affects Pytorch Lightning integration)
When using
detect_with_pip_freeze
make sure thatpackage @ file://
lines are replaced withpackage==x.y.z
as local file will probably not be availableFix git packages to new pip standard
package @ git+
Improve conda package naming
_
and-
supportDo not add specific setuptools version to requirements (pip can’t install it anyway)
Fix image URL quoting when uploading from a file path
Version 0.16.2¶
Trains¶
Features
Add
Task.set_resource_monitor_iteration_timeout()
to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).
Add
git diff
for repository submodule (requires git 2.14 or above).Add
TrainsJob.is_completed()
andTrainsJob.is_aborted()
.Add
Task.logger
property.Add Pipeline Controller automation and example (see here).
Add improved trace filtering capabilities in
trains.debugging.trace.trace_trains()
.Add default help per argument (if not provided) in ArgParser binding.
Deprecate
Task.reporter
.Update PyTorch example.
Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).
Support Keras restructuring for Network, Model and Sequential.
Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.
Bug Fixes
Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).
Fix sending empty reports (GitHub trains Issue 205).
Fix scatter2d sub-sampling and rounding.
Fix plots reporting
NaN
representation (matplotlib conversion).Limit the number of digits in a plot to reduce plot size (using
sdk.metrics.plot_max_num_digits
configuration value).
Fix
Task.wait_for_status()
to reload after it ends.Fix thread wait Ctrl-C interrupt did not exit process.
Improve Windows support for installed packages analysis.
Fix auto model logging using relative path.
Fix Hyper-parameter Optimization example.
Fix
Task.clone()
when working with TrainsServer < 0.16.0.Fix pandas artifact handling.
Avoid adding
unnamed:0
column.Return original pandas object.
Fix
TrainsJob
hyper-params overriding order was not guaranteed.Fix ArgParse auto-connect to support default function type.
Trains-Agent¶
Features
conda
Add
agent.package_manager.conda_env_as_base_docker
allowing “docker_cmd” to contain link to a full pre-packaged conda environment (tar.gz
created byconda-pack
). UseTRAINS_CONDA_ENV_PACKAGE
environment variable to specifyconda tar.gz
file.Add conda support for read-only pre-built environment (pass conda folder as
docker_cmd
on Task)Improve trying to find conda executable
k8s glue
Add support for limited number of services exposing ports
Add support for k8s pod custom user properties
Allow selecting external
trains.conf
file for the pod itselfAllow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress/ELB)
Allow specifying
cudatoolkit
version in the “installed packages” section when using conda as package manager (GitHub trains Issue 229)Add
agent.package_manager.force_repo_requirements_txt
. If True, “Installed Packages” on Task are ignored, and only repositoryrequirements.txt
is usedPass
TRAINS_DOCKER_IMAGE
into docker for interactive sessionsAdd
torchcsprng
andtorchtext
to PyTorch resolving
Bug Fixes
When logging suppress “\r” when reading a current chunk of a file/stream. Add
agent.suppress_carriage_return
(default True) to support previous behaviorMake sure
TRAINS_AGENT_K8S_HOST_MOUNT
is used only once per mountFix k8s glue script to trains-agent default docker script
Fix apply git diff from submodule only
conda
Fix conda pip freeze to be consistent with trains 0.16.3
Fix conda environment support for trains 0.16.3 full env. Add
agent.package_manager.conda_full_env_update
to allow conda to update back the requirements (default False, to preserve previous behavior)Fix running from conda environment -
conda.sh
not found in first conda PATH match
Fix docker mode ubuntu/debian support by making sure not to ask for input (fix
tzdata
install)Fix repository detection - ignore environment
SSH_AUTH_SOCK
, only check if git user/pass are configuredgit diff
Fix support for non-ascii diff
Fix diff with empty line at the end will cause corrupt diff apply message
Allow zero context diffs (useful when blind patching repository)
Fix
daemon --stop
when agent UID cannot be locatedFix nvidia docker support on some linux distros (SUSE)
Fix nvidia pytorch dockers support
Fix torch CUDA 11.1 support
Fix requirements dict with null entry in
pip
should be considered None install from repository’srequirements.txt
Version 0.16.1¶
Trains¶
Features
Enhance HyperParameter optimizer.
Bug Fixes
Fix typing dependency for Python<3.5 (GitHub trains Issue 184).
Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).
Fix
Task.get_reported_console_output()
for new Trains Server API v2.9.Fix cache handling for different partitions/drives/devices.
Disable offline mode when running remotely (i.e. executed by Trains Agent).
Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).
Fix double-escaped model design text when connecting OutputModel.
Trains Server¶
IMPORTANT: Upgrading to this version requires a manual data migration.
Bug Fixes
Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).
Removed experiments comparison limit (only 10 were allowed). Limit is now 100, configurable using
services.tasks.multi_task_histogram_limit
. (Trains Slack channel thread).Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).
Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).
Update Fixed User full-name on restart (Trains Slack channel thread).
Fix project ordering issue.
When loading plots, display a spinner and don’t show “no data”.
Improve logging to provide more coherent ElasticSearch connection status in server log.
Trains Agent¶
Features
Add
sdk.metrics.plot_max_num_digits
configuration option to reduce plot storage size.Add
agent.package_manager.post_packages
andagent.package_manager.post_optional_packages
configuration options to control packages install order (e.g. horovod).Add
agent.git_host
configuration option for limiting git credential usage for a specific host (overridable usingTRAINS_AGENT_GIT_HOST
environment variable).Add
agent.force_git_ssh_port
configuration option to control https to ssh link conversion for non standard ssh ports.Add requirements detection features
Improve support for detecting new pip version (20+) supporting
package @ scheme://link
.
Bug Fixes
Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a
git+http
link is enough to make sure all requirements are met/installed (GitHub Issue #196).Fix incorrect check for spaces in current execution folder.
Fix requirements detection
Update torch version after using downloaded / system pre-installed version.
Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).
Version 0.16.0¶
Trains¶
Features
Add continuing of previously executed experiments. Add
Task.init()
argumentcontinue_last_task
to continue a previously used Task (GitHub Issue #160).Allow Task editing/creation from code.
Task.export_task/import_task/update_task()
(GitHub Issue #128).Add offline mode. Use
Task.set_offline()
andTask.import_offline_session()
Support setting offline mode via
TRAINS_OFFLINE_MODE=1
environment variable.Support setting offline API version via
TRAINS_OFFLINE_MODE=2.9
environment variable.
Automatically pickle all objects when uploading as artifacts,
task.upload_artifact()
argumentauto_pickle=True
(GitHub Issue #153).Add multiple sections/groups support for Task hyper-parameters using
Task.connect()
.Add multiple configurations (files) using
Task.connect_configuration
.Allow enabling OS environment logging using the
sdk.development.log_os_environments
configuration parameter (complements theTRAINS_LOG_ENVIRONMENT
environment variable).Add Optuna support for hyper-parameter optimization controller.
OptimizerOptuna
is now the default optimizer.Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).
Add automatic FastAI logging. It is disabled if tensorboard is loaded (assuming TensorBoradLogger will be used).
Support Tensorboard text logging (
add_text()
) as debug samples (.txt
files), instead of as console output.Allow for more standard confusion matrix reporting.
Logger.report_confusion_matrix()
argumentyaxis_reversed
(flips the confusion matrix ifTrue
, defaultFalse
) (GitHub Issue #165).Add support for Trains Server 0.16.0 (API v2.9 support).
Allow disabling Trains update message from the log using the
TRAINS_SUPPRESS_UPDATE_MESSAGE
environment variable (GitHub Issue #157).Add AWS EC2 Auto-Scaler service wizard and Service.
Improved and updated examples
Add Keras Tuner CIFAR10 example.
Add FastAI example.
Update PyTorch Jupyter notebook examples (GitHub Issue #150).
Support global requirements detection using
pip freeze
(setsdk.development.detect_with_pip_freeze
configuration intrains.conf
).Add
Task.get_projects()
to get all projects in the system, sorted by last update time.
Bug Fixes
Fix UTC to time stamp in comment (GitHub Issue #152).
Fix and enhance GPU monitoring
Fix GPU stats on Windows machines (GitHub Issue #177).
More robust GPU monitoring (GitHub Issue #170).
Fix filename too long bug (GitHub trains-server Issue #49).
Fix TensorFlow image logging to allow images with no width/height/color metadata (GitHub Issue #182).
Fix multiprocessing Pool throw exception in pool hangs execution. Call original signal handler and re-flush
stdout
.Fix
plotly
support formatplotlib
3.3.Add Python 2.7 support for
get_current_thread_id()
.Update examples requirements.
Fix and improve signal handling.
Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.
Fix auto logging multiple argparse calls before
Task.init()
.Limit experiment Git diff logging to 500Kb. If larger than 500Kb, diff section will contain a warning and entire diff will be uploaded as an artifact named
auxiliary_git_dif
.Fix requirements detection
Fix Trains installed from
git+
.Fix when Trains is not directly imported.
Fix multiple
-e
packages were not detected (only the first one).Fix running with Trains in
PYTHONPATH
resulted in double entry of trains.
Fix
Task.set_base_docker()
on main task to do nothing when running remotely.
Trains Server¶
IMPORTANT: Upgrading to this version requires a manual data migration.
Features
Add experiment hyperparameter grouping.
HYPER PARAMETERS tab renamed to CONFIGURATIONS.
CONFIGURATIONS tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS
Add user properties group. Key-value pairs always editable (USER PROPERTIES section).
Add command line options group * argparse and older experiments parameters (CONFIGURATIONS/HYPER PARAMETERS/Args).
Add TensorFlow definitions group (CONFIGURATIONS/HYPER PARAMETERS/TF_DEFINE).
Add environment variables group (CONFIGURATIONS/HYPER PARAMETERS/Environment).
Improve experiment model configuration.
Model design is in the ARTIFACTS tab.
Experiment model description is in the CONFIGURATIONS OBJECTS section in the CONFIGURATIONS tab.
Improve experiment comparison:
In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).
Remove fields providing no additional information from comparison.
Improve the model framework filter. Filter contains only frameworks used by models in the project.
Add configurable Trains services examples.
Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.
Add legend on/off toggle control for every plot.
Add clear button for text areas (GitHub trains-server Issue #42).
Reinstate the bottom bar Archive button.
Add Trains community links to left bar.
Add Hi-DPI display support.
Add
debug.ping
endpoint for simple health monitoring.Add support for field exclusion in
*.get_all
endpoints.Move to ElasticSearch 7. Requires manual data migration
Bug Fixes
Auto-fit column width on column resize double click.
Allow top-bar search if fewer than three characters are entered, and
Enter
is pressed.
Trains Agent¶
Features
Add
agent.docker_init_bash_script
configuration section to allow finer control over docker startup script.Changed default docker image from
nvidia/cuda
tonvidia/cuda:10.1-runtime-ubuntu18.04
to supportcudnn
frameworks (e.g. TF).Improve support for dockers with preinstalled
conda
environment.Improve trains-agent-docker spinning.
Add
daemon --order-fairness
for round-robin queue pulling.Add
daemon --stop
to terminate a running agent (assuming other arguments are the same)If no additional arguments, Agents are terminated in lexicographical order.
Support cleanup of all log files on termination unless executed with
--debug
.Add error message when Trains API Server is not accessible on startup.
Bug Fixes
Fix GPU Windows monitoring support (GitHub Issue #177).
Fix
.git-credentials
and.gitconfig
mapping into docker.Fix non-root docker image usage.
Fix docker to use
UTF-8
encoding, so prints won’t break it.Fix
--debug
to set all loggers toDEBUG
.Fix task status change to
queued
should never happen during Task runtime.Fix
requirement_parser
to supportpackage @ git+http
lines.Fix GIT user/password in requirements and support for
-e git+http
lines.Fix configuration wizard to generate
trains.conf
matching latest Trains definitions.