Storage Module

class trains.storage.manager.StorageManager

StorageManager is a helper interface for downloading and uploading files to supported remote storage. Supported remote servers: http(s)/S3/GS/Azure/File-System-Folder. Caching is enabled by default for all downloaded remote URLs/files.

classmethod get_local_copy(remote_url, cache_context=None, extract_archive=True, name=None)

Get a local copy of the remote file. If the remote URL allows direct file access, the same link is returned; otherwise, a path to a local copy of the remote file is returned. Caching is enabled by default, and each cache context is limited by the number of files it stores. When the cache is full, the least recently accessed files are deleted.

Parameters
  • remote_url (str) – Remote URL of the file.

  • cache_context (str) – Optional cache context identifier. Defaults to the ‘global’ context.

  • extract_archive (bool) – If True, the returned path is a cached folder containing the archive’s content. Currently only zip files are supported.

  • name – Name of the artifact.

Returns

Full path to the local copy of the requested URL. Returns None on error.
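The effect of extract_archive can be pictured with a minimal sketch, assuming the archive has already been downloaded into the cache. The helper name local_copy_with_extraction and the "_extracted" folder naming are hypothetical and are not part of trains; only the standard library is used.

```python
import zipfile
from pathlib import Path


def local_copy_with_extraction(cached_file, extract_archive=True):
    """Return the path a caller would receive for an already-cached download.

    Hypothetical helper, not the trains implementation: a zip archive is
    unpacked into a sibling folder and that folder is returned instead.
    """
    cached_file = Path(cached_file)
    if not extract_archive or not zipfile.is_zipfile(cached_file):
        # Non-archives (or extract_archive=False) are returned unchanged.
        return str(cached_file)

    # Assumed naming scheme: "<archive stem>_extracted" next to the archive.
    target = cached_file.with_name(cached_file.stem + "_extracted")
    if not target.exists():
        with zipfile.ZipFile(cached_file) as archive:
            archive.extractall(target)
    return str(target)
```

With extract_archive=False the sketch returns the archive path itself, which mirrors the documented difference between receiving a file and receiving a folder.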

classmethod upload_file(local_file, remote_url, wait_for_upload=True)

Upload a local file to a remote location. The remote URL is the final destination of the uploaded file.

Examples:

upload_file('/tmp/artifact.yaml', 'http://localhost:8081/manual_artifacts/my_artifact.yaml')
upload_file('/tmp/artifact.yaml', 's3://a_bucket/artifacts/my_artifact.yaml')
upload_file('/tmp/artifact.yaml', '/mnt/share/folder/artifacts/my_artifact.yaml')

Parameters
  • local_file (str) – Full path of a local file to be uploaded

  • remote_url (str) – Full path or remote url to upload to (including file name)

  • wait_for_upload (bool) – If False, return immediately and upload in the background. Default True.

Returns

Newly uploaded remote URL.
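A minimal sketch of what wait_for_upload could mean for a file-system destination, using only the standard library. The function upload_to_folder and the returned thread handle are assumptions made for illustration; they are not the trains implementation, which returns only the remote URL.

```python
import shutil
import threading
from pathlib import Path


def upload_to_folder(local_file, remote_path, wait_for_upload=True):
    """Copy local_file to remote_path, optionally in the background.

    Hypothetical helper: returns the destination and the worker thread so the
    caller can join it later when wait_for_upload is False.
    """
    remote_path = Path(remote_path)
    # The destination includes the file name, so only its parent is created.
    remote_path.parent.mkdir(parents=True, exist_ok=True)

    worker = threading.Thread(target=shutil.copyfile, args=(local_file, remote_path))
    worker.start()
    if wait_for_upload:
        # Block until the copy has finished, as with the default setting.
        worker.join()
    return str(remote_path), worker
```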

classmethod set_cache_file_limit(cache_file_limit, cache_context=None)

Set the file limit of a cache context. The file limit is the maximum number of files that the specific cache context holds. Note that there is no limit on the size of these files, only on the total number of cached files.

Parameters
  • cache_file_limit (int) – New maximum number of cached files

  • cache_context (str) – Optional cache context identifier, default global context

Returns

The new cache context file limit.
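The count-based limit, together with the deletion of the oldest accessed files described for get_local_copy, can be sketched as follows. This is an illustration under assumptions: enforce_file_limit is a hypothetical name, a cache context is modelled as a plain folder, and the real trains cache may track access differently.

```python
from pathlib import Path


def enforce_file_limit(cache_dir, cache_file_limit):
    """Delete the least recently accessed files until the limit is met.

    Hypothetical helper: only the number of files matters, never their size.
    Returns the names of the files that were removed.
    """
    files = [p for p in Path(cache_dir).iterdir() if p.is_file()]
    # Oldest access time first, so those files are removed first.
    files.sort(key=lambda p: p.stat().st_atime)
    excess = max(0, len(files) - cache_file_limit)
    removed = files[:excess]
    for path in removed:
        path.unlink()
    return [p.name for p in removed]
```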

class trains.storage.helper.StorageHelper

Storage helper. Used by the entire system to download and upload files. Supports both local and remote files (currently local files, network-mapped files, HTTP/S and Amazon S3).

classmethod get(url, logger=None, **kwargs)

Get a storage helper instance for the given URL

Returns

A StorageHelper instance.

classmethod get_local_copy(remote_url)

Download a file from a remote URL to local storage, and return the path to the local copy.

Parameters

remote_url – Remote URL. Examples: https://example.com/file.jpg, s3://bucket/folder/file.mp4, etc.

Returns

Path to the local copy of the downloaded file. None if an error occurred.

classmethod add_path_substitution(registered_prefix, local_prefix, replace_windows_sep=False, replace_linux_sep=False)

Add a path substitution rule for storage paths.

Useful when the data was registered under some path and that path was later renamed. This may happen with local storage paths, where each machine has a different mount or network drive configuration.

Parameters
  • registered_prefix – The prefix to search for and replace. This is the prefix of the path the data is registered under. This should be the exact url prefix, case sensitive, as the data is registered.

  • local_prefix – The prefix to replace ‘registered_prefix’ with. This is the prefix of the path the data is actually saved under. This should be the exact url prefix, case sensitive, as the data is saved under.

  • replace_windows_sep – If set to True, and the prefix matches, the rest of the url has all of the Windows path separators (backslash ‘\’) replaced with the native os path separator.

  • replace_linux_sep – If set to True, and the prefix matches, the rest of the url has all of the linux/unix path separators (slash ‘/’) replaced with the native os path separator.
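The parameters above can be read as a single rewrite rule. The following sketch applies one such rule to one URL; apply_substitution is a hypothetical function written for illustration and is not how trains stores or applies its rules internally.

```python
import os


def apply_substitution(url, registered_prefix, local_prefix,
                       replace_windows_sep=False, replace_linux_sep=False):
    """Rewrite url if it starts with registered_prefix (case sensitive).

    Hypothetical helper: separators are replaced only in the part of the URL
    that follows the matched prefix.
    """
    if not url.startswith(registered_prefix):
        # No match: the URL is returned untouched.
        return url
    rest = url[len(registered_prefix):]
    if replace_windows_sep:
        rest = rest.replace("\\", os.sep)
    if replace_linux_sep:
        rest = rest.replace("/", os.sep)
    return local_prefix + rest
```

For example, a file registered as Z:\data\set\img.png could be resolved under a hypothetical /mnt/data mount by passing registered_prefix='Z:\\data', local_prefix='/mnt/data' and replace_windows_sep=True.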

classmethod clear_path_substitutions()

Removes all path substitution rules, including ones from the configuration file.

verify_upload(folder_uri='', raise_on_error=True, log_on_error=True)

Verify that this helper can upload files to a folder.

An upload is possible iff:
  1. the destination folder is under the base uri of the url used to create the helper

  2. the helper has credentials to write to the destination folder

Parameters
  • folder_uri – The destination folder to test. Must be an absolute url that begins with the base uri of the url used to create the helper.

  • raise_on_error – Raise an exception if an upload is not possible

  • log_on_error – Log an error if an upload is not possible

Returns

True, if, and only if, an upload to folder_uri is possible.
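The two conditions can be sketched for the local file-system case. This is only an approximation under stated assumptions: can_upload_to is a hypothetical function, URIs are treated as local paths, and os.access stands in for the credential check that a remote backend would perform.

```python
import logging
import os

log = logging.getLogger(__name__)


def can_upload_to(base_uri, folder_uri, raise_on_error=True, log_on_error=True):
    """Return True if folder_uri is under base_uri and is writable."""
    # Condition 1: the destination must begin with the helper's base URI.
    under_base = folder_uri.startswith(base_uri)
    # Condition 2 (stand-in for credentials): write permission on the folder.
    writable = under_base and os.access(folder_uri, os.W_OK)
    if writable:
        return True

    if under_base:
        reason = "no write access to {}".format(folder_uri)
    else:
        reason = "{} is outside the base URI {}".format(folder_uri, base_uri)
    if log_on_error:
        log.error(reason)
    if raise_on_error:
        raise ValueError(reason)
    return False
```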

list(prefix=None)

List entries in the helper base path.

Return a list of names inside this helper base path. The base path is determined at creation time and is specific for each storage medium. For Google Storage and S3 it is the bucket of the path. For local files it is the root directory.

This operation is not supported for http and https protocols.

Parameters

prefix – If None, return the list as described above. Otherwise, it must be a string: the path of a subdirectory under the base path. The returned list will then include only objects under that subdirectory.

Returns

The paths of all the objects in the storage base path under prefix. Listed relative to the base path.
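For a local base path, the listing behaviour described above might look like the sketch below. list_entries is a hypothetical function, and the recursive walk is an assumption; the source does not say whether the real method descends into subdirectories.

```python
import os


def list_entries(base_path, prefix=None):
    """List files under base_path (or base_path/prefix), relative to base_path."""
    root = os.path.join(base_path, prefix) if prefix else base_path
    entries = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            # Paths are reported relative to the base path, not to the prefix.
            entries.append(os.path.relpath(full, base_path))
    return sorted(entries)
```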

classmethod download_from_url(remote_url, local_path, overwrite_existing=False)

Download a file from a remote URL to local storage.

Parameters
  • remote_url – Remote URL. Example: https://example.com/image.jpg or s3://bucket/folder/file.mp4 etc.

  • local_path – Target location for the downloaded file. Example: /tmp/image.jpg

  • overwrite_existing – If True and local_path exists, the existing file is overwritten; otherwise a warning is printed.

Returns

local_path if download was successful.
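The overwrite_existing behaviour can be sketched with the standard library alone. The function download below is hypothetical and handles only URLs that urllib understands (http, https, file), not s3:// or other storage schemes; returning local_path when the file already exists is an assumption, since the source only states that a warning is printed.

```python
import os
import shutil
import urllib.request
import warnings


def download(remote_url, local_path, overwrite_existing=False):
    """Download remote_url to local_path, keeping an existing file by default."""
    if os.path.exists(local_path) and not overwrite_existing:
        # Assumption: warn and keep the existing file instead of downloading.
        warnings.warn("{} already exists, skipping download".format(local_path))
        return local_path

    with urllib.request.urlopen(remote_url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```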