Storage Examples

The StorageManager and StorageHelper classes provide methods for listing, downloading, and uploading files to and from storage, as well as for managing the local cache. The StorageHelper class also exposes some methods that the StorageManager class does not.

StorageManager

Import StorageManager

from trains.storage.manager import StorageManager

Download a file from storage

To download a ZIP file from storage to the global cache context, call the StorageManager.get_local_copy method, specifying the remote source location as the remote_url argument.

# create a StorageManager instance
manager = StorageManager()

manager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.zip")
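
The method returns the path of the local copy in the cache; for an archive such as this ZIP file, that is typically the path of the extracted contents (assuming the default extract_archive behavior).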

To download a file to a specific context in the cache, specify the cache_context argument as the name of the context.

manager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", cache_context="test")

To download a compressed file without extracting it, set the extract_archive argument to False.

manager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", extract_archive=False)

Upload a file to storage

To upload a file to storage, call the StorageManager.upload_file method. Specify the full path of the local file as the local_file argument, and the remote URL as the remote_url argument.

manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
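
A minimal sketch of capturing the result, assuming upload_file blocks until the upload completes and returns the remote URL of the uploaded file (both assumptions; check the behavior in your version):

# assumption: upload_file returns the uploaded file's remote URL
uploaded_url = manager.upload_file(
    local_file="/mnt/data/also_file.ext",
    remote_url="s3://MyBucket/MyFolder",
)
print(uploaded_url)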

Cache limits

To set a limit on the number of files cached, call the StorageManager.set_cache_file_limit method and specify the cache_file_limit argument as the maximum number of files. This does not limit the cache size, only the number of files.

new_cache_limit = manager.set_cache_file_limit(cache_file_limit=100)
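
The method returns the newly set limit, which the example assigns to new_cache_limit. Recent versions also accept an optional cache_context argument for limiting a specific cache context (an assumption; check your version).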

StorageHelper

Listing objects in storage

To list objects in storage, first call the StorageHelper.get factory method, specifying the storage URL as the url argument. It returns a StorageHelper instance bound to that URL.

from trains.storage.helper import StorageHelper
helper = StorageHelper.get(url="s3://MyBucket/MyFolder/")
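
With the helper instance, you can list the objects under a prefix. A minimal sketch, assuming your version of StorageHelper exposes a list method:

# assumption: StorageHelper.list exists and returns the object names under the prefix;
# whether the prefix is relative or a full URL may vary by version
for object_name in helper.list(prefix="MyFolder/"):
    print(object_name)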

Downloading a file

To download a file from storage, call the StorageHelper.get_local_copy method, and specify the full path of the object as the remote_url argument.

print(helper.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext"))

Downloading files

To download files from storage and get the full local path of each downloaded file, call the StorageHelper.get_local_copy method for each object (internally, it calls StorageHelper.download_to_file).

The following example demonstrates downloading all files from a folder in an S3 bucket: MyBucket/MyFolder.

For this example, we create a function named download_bucket. In it, we first create a StorageHelper instance, get a boto3 client, initialize a list for the full paths of the downloaded files, and parse the bucket name.

from trains.storage.helper import StorageHelper

def download_bucket(bucket_name):
    """
    Download the contents of a bucket to local storage and return the local paths
    :param bucket_name: full bucket name w/o s3:// prefix
    :return: list of the local paths of the downloaded objects
    """
    # create a StorageHelper instance for the bucket (and optional folder)
    helper = StorageHelper.get(f"s3://{bucket_name}")
    # access the underlying boto3 S3 client (note: _container is a private attribute)
    bucket_client = helper._container.resource.meta.client
    # a list for the local paths of the downloaded files
    local_paths = []
    # split the bucket name from the optional folder prefix
    bucket_root_dir, _, bucket_prefix = bucket_name.partition("/")

Next, list the objects and collect their keys; each key holds the object's full path within the bucket, including the file name and extension.

    # get the objects (note: list_objects_v2 returns at most 1000 keys per call)
    objects = bucket_client.list_objects_v2(Bucket=bucket_root_dir, Prefix=bucket_prefix)

    # the "Contents" entry holds the metadata of the listed objects
    bucket_objects = objects.get("Contents", [])

    # map each object key to its metadata entry
    sources = {elem.get("Key"): elem for elem in bucket_objects if elem.get("Key")}

Loop through the object keys and, skipping folder placeholder keys, call StorageHelper.get_local_copy to get a local copy of each object. The download_bucket function returns the list of local paths.

    # iterate over the object keys
    for source in sources:
        try:
            # build the full path of the object to download
            source_path = f"s3://{bucket_root_dir}/{source}"
            # skip folder placeholder keys
            if not source_path.endswith("/"):
                # get a local copy; get_local_copy uses download_to_file
                local_paths.append(helper.get_local_copy(source_path))
        except Exception as ex:
            print(ex)

    # return the list of local paths
    return local_paths

Finally, call the function and print the local paths.

print(download_bucket("MyBucket/MyFolder"))  # print the local paths of the downloaded objects

Verifying upload access

To verify upload access to storage, call the StorageHelper.verify_upload method. Specify the remote location URL as the folder_uri argument.

print(helper.verify_upload(folder_uri="s3://MyBucket/MyFolder"))
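
On success, verify_upload returns the destination URI; by default it is assumed to raise an exception when the destination is not writable (check the raise_on_error argument in your version).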

Uploading a file

To upload a file to storage, call the StorageHelper.upload method. Specify the full path of the file as the src_path argument, and the destination location as the dest_path argument.

helper.upload(src_path="/mnt/data/file.ext", dest_path="s3://MyBucket/MyFolder")
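
A common pattern is to verify access before uploading. A minimal sketch, assuming verify_upload raises on failure (see the note above):

# assumption: verify_upload raises if the destination is not writable
helper.verify_upload(folder_uri="s3://MyBucket/MyFolder")
helper.upload(src_path="/mnt/data/file.ext", dest_path="s3://MyBucket/MyFolder")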