Storage Examples

This page describes storage examples using the StorageManager and StorageHelper classes. StorageHelper provides some of the same methods as StorageManager, as well as additional methods not in StorageManager.

The storage examples include:

StorageManager

Downloading a file (using StorageManager)

To download a ZIP file from storage to the global cache context, call the StorageManager.get_local_copy method, specifying the remote location of the file as the remote_url argument.

from trains.storage.manager import StorageManager

# create a StorageManager instance
manager = StorageManager()

manager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.zip")

To download a file to a specific context in the cache, specify the cache_context argument as the name of the context.

manager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", cache_context="test")
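A cache context simply namespaces the local cache so downloads from different contexts do not collide. As a rough illustration of the idea only (this is not how trains implements its cache, and cache_dir_for is a hypothetical helper), separate contexts can be thought of as separate local directories:

```python
from pathlib import PurePosixPath

def cache_dir_for(context, base="/tmp/storage_cache"):
    # hypothetical layout: one subdirectory per cache context
    return str(PurePosixPath(base) / context)

print(cache_dir_for("test"))    # /tmp/storage_cache/test
print(cache_dir_for("global"))  # /tmp/storage_cache/global
```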

To download a compressed file without extracting it, set the extract_archive argument to False.

manager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", extract_archive=False)

Uploading a file (using StorageManager)

To upload a file to storage, call the StorageManager.upload_file method. Specify the full path of the local file as the local_file argument, and the remote URL as the remote_url argument.

manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")

Setting cache limits

To set a limit on the number of files cached, call the StorageManager.set_cache_file_limit method and specify the cache_file_limit argument as the maximum number of files. This does not limit the cache size, only the number of files.

new_cache_limit = manager.set_cache_file_limit(cache_file_limit=100)
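To make the count-versus-size distinction concrete, here is a toy sketch of count-based eviction (illustration only; CountLimitedCache is hypothetical and is not how trains implements its cache):

```python
from collections import OrderedDict

class CountLimitedCache:
    """Toy cache that evicts the oldest entry once the file-count limit is hit."""
    def __init__(self, file_limit):
        self.file_limit = file_limit
        self.entries = OrderedDict()  # remote url -> local path

    def add(self, url, local_path):
        self.entries[url] = local_path
        self.entries.move_to_end(url)
        while len(self.entries) > self.file_limit:
            self.entries.popitem(last=False)  # evict the oldest entry

cache = CountLimitedCache(file_limit=2)
cache.add("s3://MyBucket/a.ext", "/cache/a.ext")
cache.add("s3://MyBucket/b.ext", "/cache/b.ext")
cache.add("s3://MyBucket/c.ext", "/cache/c.ext")
print(list(cache.entries))  # the oldest entry was evicted
```

Note that a large file and a small file each count as one entry, which is why the limit bounds the number of files and not the cache size.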

StorageHelper

Getting a storage helper

To get a StorageHelper instance for a storage location, call the StorageHelper.get factory method, and specify the storage URL as the url argument.

from trains.storage.helper import StorageHelper
helper = StorageHelper.get(url="s3://MyBucket/MyFolder/")

Downloading a file (using StorageHelper)

To download a file from storage, call the StorageHelper.get_local_copy method, and specify the full path of the object as the remote_url argument.

print(helper.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext"))

Downloading files (using StorageHelper)

To download files from storage and get their full local paths, call the StorageHelper.get_local_copy method (which calls StorageHelper.download_to_file).

The following example demonstrates downloading all files from a folder in an AWS S3 bucket: MyBucket/MyFolder.

For this example only, we create a method named download_bucket. In it, we first create a StorageHelper instance and a boto3 client, initialize a list to hold the full paths of the downloaded files, and parse the bucket name.

from trains.storage.helper import StorageHelper

def download_bucket(bucket_name):
    """
    Download specific bucket data to local path and return the local paths
    :param bucket_name: full bucket name w/o s3:// prefix
    :return: List contains the local paths
    """
    # create StorageHelper instance, include bucket and optional folder
    helper = StorageHelper.get(f"s3://{bucket_name}") 
    # create a StorageHelper instance of a S3 boto client
    bucket_client = helper._container.resource.meta.client
    # an array for the objects 
    frames_to_upload = []
    # handle naming
    bucket_root_dir, _, bucket_prefix = bucket_name.partition("/")
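As an aside (this snippet is not part of download_bucket), str.partition splits on the first separator only, so a bucket name that includes a folder path is separated into the bucket and everything after it:

```python
# standalone illustration of the partition call above
bucket_root_dir, _, bucket_prefix = "MyBucket/MyFolder/Sub".partition("/")
print(bucket_root_dir)  # MyBucket
print(bucket_prefix)    # MyFolder/Sub
```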

Next, get all the objects and their keys; each key contains the object's full path, including the file extension.

    # get all objects
    objects = bucket_client.list_objects_v2(Bucket=bucket_root_dir, Prefix=bucket_prefix)

    # get the contents of objects
    bucket_objects = objects.get("Contents", [])  

    # get object keys
    sources = {elem.get("Key"): elem for elem in bucket_objects if elem.get("Key")} 

Loop through the object keys and, for each object that is not a folder, call StorageHelper.get_local_copy to download a local copy. The download_bucket method returns the list of local paths.


    # for all object keys
    for source in sources:
        try:
            # build the full path to the object to download
            source_path = f"s3://{bucket_root_dir}/{source}"
            # exclude folder placeholders
            if not source_path.endswith("/"):
                # get a local copy; get_local_copy uses download_to_file
                frames_to_upload.append(helper.get_local_copy(source_path))
        except Exception as ex:
            print(ex)

    # return the list of local paths
    return frames_to_upload

Finally, call download_bucket and print the returned local paths.

print(download_bucket("MyBucket/MyFolder"))  # print the local paths of the downloaded files
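The folder-exclusion step in the loop above can also be factored out and checked in isolation (filter_object_keys is a hypothetical helper for illustration, not part of trains):

```python
def filter_object_keys(keys):
    """Keep only file keys, dropping folder placeholders (keys ending in '/')."""
    return [key for key in keys if not key.endswith("/")]

print(filter_object_keys(["MyFolder/a.ext", "MyFolder/sub/", "MyFolder/b.ext"]))
# ['MyFolder/a.ext', 'MyFolder/b.ext']
```

S3 has no real directories; "folders" appear in listings only as zero-byte placeholder keys ending in a slash, which is why filtering on the trailing slash is enough.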

Verifying storage upload access

To verify upload access to storage, call the StorageHelper.verify_upload method. Specify the remote location URL as the folder_uri argument.

print(helper.verify_upload(folder_uri="s3://MyBucket/MyFolder"))

Uploading a file (using StorageHelper)

To upload a file to storage, call the StorageHelper.upload method. Specify the full path of the local file as the src_path argument, and the destination location as the dest_path argument.

helper.upload(src_path="/mnt/data/file.ext", dest_path="s3://MyBucket/MyFolder")
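When the destination is given as a folder URL, the full object URL would typically be composed from that folder and the local file's base name. A rough sketch of that composition (compose_dest is a hypothetical helper; the actual destination handling is internal to StorageHelper):

```python
from pathlib import PurePosixPath

def compose_dest(dest_folder, src_path):
    # join the destination folder URL with the local file's base name
    return dest_folder.rstrip("/") + "/" + PurePosixPath(src_path).name

print(compose_dest("s3://MyBucket/MyFolder", "/mnt/data/file.ext"))
# s3://MyBucket/MyFolder/file.ext
```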