
Overview

The File Containers feature lets you manage and store files. Each project has its own isolated file container, which you can interact with much as you would an AWS S3 or Google Cloud Storage bucket. Datazone also provides a toolkit for this: the FileContainerClient, which you can use in notebooks and pipelines. To open a project's file container:
  1. Click Projects in the left sidebar.
  2. Choose a project from the card list.
  3. Click the File Containers tab in the left sidebar.

Client Usage

The FileContainerClient provides a convenient interface for interacting with file containers over S3-compatible storage. It handles authentication and bucket management automatically, so you can import it directly in your pipelines and notebooks:
from datazone import FileContainerClient
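
As a quick illustration, here is a minimal round trip using the methods documented below (the key and payload are just examples):

# Store a small text payload (put_object expects bytes)
FileContainerClient.put_object("examples/hello.txt", b"Hello, Datazone!")

# Read it back and decode
print(FileContainerClient.get_object("examples/hello.txt").decode("utf-8"))

# Remove it when done
FileContainerClient.delete_object("examples/hello.txt")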

list_objects

Lists objects in the file container with an optional prefix filter.
from datazone import FileContainerClient

# List all objects
objects = FileContainerClient.list_objects()

# List objects with a specific prefix
documents = FileContainerClient.list_objects("documents/")
Parameters:
  • prefix (str): Optional prefix to filter objects by path
Returns:
  • list: List of object metadata dictionaries. Example:
[
    {
        'Key': 'customer_list.csv',
        'LastModified': datetime.datetime(2025, 7, 24, 19, 44, 12, 475000, tzinfo=tzlocal()),
        'ETag': '"bf36dc829c4229254b7df3c428d0a349"',
        'Size': 18311622,
        'StorageClass': 'STANDARD'
    }
]
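
For example, you can loop over the returned metadata to report each object's key and size (field names as in the sample above):

# Print the key and size of every object under a prefix
for obj in FileContainerClient.list_objects("documents/"):
    print(f"{obj['Key']}: {obj['Size']} bytes")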

get_object

Retrieves an object from the file container by its key.
# Get a file's content
file_data = FileContainerClient.get_object("data/sample.csv")

# Convert bytes to string for text files
content = file_data.decode('utf-8')
Parameters:
  • key (str): The key/path of the object to retrieve
Returns:
  • bytes: The object’s raw data
Objects are stored as bytes, so you may need to encode or decode text data appropriately.
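
For binary content, you can write the returned bytes straight to a local file (the paths here are illustrative):

# Download a PDF and save it locally
pdf_bytes = FileContainerClient.get_object("documents/report.pdf")
with open("report.pdf", "wb") as f:
    f.write(pdf_bytes)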

put_object

Stores data in the file container at the specified key.
# Store text data
text_data = "Hello, World!".encode('utf-8')
FileContainerClient.put_object("messages/hello.txt", text_data)

# Store binary data
with open("local_file.pdf", "rb") as f:
    file_data = f.read()
    FileContainerClient.put_object("documents/file.pdf", file_data)
Parameters:
  • key (str): The key/path where the object will be stored
  • data (bytes): The data to store
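
As a further sketch, assuming pandas is available in your environment, you can serialize a DataFrame to CSV bytes before uploading:

import pandas as pd

# Serialize a DataFrame to CSV and upload the encoded bytes
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})
FileContainerClient.put_object("exports/users.csv", df.to_csv(index=False).encode("utf-8"))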

delete_object

Removes an object from the file container.
# Delete a specific file
FileContainerClient.delete_object("temp/old_file.txt")
Parameters:
  • key (str): The key/path of the object to delete
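
delete_object removes a single key. To clear everything under a prefix, you can combine it with list_objects (a sketch relying on the 'Key' field shown earlier):

# Delete every object under the temp/ prefix
for obj in FileContainerClient.list_objects("temp/"):
    FileContainerClient.delete_object(obj["Key"])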

Examples

Periodically Uploading Files in a Pipeline

You can use the FileContainerClient to periodically upload files to your file container. This can be useful for tasks like logging, data collection, or backups.
from datazone import transform, FileContainerClient
import requests
from datetime import datetime

@transform
def fetch_and_store_llm_data():
    # URL to fetch the data from
    url = "https://docs.datazone.co/llms-full.txt"
    
    # Get the current timestamp in ISO format
    timestamp_as_iso = datetime.now().isoformat()
    
    # Fetch the data from the URL
    response = requests.get(url)
    
    # Check if the request was successful
    if response.status_code == 200:
        # Get the content
        content = response.text
        
        # Store the content in the file container (put_object expects bytes)
        FileContainerClient.put_object(f"daily_llm/{timestamp_as_iso}/llm.txt", content.encode('utf-8'))
        
        return f"Successfully stored LLM data with timestamp {timestamp_as_iso}"
    else:
        raise Exception(f"Failed to fetch data: HTTP {response.status_code}")
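
Once the pipeline has run a few times, you can inspect the accumulated snapshots from a notebook:

# List the stored snapshots with their timestamps
for obj in FileContainerClient.list_objects("daily_llm/"):
    print(obj["Key"], obj["LastModified"])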

Read a Parquet File in a Notebook

import io
import pandas as pd
from datazone import FileContainerClient

# Read a Parquet file from the file container
data = FileContainerClient.get_object("datasets/sample.parquet")

# Convert bytes to a Pandas DataFrame
df = pd.read_parquet(io.BytesIO(data))
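
The reverse direction works the same way: serialize the DataFrame into an in-memory buffer and upload the bytes (the key below is illustrative; pd.read_parquet and DataFrame.to_parquet both need a Parquet engine such as pyarrow installed):

# Write a DataFrame back to the file container as Parquet
buffer = io.BytesIO()
df.to_parquet(buffer, index=False)
FileContainerClient.put_object("datasets/sample_copy.parquet", buffer.getvalue())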