Overview
File Containers let you manage and store files. Each project has its own isolated file container, which you can interact with much like an AWS S3 or Google Cloud Storage bucket.
Datazone also provides toolkits for working with file containers: you can use the FileContainerClient in notebooks and pipelines.
To view a project's file container:
- Click Projects in the left sidebar.
- Choose a project from the card list.
- Click the File Containers tab in the left sidebar.
Client Usage
The FileContainerClient provides a convenient interface for interacting with file containers over S3-compatible storage. It handles authentication and bucket management automatically.
You can access the FileContainerClient in your pipelines and notebooks like this:
from datazone import FileContainerClient
list_objects
Lists objects in the file container with an optional prefix filter.
from datazone import FileContainerClient
# List all objects
objects = FileContainerClient.list_objects()
# List objects with a specific prefix
documents = FileContainerClient.list_objects("documents/")
Parameters:
- prefix (str): Optional prefix to filter objects by path.
Returns:
- list: A list of object metadata dictionaries. Example:
[
    {
        'Key': 'customer_list.csv',
        'LastModified': datetime.datetime(2025, 7, 24, 19, 44, 12, 475000, tzinfo=tzlocal()),
        'ETag': '"bf36dc829c4229254b7df3c428d0a349"',
        'Size': 18311622,
        'StorageClass': 'STANDARD'
    }
]
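Since each entry is a plain dictionary, the metadata can be inspected directly. A minimal sketch that prints the key and size of every object under an illustrative documents/ prefix:
from datazone import FileContainerClient

# Print the key and size of each object under the prefix
for obj in FileContainerClient.list_objects("documents/"):
    print(f"{obj['Key']}: {obj['Size']} bytes")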
get_object
Retrieves an object from the file container by its key.
from datazone import FileContainerClient

# Get a file's content
file_data = FileContainerClient.get_object("data/sample.csv")
# Convert bytes to string for text files
content = file_data.decode('utf-8')
Parameters:
- key (str): The key/path of the object to retrieve.
Returns:
- bytes: The object's raw data.
Objects are stored as bytes, so you may need to encode or decode text data appropriately.
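For binary files there is no need to decode; the raw bytes can be written straight to a local file. A minimal sketch (the key is illustrative):
from datazone import FileContainerClient

# Download a binary object and save it locally
pdf_bytes = FileContainerClient.get_object("documents/file.pdf")
with open("file.pdf", "wb") as f:
    f.write(pdf_bytes)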
put_object
Stores data in the file container at the specified key.
from datazone import FileContainerClient

# Store text data
text_data = "Hello, World!".encode('utf-8')
FileContainerClient.put_object("messages/hello.txt", text_data)
# Store binary data
with open("local_file.pdf", "rb") as f:
    file_data = f.read()
FileContainerClient.put_object("documents/file.pdf", file_data)
Parameters:
- key (str): The key/path where the object will be stored.
- data (bytes): The data to store.
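Because put_object accepts raw bytes, any in-memory data can be uploaded once it is serialized. A minimal sketch that stores a pandas DataFrame as CSV (the DataFrame and key are illustrative):
import pandas as pd
from datazone import FileContainerClient

df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})
# Serialize the DataFrame to CSV text, then encode it to bytes
FileContainerClient.put_object("datasets/demo.csv", df.to_csv(index=False).encode("utf-8"))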
delete_object
Removes an object from the file container.
from datazone import FileContainerClient

# Delete a specific file
FileContainerClient.delete_object("temp/old_file.txt")
Parameters:
- key (str): The key/path of the object to delete.
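delete_object removes a single key, so clearing an entire prefix means pairing it with list_objects. A minimal sketch (the temp/ prefix is illustrative):
from datazone import FileContainerClient

# Delete every object under the "temp/" prefix
for obj in FileContainerClient.list_objects("temp/"):
    FileContainerClient.delete_object(obj["Key"])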
Examples
Periodically Uploading Files in a Pipeline
You can use the FileContainerClient to periodically upload files to your file container. This can be useful for tasks like logging, data collection, or backups.
from datazone import transform, FileContainerClient
import requests
from datetime import datetime

@transform
def fetch_and_store_llm_data():
    # URL to fetch the data from
    url = "https://docs.datazone.co/llms-full.txt"
    # Get the current timestamp in ISO format
    timestamp_as_iso = datetime.now().isoformat()
    # Fetch the data from the URL
    response = requests.get(url)
    # Check if the request was successful
    if response.status_code == 200:
        # Get the content
        content = response.text
        # put_object expects bytes, so encode the text content
        FileContainerClient.put_object(f"daily_llm/{timestamp_as_iso}/llm.txt", content.encode('utf-8'))
        return f"Successfully stored LLM data with timestamp {timestamp_as_iso}"
    else:
        raise Exception(f"Failed to fetch data: HTTP {response.status_code}")
Read a Parquet File in a Notebook
import io
import pandas as pd
from datazone import FileContainerClient
# Read a Parquet file from the file container
data = FileContainerClient.get_object("datasets/sample.parquet")
# Convert bytes to a Pandas DataFrame
df = pd.read_parquet(io.BytesIO(data))
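Going the other way, a DataFrame can be written back to the container as Parquet by serializing it into an in-memory buffer first. A minimal sketch (the DataFrame and key are illustrative; writing Parquet requires an engine such as pyarrow):
import io
import pandas as pd
from datazone import FileContainerClient

df = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.25]})
# Serialize the DataFrame to Parquet in memory, then upload the bytes
buffer = io.BytesIO()
df.to_parquet(buffer, index=False)
FileContainerClient.put_object("datasets/sample_copy.parquet", buffer.getvalue())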