Overview
A Datazone project is a Git repository containing your data pipelines, actions, intelligent apps, and endpoints. All project components are defined in a central config.yml file.
Project Structure
my-project/
    config.yml
    pipelines/
        hello_world.py
        data_processing.py
    actions/
    apps/
    utils.py
- Each project must have a config.yml file
- Each pipeline should have a unique alias and be defined in a separate file
- You can organize your code with utility files for shared logic
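The rules above can be checked mechanically. The following is a minimal sketch, not part of the Datazone tooling: `check_project` and its arguments are hypothetical names, and the pipelines are passed in directly as (alias, path) pairs rather than parsed from config.yml.

```python
from collections import Counter
from pathlib import Path


def check_project(root, pipelines):
    """Return a list of problems found in a project layout.

    `pipelines` is a list of (alias, relative_path) pairs. This helper is a
    hypothetical illustration, not a Datazone API.
    """
    root = Path(root)
    problems = []

    # Every project must have a config.yml at its root.
    if not (root / "config.yml").is_file():
        problems.append("missing config.yml")

    # Pipeline aliases must be unique.
    alias_counts = Counter(alias for alias, _ in pipelines)
    for alias, count in sorted(alias_counts.items()):
        if count > 1:
            problems.append(f"duplicate alias: {alias}")

    # Each pipeline should be defined in its own file.
    path_counts = Counter(path for _, path in pipelines)
    for path, count in sorted(path_counts.items()):
        if count > 1:
            problems.append(f"file used by several pipelines: {path}")

    return problems
```

An empty list means the layout satisfies the rules; otherwise each entry names one violation.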
Configuration File
The config.yml file defines all project resources. The fields it accepts are described in the reference below.
Configuration Reference
Project Fields
Name of your project. Used for display and identification.
Unique identifier for your project. Generated when you create a project.
List of data pipeline definitions. Each pipeline processes and transforms data.
List of serverless action functions. Actions can be triggered by endpoints or used by AI agents. Learn more in the Actions documentation.
List of intelligent AI applications.
Pipeline Configuration
Short, unique identifier for the pipeline. Used in CLI commands and UI.
Relative path to the pipeline Python file from project root.
Compute instance size for pipeline execution. Available sizes:
- XSMALL: 2 vCPU, 8 GB RAM
- SMALL: 4 vCPU, 16 GB RAM
- MEDIUM: 8 vCPU, 32 GB RAM
- LARGE: 16 vCPU, 64 GB RAM
- XLARGE: 32 vCPU, 128 GB RAM (Enterprise only)
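To make the size list easier to reason about, here is a small sketch that encodes it as a lookup table and picks the smallest size meeting a given requirement. The names `INSTANCE_SIZES` and `smallest_size_for` are illustrative assumptions, not Datazone identifiers; only the size names and their resources come from the list above.

```python
# Instance sizes as listed above: name -> (vCPUs, RAM in GB).
# The dict is ordered from smallest to largest.
INSTANCE_SIZES = {
    "XSMALL": (2, 8),
    "SMALL": (4, 16),
    "MEDIUM": (8, 32),
    "LARGE": (16, 64),
    "XLARGE": (32, 128),  # Enterprise only
}


def smallest_size_for(min_vcpu, min_ram_gb):
    """Return the smallest listed size meeting both minimums, or None."""
    for name, (vcpu, ram_gb) in INSTANCE_SIZES.items():
        if vcpu >= min_vcpu and ram_gb >= min_ram_gb:
            return name
    return None
```

Every listed size provides 4 GB of RAM per vCPU, so a memory-heavy job may need a larger size than its CPU needs alone suggest.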
Spark deployment mode for distributed processing. Options:
- local: Runs on a single machine (default)
- client: Driver runs in the same process, executors run separately
- cluster: Both driver and executors run in separate processes (Enterprise only)
Number of Spark executors for parallel processing. Only applies when deploy_mode is client or cluster.
Additional Spark configuration properties. Pass any custom Spark configuration key-value pairs.
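The interaction between the deploy mode and the executor count can be sketched as follows. This is an illustration under stated assumptions: `effective_executors` and the `executor_count` parameter are hypothetical names (only `deploy_mode` and its three values appear in this page), and how Datazone itself handles an executor count in local mode is not specified here. The property names in `EXTRA_SPARK_CONF` are standard Spark settings with arbitrary values.

```python
VALID_DEPLOY_MODES = ("local", "client", "cluster")

# Example of custom Spark properties as plain key-value pairs.
# The keys are standard Spark settings; the values are arbitrary.
EXTRA_SPARK_CONF = {
    "spark.sql.shuffle.partitions": "64",
    "spark.executor.memory": "4g",
}


def effective_executors(deploy_mode="local", executor_count=None):
    """Return the executor count that applies, or None if it is ignored.

    Hypothetical helper: it mirrors the rule that the executor setting
    only applies in client or cluster mode.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    if deploy_mode == "local":
        # Single machine: the executor setting does not apply.
        return None
    return executor_count
```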
Python packages required by the pipeline. Installed before execution.
Python Dependency Fields
Python package name from PyPI or custom index.
Specific package version. If omitted, installs the latest version.
Custom Python package index URL. Useful for private packages or mirrors.
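As a rough illustration of how the three dependency fields relate to a package install, the sketch below turns one dependency entry into a `pip install` argument list. The function and parameter names are hypothetical, and this page does not say how Datazone actually installs packages; the example only assumes pip's standard `name==version` syntax and its `--index-url` option.

```python
def pip_install_args(name, version=None, index_url=None):
    """Build a `pip install` argument list from one dependency entry."""
    # Pin the version when given; otherwise pip resolves the latest release.
    requirement = f"{name}=={version}" if version else name
    args = ["pip", "install", requirement]
    if index_url:
        # --index-url replaces the default PyPI index for this install.
        args += ["--index-url", index_url]
    return args
```

Pinning a version keeps pipeline runs reproducible, since an unpinned package can change between executions.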
Action Configuration
Relative path to the action Python file containing an @action decorated function. Each file should contain one action function. Learn more in the Actions documentation.
App Configuration
Relative path to the intelligent app Python file.