# Project Structure
- Each project should have a `config.yml` file.
- You can have multiple pipeline files in your project. Every pipeline file referenced in your configuration file must exist in your project directory.
- Each pipeline should have a unique alias and should be defined in its own file.
- You can have multiple utility files in your project.
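The rules above imply a layout along these lines (all file names except `config.yml` are illustrative):

```
my-project/
├── config.yml        # project configuration (required)
├── daily_etl.py      # a pipeline file, referenced from config.yml
├── hourly_sync.py    # another pipeline, defined in its own file
└── helpers.py        # a utility file
```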
# Configuration File

## Configuration File Schema

### Basic Fields
- `project_name`: Name of your project (required)
- `project_id`: Unique ID for your project (required)
- `pipelines`: List of pipelines in your project
- `apps`: List of intelligent apps in your project (optional)
- `endpoints`: List of endpoints in your project (optional)
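Putting the basic fields together, a minimal `config.yml` might look like this (all values are illustrative):

```yaml
project_name: demo-project   # required
project_id: demo-0001        # required
pipelines: []                # list of pipelines; entries are described below
# apps and endpoints are optional and may be omitted entirely
```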
### Pipeline
- `alias`: Short name for your pipeline
- `path`: Location of your pipeline file
- `compute`: Computing instance type. Options are:
  - `XSMALL`: 2 vCPU, 8 GB RAM
  - `SMALL`: 4 vCPU, 16 GB RAM
  - `MEDIUM`: 8 vCPU, 32 GB RAM
  - `LARGE`: 16 vCPU, 64 GB RAM
  - `XLARGE`: 32 vCPU, 128 GB RAM (Enterprise only)
- `spark_config.deploy_mode`: How Spark runs. Default is `local`. Options are:
  - `client`: Spark runs in the same process as the driver
  - `cluster`: Spark runs in a separate process
- `spark_config.executor_instances`: Number of executors to use in PySpark. If `spark_config.deploy_mode` is `local`, this field is ignored.
- `python_dependencies`: List of Python dependencies for your pipeline
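A pipeline entry combining the fields above might look like this (the alias, file name, and values are illustrative):

```yaml
pipelines:
  - alias: daily_etl             # unique across the project
    path: daily_etl.py           # must exist in the project directory
    compute: SMALL               # 4 vCPU, 16 GB RAM
    spark_config:
      deploy_mode: cluster       # default is local
      executor_instances: 4      # ignored when deploy_mode is local
    python_dependencies:
      - name: pandas
```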
### Apps

- `path`: Location of your app file
### Endpoints

- `path`: Location of your endpoint configuration file
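Apps and endpoints each take only a `path`, so their entries are short. A sketch with illustrative file names:

```yaml
apps:
  - path: chat_app.py       # app file in the project directory
endpoints:
  - path: endpoint.yml      # endpoint configuration file
```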
### Python Dependencies

- `name`: Name of the Python package
- `version`: Version of the Python package (optional)
- `index_url`: URL of the Python package index (optional)
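A `python_dependencies` list using these fields might look like this (package names, version, and index URL are illustrative):

```yaml
python_dependencies:
  - name: requests                       # name only; any version
  - name: pandas
    version: "2.2.0"                     # optional version pin
    index_url: https://pypi.org/simple   # optional custom package index
```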