Project Structure
- Each project should have a `config.yml` file.
- You can have multiple pipeline files in your project. If you refer to a pipeline file in your configuration file, that file should exist in your project directory.
- Each pipeline should have a unique alias and should be defined in its own file.
- You can have multiple utility files in your project.
Configuration File
Configuration File Schema
Basic Fields
- `project_name`: Name of your project (required)
- `project_id`: Unique ID for your project (required)
- `pipelines`: List of pipelines in your project
- `apps`: List of intelligent apps in your project (optional)
- `endpoints`: List of endpoints in your project (optional)
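A minimal sketch of the basic fields in `config.yml`; the project name and ID below are hypothetical placeholders, not real values:

```yaml
# Hypothetical example of the top-level fields
project_name: customer-analytics   # required
project_id: prj-1234               # required, must be unique
pipelines: []                      # pipeline entries are described in the Pipeline section
```

The `apps` and `endpoints` lists are optional and can be omitted entirely when unused.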
Pipeline
- `alias`: Short name for your pipeline
- `path`: Location of your pipeline file
- `compute`: Computing instance type. Options are:
  - `XSMALL`: 2 vCPU, 8 GB RAM
  - `SMALL`: 4 vCPU, 16 GB RAM
  - `MEDIUM`: 8 vCPU, 32 GB RAM
  - `LARGE`: 16 vCPU, 64 GB RAM
  - `XLARGE`: 32 vCPU, 128 GB RAM (Enterprise only)
- `spark_config.deploy_mode`: How Spark runs. Default is `local`. Options are:
  - `client`: Spark runs in the same process as the driver
  - `cluster`: Spark runs in a separate process
- `spark_config.executor_instances`: Number of executors to use in PySpark. If `spark_config.deploy_mode` is `local`, this field is ignored.
- `python_dependencies`: List of Python dependencies for your pipeline
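Putting the pipeline fields together, a single entry might look like the sketch below; the alias, file path, and dependency are hypothetical:

```yaml
# Hypothetical pipeline entry
pipelines:
  - alias: daily-ingest              # unique short name
    path: pipelines/daily_ingest.py  # must exist in the project directory
    compute: SMALL                   # 4 vCPU, 16 GB RAM
    spark_config:
      deploy_mode: cluster           # Spark runs in a separate process
      executor_instances: 2          # ignored when deploy_mode is local
    python_dependencies:
      - name: pandas
```

Note that `executor_instances` only takes effect for non-`local` deploy modes, so pairing it with `deploy_mode: cluster` (or `client`) is what makes the setting meaningful.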
Apps
- `path`: Location of your app file
Endpoints
- `path`: Location of your endpoint configuration file
Python Dependencies
- `name`: Name of the Python package
- `version`: Version of the Python package (optional)
- `index_url`: URL of the Python package index (optional)
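A complete `config.yml` combining all of the sections above might look like the following sketch; every name, path, and version here is a hypothetical illustration:

```yaml
# Hypothetical end-to-end example
project_name: customer-analytics
project_id: prj-1234
pipelines:
  - alias: daily-ingest
    path: pipelines/daily_ingest.py
    compute: XSMALL                    # 2 vCPU, 8 GB RAM
    python_dependencies:
      - name: requests                 # version and index_url are optional
        version: 2.31.0
        index_url: https://pypi.org/simple
apps:
  - path: apps/dashboard.py
endpoints:
  - path: endpoints/predict.yml
```

Since `version` and `index_url` are optional, a dependency can be as short as `- name: requests`, in which case the default package index and latest compatible version would presumably be used.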