Project Repository

Project Structure

my-project-directory
├── pipeline.py
├── utils.py
└── config.yml

Each project should have a config.yml file.
A project can contain multiple pipeline files. Every pipeline file referenced in the configuration file should exist in the project directory.
Each pipeline should have a unique alias and should be defined in its own file.
A project can also contain multiple utility files.
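The rules above can be checked locally before deploying. The following is a minimal sketch, not part of Datazone: it assumes config.yml has already been parsed into a Python dict (for example with a YAML library), and the function name check_project_layout is hypothetical.

```python
from pathlib import Path


def check_project_layout(config: dict, project_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks fine.

    `config` is assumed to be the already-parsed content of config.yml.
    """
    problems = []
    root = Path(project_dir)
    seen_aliases = set()
    seen_paths = set()

    for pipeline in config.get("pipelines", []):
        alias = pipeline.get("alias")
        path = pipeline.get("path")

        # Each pipeline should have a unique alias.
        if alias in seen_aliases:
            problems.append(f"duplicate alias: {alias}")
        seen_aliases.add(alias)

        # Each pipeline should be defined in its own file.
        if path in seen_paths:
            problems.append(f"file used by more than one pipeline: {path}")
        seen_paths.add(path)

        # A file referenced in the configuration should exist in the project directory.
        if path is None or not (root / path).is_file():
            problems.append(f"missing pipeline file: {path}")

    return problems
```

Calling check_project_layout(config, "my-project-directory") returns an empty list when every referenced pipeline file exists and no alias or file is reused.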

When you change your project files, the recommended way to deploy them is with the Datazone CLI. This lets Datazone check your changes and notify you of any errors before they are deployed. You can deploy your changes with the following command:

datazone project deploy

After you deploy your changes, you can check your project status by running the following command:

datazone project summary
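If you want to run both commands from a script, they can be chained so that the summary is only requested after a successful deploy. This is a rough sketch under stated assumptions: it assumes the datazone CLI is on your PATH, that it is run from the project directory, and that it exits with a non-zero status on failure. The helper name deploy_and_summarize is hypothetical.

```python
import subprocess


def deploy_and_summarize(run=subprocess.run):
    """Run `datazone project deploy`, then `datazone project summary`.

    `run` is injectable so the sketch can be exercised without the CLI installed.
    """
    # check=True raises CalledProcessError on a non-zero exit status,
    # so the summary step is skipped if the deploy step fails
    # (assuming the CLI signals failure through its exit status).
    run(["datazone", "project", "deploy"], check=True)
    run(["datazone", "project", "summary"], check=True)
```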

Configuration File

project_name: my-pretty-project
project_id: 67280ba2f4a0960d02159675
pipelines:
- alias: hello_world
  path: hello-world.py
  compute: LARGE
  spark_config:
    deploy_mode: client
    executor_instances: 3
  python_dependencies:
    - name: pandas
      version: 1.3.3

Configuration File Schema

Basic Fields

project_name: Name of your project (required)
project_id: Unique ID for your project (required)
pipelines: List of pipelines in your project
apps: List of intelligent apps in your project (optional)
endpoints: List of endpoints in your project (optional)
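The basic fields can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the configuration has already been parsed into a dict, the name check_basic_fields is hypothetical, and flagging unlisted top-level keys is an assumption made here to catch typos, not documented Datazone behavior.

```python
REQUIRED_FIELDS = ("project_name", "project_id")
# Fields listed in the schema above; anything else is reported as a possible typo.
KNOWN_FIELDS = REQUIRED_FIELDS + ("pipelines", "apps", "endpoints")


def check_basic_fields(config: dict) -> list[str]:
    """Return a list of problems with the top-level fields of a parsed config."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not config.get(name)
    ]
    problems += [
        f"unknown field: {name}"
        for name in config
        if name not in KNOWN_FIELDS
    ]
    return problems
```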

Pipeline

alias: Short name for your pipeline
path: Location of your pipeline file
compute: Computing instance type. Options are:
- XSMALL: 2 vCPU, 8 GB RAM
- SMALL: 4 vCPU, 16 GB RAM
- MEDIUM: 8 vCPU, 32 GB RAM
- LARGE: 16 vCPU, 64 GB RAM
- XLARGE: 32 vCPU, 128 GB RAM (Enterprise only)
spark_config.deploy_mode: How Spark runs. Default is local. Options are:
- local: Spark runs locally on a single machine
- client: Spark runs in the same process as the driver
- cluster: Spark runs in a separate process (Enterprise only)
spark_config.executor_instances: Number of executors to use in PySpark. If spark_config.deploy_mode is local, this field is ignored.
python_dependencies: List of Python dependencies for your pipeline
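To illustrate how these pipeline fields interact, here is a small sketch that resolves the settings of one pipeline entry. It is not Datazone code: the function name effective_spark_settings and the shape of its return value are hypothetical, and the input is assumed to be one item of the pipelines list, already parsed into a dict. The compute sizes, the default deploy mode, and the rule that executor_instances is ignored in local mode are taken from the descriptions above.

```python
# Compute types from the schema above: name -> (vCPU, RAM in GB).
COMPUTE_TYPES = {
    "XSMALL": (2, 8),
    "SMALL": (4, 16),
    "MEDIUM": (8, 32),
    "LARGE": (16, 64),
    "XLARGE": (32, 128),  # Enterprise only
}
DEPLOY_MODES = ("local", "client", "cluster")  # cluster is Enterprise only


def effective_spark_settings(pipeline: dict) -> dict:
    """Validate one pipeline entry and return the settings that would apply."""
    compute = pipeline.get("compute")
    if compute is not None and compute not in COMPUTE_TYPES:
        raise ValueError(f"unknown compute type: {compute}")

    spark = pipeline.get("spark_config") or {}
    mode = spark.get("deploy_mode", "local")  # default is local
    if mode not in DEPLOY_MODES:
        raise ValueError(f"unknown deploy_mode: {mode}")

    # executor_instances is ignored when deploy_mode is local.
    executors = None if mode == "local" else spark.get("executor_instances")

    vcpus, ram_gb = COMPUTE_TYPES.get(compute, (None, None))
    return {
        "deploy_mode": mode,
        "executor_instances": executors,
        "vcpus": vcpus,
        "ram_gb": ram_gb,
    }
```

For the hello_world pipeline in the example configuration, this resolves to client mode with 3 executors on a 16 vCPU, 64 GB instance.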

Apps

path: Location of your app file

Endpoints

path: Location of your endpoint configuration file

Python Dependencies

name: Name of the Python package
version: Version of the Python package (optional)
index_url: URL of the Python package index (optional)
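One way to read a dependency entry is as the arguments of a pip install command. The sketch below shows that mapping; it is an assumption for illustration only, since this page does not say how Datazone installs dependencies or whether version is treated as an exact pin. The function name pip_install_args is hypothetical; --index-url is a standard pip option.

```python
def pip_install_args(dep: dict) -> list[str]:
    """Translate one python_dependencies entry into pip command-line arguments."""
    requirement = str(dep["name"])
    if dep.get("version") is not None:
        # Assumption: the version is an exact pin.
        requirement += f"=={dep['version']}"

    args = ["pip", "install", requirement]
    if dep.get("index_url"):
        args += ["--index-url", dep["index_url"]]
    return args
```

Note that YAML parsers read an unquoted value such as 1.10 as the number 1.1, so quoting versions in config.yml (for example version: "1.10") is a safe habit.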
