A Project is the root entity in Datazone. It keeps your datasets, extracts, pipelines, schedules, and executions organized. When you start building your data lakehouse in Datazone, the first step is to create a Project. You can also find detailed information about the project concept of Datazone here.

Let’s start your first project

  1. Go to the Projects page by clicking on the Projects tab in the sidebar.
  2. Click on the Create Project button.
  3. Fill in the required fields and click on the Create button.

Create Project via Datazone CLI

datazone project create my-pretty-project

After you have successfully created your project, change into the project directory.

cd my-pretty-project

Clone an existing project to your local machine

If you have already created your project, either previously or in the UI, you can clone it to your local machine with the following command.

datazone project clone <project-id>

Example Project Folder Structure

You can manage your projects however you want. You can separate your pipelines into different directories or keep them in the root directory.

my-pretty-project
├── pipeline.py
├── utils.py
└── config.yml

config.yml is a required file for all projects. Do not delete it.
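
Because every project needs a config.yml at its root, a quick local check can catch a missing file before you run anything. The snippet below is a minimal sketch using only the Python standard library. The function name `has_project_config` is hypothetical and is not part of the Datazone CLI or SDK.

```python
from pathlib import Path

# File name taken from the example folder structure above.
CONFIG_FILENAME = "config.yml"


def has_project_config(project_dir: str) -> bool:
    """Return True if project_dir contains config.yml at its root."""
    # Hypothetical helper for illustration; not a Datazone API.
    return (Path(project_dir) / CONFIG_FILENAME).is_file()


if __name__ == "__main__":
    import sys

    # Check the directory given on the command line, or the current one.
    project_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    if has_project_config(project_dir):
        print(f"{CONFIG_FILENAME} found in {project_dir}")
    else:
        print(f"{CONFIG_FILENAME} is missing from {project_dir}")
```

Run it from inside the project directory, or pass the project path as the first argument.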

Example Configuration File (config.yml)

project_name: my-pretty-project
project_id: 67280ba2f4a0960d02159675
pipelines:
- alias: hello_world
  path: hello-world.py
  compute: LARGE
  spark_config:
    deploy_mode: client
    executor_instances: 3
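
To make the shape of the `pipelines` section easier to see, the sketch below models one entry as a Python dataclass and reads the entries from an already-parsed configuration. It is illustrative only: the class `PipelineEntry` and the function `pipeline_entries` are hypothetical names, the dictionary stands in for the output of whatever YAML parser you use, and treating `compute` and `spark_config` as optional is an assumption rather than a documented rule.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PipelineEntry:
    """One item of the `pipelines` list (hypothetical model, not a Datazone class)."""
    alias: str
    path: str
    # Assumption: these two keys may be omitted from an entry.
    compute: Optional[str] = None
    spark_config: Dict[str, Any] = field(default_factory=dict)


def pipeline_entries(config: Dict[str, Any]) -> List[PipelineEntry]:
    """Build PipelineEntry objects from a parsed config dictionary."""
    entries = []
    for raw in config.get("pipelines", []):
        entries.append(
            PipelineEntry(
                alias=raw["alias"],
                path=raw["path"],
                compute=raw.get("compute"),
                spark_config=raw.get("spark_config", {}),
            )
        )
    return entries


# The example configuration above, written as the dictionary a YAML parser would return.
example_config = {
    "project_name": "my-pretty-project",
    "project_id": "67280ba2f4a0960d02159675",
    "pipelines": [
        {
            "alias": "hello_world",
            "path": "hello-world.py",
            "compute": "LARGE",
            "spark_config": {"deploy_mode": "client", "executor_instances": 3},
        }
    ],
}

for entry in pipeline_entries(example_config):
    print(entry.alias, entry.path, entry.compute)  # hello_world hello-world.py LARGE
```

Each entry pairs an alias with the path of a pipeline file in the project, plus optional compute and Spark settings for that pipeline.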