# Pipeline

## Overview
The Pipeline entity represents a series of data processing steps organized into a coherent workflow within the data platform. A pipeline typically sequences transformations, data movements, and other processing tasks to accomplish a specific data management goal. Pipelines are fundamental to orchestrating the flow of data from source to destination, ensuring that each step executes in the correct order and with the expected inputs.
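As a minimal sketch of this idea, the code below models a pipeline as an ordered list of step functions applied in sequence. The names (`Step`, `run_pipeline`) and the list-of-callables design are illustrative assumptions, not part of any specific platform API.

```python
from typing import Any, Callable

# A step is any callable that takes the current data and returns transformed data.
Step = Callable[[Any], Any]

def run_pipeline(steps: list[Step], data: Any) -> Any:
    """Execute each step in order, feeding the output of one step into the next."""
    for step in steps:
        data = step(data)
    return data

# Example: three small steps applied in sequence.
result = run_pipeline(
    [
        lambda d: d + [4],             # append a new record (illustrative "extract")
        lambda d: [x * 2 for x in d],  # transform every record
        lambda d: sorted(d),           # final ordering before "load"
    ],
    [3, 1, 2],
)
print(result)  # [2, 4, 6, 8]
```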
## Properties
ID
: A unique identifier for the pipeline.

Name
: A descriptive name for the pipeline, indicating its purpose or the type of data processing it performs.

Schedule
: (Optional) If the pipeline runs on an automatic schedule, the scheduling details (e.g., frequency, time of day).
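A hypothetical in-code representation of these properties might look like the following dataclass. The field names mirror the properties above; the cron-style `schedule` string is an assumption for illustration, not a platform requirement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pipeline:
    id: str                         # unique identifier for the pipeline
    name: str                       # descriptive name indicating its purpose
    schedule: Optional[str] = None  # optional schedule, e.g. a cron expression

# A pipeline that runs every night at 02:00 (cron syntax is an assumption here).
nightly = Pipeline(id="pl-001", name="nightly-sales-load", schedule="0 2 * * *")
```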
## Usage
- Data Processing Workflows: Pipelines automate and manage complex workflows involving multiple steps of data processing.
- Error Handling and Recovery: They include mechanisms to handle failures in individual steps and provide options for recovery and reruns (a retry pattern is sketched after this list).
- Monitoring and Optimization: Pipelines are monitored for performance and can be optimized for efficiency, speed, and resource utilization.
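One common way to implement the per-step recovery mentioned above is a retry wrapper like the sketch below. The `max_attempts` and `backoff_seconds` parameters are illustrative choices, not fixed platform settings.

```python
import time
from typing import Any, Callable

def run_step_with_retry(step: Callable[[Any], Any], data: Any,
                        max_attempts: int = 3, backoff_seconds: float = 1.0) -> Any:
    """Run a single step, retrying on failure with a simple linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(data)
        except Exception:
            if attempt == max_attempts:
                raise  # surface the failure after the final attempt
            time.sleep(backoff_seconds * attempt)  # wait a little longer each retry
```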
## Best Practices
- Modular Design: Design pipeline steps to be modular and reusable, facilitating maintenance and scalability (see the sketch after this list).
- Documentation: Maintain clear documentation for each pipeline step, including its purpose, input, output, and any special considerations.
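To make the modular-design point concrete, the sketch below defines small, named, documented steps that can be reused across pipelines. The helper names (`drop_nulls`, `rename_field`) are hypothetical; the steps compose with the `run_pipeline` helper sketched earlier.

```python
from typing import Callable

def drop_nulls(records: list[dict]) -> list[dict]:
    """Reusable step: remove records that contain any missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def rename_field(old: str, new: str) -> Callable[[list[dict]], list[dict]]:
    """Reusable step factory: build a step that renames a single field."""
    def step(records: list[dict]) -> list[dict]:
        return [{(new if k == old else k): v for k, v in r.items()} for r in records]
    return step

# The same small, documented steps can be assembled into different pipelines.
cleaning_steps = [drop_nulls, rename_field("cust_id", "customer_id")]
```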