Key Concepts
Datazone is a modern data platform that simplifies your data engineering journey by providing a unified environment for data ingestion, processing, and analysis. It seamlessly connects your data sources to a robust data lakehouse while offering powerful tools for transformation, orchestration, and exploration.
Core Entities
Source
Think of Sources as secure gateways to your data. They connect your external data systems (databases, cloud storage, streaming platforms) to Datazone and handle credential management and access control. Because they hold credentials, Sources are typically managed by organization administrators.
Project
Projects are the home base for your data initiatives. They provide a structured workspace where you can organize related data work, from ingestion to analysis. Each Project is a self-contained environment housing Extracts, Pipelines, Notebooks, Datasets, and Schedules. Flexible permission settings let you control who can access what, which makes Projects well suited to both team collaboration and data governance.
Extract
Extracts are your data ingestion powerhouses. Each Extract is connected to a Source and lives within a Project, and it defines how data should be pulled from the external system. When executed, an Extract creates standardized Datasets in your data lakehouse. Think of Extracts as smart data movers that handle the heavy lifting of data ingestion while ensuring data quality and consistency.
Pipeline
Pipelines are where data engineering magic happens. Built as Directed Acyclic Graphs (DAGs), they represent your data transformation workflows: each step runs only after the steps it depends on have finished. Pipelines are defined in code, so they can be version-controlled, reused, and maintained like any other software. They can clean, filter, join, and reshape your data, turning raw information into valuable insights.
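To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use the Datazone pipeline API; the step names, the sample rows, and the `run_pipeline` helper are all hypothetical and only illustrate how steps declared with dependencies get executed in a valid order.

```python
from graphlib import TopologicalSorter

# Hypothetical transformation steps (illustrative only, not Datazone APIs).
# Each step receives a dict holding the outputs of the steps it depends on.
def load_orders(_inputs):
    return [
        {"order_id": 1, "customer": "a", "amount": 40.0},
        {"order_id": 2, "customer": "b", "amount": -5.0},  # invalid row
        {"order_id": 3, "customer": "a", "amount": 10.0},
    ]

def clean_orders(inputs):
    # Filter: keep only rows with a positive amount.
    return [row for row in inputs["load_orders"] if row["amount"] > 0]

def total_by_customer(inputs):
    # Reshape: aggregate the cleaned rows into one total per customer.
    totals = {}
    for row in inputs["clean_orders"]:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

# The DAG maps each step name to the set of steps it depends on.
DAG = {
    "load_orders": set(),
    "clean_orders": {"load_orders"},
    "total_by_customer": {"clean_orders"},
}
STEPS = {
    "load_orders": load_orders,
    "clean_orders": clean_orders,
    "total_by_customer": total_by_customer,
}

def run_pipeline(dag, steps):
    """Run every step after its dependencies; a cycle raises graphlib.CycleError."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        inputs = {dep: results[dep] for dep in dag[name]}
        results[name] = steps[name](inputs)
    return results

results = run_pipeline(DAG, STEPS)
print(results["total_by_customer"])
```

Because the graph is acyclic, a valid execution order always exists, which is what lets a platform work out on its own which step to run next.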
Schedule
Schedules bring automation to your data workflows. Using cron expressions, they determine when your Extracts and Pipelines run. They’re the timekeepers of your data platform, ensuring your data processes run like clockwork, whether it’s hourly updates or monthly aggregations.
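As a rough illustration of what such expressions look like, the sketch below splits a cron expression into its named fields. It assumes the common five-field cron layout (minute, hour, day of month, month, day of week); whether Datazone accepts exactly this layout is an assumption, and `describe_cron` is a hypothetical helper rather than part of any Datazone API.

```python
# Field order of a conventional five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def describe_cron(expression):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))

HOURLY = "0 * * * *"   # minute 0 of every hour
MONTHLY = "0 0 1 * *"  # 00:00 on the first day of every month

print(describe_cron(HOURLY))
print(describe_cron(MONTHLY))
```

An asterisk means "every value" for that field, so the hourly expression fires whenever the minute is 0, and the monthly one fires only at midnight on day 1.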
Notebook
Notebooks are your interactive playground for data exploration and analysis. Similar to Jupyter notebooks but integrated into Datazone, they provide a user-friendly interface where you can write code, visualize data, and debug your transformations. They’re perfect for both quick data investigations and detailed analysis.
How It All Fits Together
Your data journey in Datazone typically flows like this:
- Connect to external systems through Sources
- Organize your work in Projects
- Ingest data using Extracts
- Transform data with Pipelines
- Automate workflows using Schedules
- Analyze results in Notebooks
This integrated approach ensures a smooth data lifecycle, from ingestion to insights, while maintaining security, scalability, and ease of use.
Each entity in Datazone is designed to solve specific data engineering challenges while working harmoniously with others. This modular yet integrated approach makes Datazone powerful enough for complex data operations yet simple enough for quick data tasks.
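The relationships between these entities can be pictured as a small data model. The sketch below is purely conceptual: the class names mirror the entities described above, but every field, method, and value is hypothetical and does not reflect how Datazone stores or exposes them.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str  # connection to an external system; credentials omitted here

@dataclass
class Dataset:
    name: str

@dataclass
class Extract:
    name: str
    source: Source  # every Extract is connected to a Source

    def run(self) -> Dataset:
        # Stand-in for ingestion: executing an Extract yields a Dataset.
        return Dataset(name=f"{self.name}_dataset")

@dataclass
class Schedule:
    cron: str
    target: str  # name of the Extract or Pipeline to run

@dataclass
class Project:
    name: str
    extracts: list = field(default_factory=list)
    datasets: list = field(default_factory=list)
    schedules: list = field(default_factory=list)

# Walk through the flow: Source -> Project -> Extract -> Dataset -> Schedule.
warehouse = Source(name="orders_db")
project = Project(name="sales_analytics")
orders = Extract(name="orders", source=warehouse)
project.extracts.append(orders)
project.datasets.append(orders.run())
project.schedules.append(Schedule(cron="0 * * * *", target=orders.name))

print(project)
```

The point of the model is containment and direction: a Project owns the work, an Extract points back at a Source, and a Schedule only refers to what should run and when.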