Overview

The Toolkit provides utilities for working with data in Datazone notebooks. It currently includes the Dataset class for loading datasets and the Variable class for reading Variables from the kernel environment.

Dataset

The Dataset class loads datasets into your notebooks as Pandas or PySpark DataFrames. Reference a dataset by its ID or by its alias:

from datazone import Dataset

dataset = Dataset(id="<dataset-id>")
# or
dataset = Dataset(alias="<dataset-alias>")

You can also load a dataset by providing a specific branch name:

from datazone import Dataset

dataset = Dataset(id="<dataset-id>", branch="<branch-name>")

Once you have a Dataset object, convert it to a Pandas or PySpark DataFrame:

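from datazone import Dataset

dataset = Dataset(id="<dataset-id>")

# Load the dataset as a Pandas DataFrame
df = dataset.to_pandas()
# or load it as a PySpark DataFrame
pyspark_df = dataset.to_pyspark()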

Note: to_pyspark() is only available in the PySpark kernel.
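
If the same notebook code has to run in both kernels, the choice between the two methods can be wrapped in a small helper. The following is only a sketch: `load_frame`, `DatasetLike`, and `FakeDataset` are hypothetical names that are not part of the Toolkit, and `FakeDataset` exists solely so the snippet runs without Datazone. The helper relies only on the `to_pandas()` and `to_pyspark()` methods shown above.

```python
from typing import Any, Protocol


class DatasetLike(Protocol):
    """The only shape this sketch relies on: the two loader methods."""

    def to_pandas(self) -> Any: ...
    def to_pyspark(self) -> Any: ...


def load_frame(dataset: DatasetLike, use_spark: bool = False) -> Any:
    """Return a PySpark DataFrame when use_spark is True, else a Pandas one."""
    if use_spark:
        # Only valid in the PySpark kernel.
        return dataset.to_pyspark()
    return dataset.to_pandas()


class FakeDataset:
    """Hypothetical stand-in for datazone.Dataset, for illustration only."""

    def to_pandas(self) -> str:
        return "pandas-frame"

    def to_pyspark(self) -> str:
        return "pyspark-frame"
```

In a notebook you would pass a real `Dataset` object instead of `FakeDataset()`, for example `load_frame(dataset, use_spark=True)` in the PySpark kernel.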

Variable

You can access Variables from the kernel environment using the Variable class.

from datazone import Variable

variable = Variable(name="<variable-name>")

Example

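from datazone import Variable
from openai import OpenAI

# Read the API key from the kernel environment
api_key = Variable(name="OPENAI_API_KEY")
# Pass the Variable's string form to the client
client = OpenAI(api_key=str(api_key))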
...
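
Because the example converts the Variable with `str()`, it can be useful to check the resulting string before handing it to a client. The sketch below assumes only that `str(variable)` yields the value, as in the example above. `variable_to_str` and `FakeVariable` are hypothetical names, not part of the Toolkit, and how Datazone itself behaves for a missing or empty Variable is not covered here.

```python
def variable_to_str(variable: object, label: str = "variable") -> str:
    """Convert a Variable-like object to a stripped string, rejecting empty values."""
    value = str(variable).strip()
    if not value:
        raise ValueError(f"{label} is empty")
    return value


class FakeVariable:
    """Hypothetical stand-in for datazone.Variable, for illustration only."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return self._value
```

With a real Variable, the call would look like `variable_to_str(Variable(name="OPENAI_API_KEY"), label="OPENAI_API_KEY")`.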

You can also check the Variables documentation.