Overview
The Toolkit provides utilities for working with data in Datazone notebooks. It currently includes the Dataset class for accessing datasets and the Variable class for reading Variables from the kernel environment.
Dataset
The Dataset class allows you to easily load datasets into your notebooks as Pandas or PySpark DataFrames.
from datazone import Dataset
dataset = Dataset(id="<dataset-id>")
# or
dataset = Dataset(alias="<dataset-alias>")
You can also load a dataset by providing a specific branch name:
from datazone import Dataset
dataset = Dataset(id="<dataset-id>", branch="<branch-name>")
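The constructor arguments shown above can be assembled from configuration before the Dataset is created. The following is a minimal sketch, not part of the Toolkit: dataset_kwargs is a hypothetical helper, and it assumes that exactly one of id or alias is given and that branch is optional. Whether branch can be combined with alias is not stated above, so treat that combination as an assumption.

```python
from typing import Dict, Optional


def dataset_kwargs(
    dataset_id: Optional[str] = None,
    alias: Optional[str] = None,
    branch: Optional[str] = None,
) -> Dict[str, str]:
    """Build keyword arguments for Dataset(...) (hypothetical helper)."""
    # Assumption: a dataset is identified by either an id or an alias, not both.
    if (dataset_id is None) == (alias is None):
        raise ValueError("Provide exactly one of 'dataset_id' or 'alias'.")

    kwargs: Dict[str, str] = {}
    if dataset_id is not None:
        kwargs["id"] = dataset_id
    else:
        kwargs["alias"] = alias

    # Only pass a branch when one was requested.
    if branch is not None:
        kwargs["branch"] = branch
    return kwargs
```

In a notebook, the result could then be passed on as Dataset(**dataset_kwargs(dataset_id="<dataset-id>", branch="<branch-name>")).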
Once you have a Dataset instance, call get_pandas_df() or get_pyspark_df() to load it into your notebook as a Pandas or PySpark DataFrame.
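from datazone import Dataset
dataset = Dataset(id="<dataset-id>")
# Load the dataset as a Pandas DataFrame
pandas_df = dataset.get_pandas_df()
# or as a PySpark DataFrame
pyspark_df = dataset.get_pyspark_df()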
get_pyspark_df() is only available in the PySpark kernel.
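Because get_pyspark_df() depends on the kernel, notebook code that should run in either kernel may want to choose the loader at runtime. The sketch below is one possible approach, not a Toolkit feature: load_dataframe is a hypothetical helper, and it assumes that an importable pyspark package is a reasonable proxy for running in the PySpark kernel. Pass use_spark explicitly if that assumption does not hold in your environment.

```python
import importlib.util
from typing import Any, Optional


def load_dataframe(dataset: Any, use_spark: Optional[bool] = None) -> Any:
    """Return a PySpark DataFrame when requested or detected, else a Pandas one."""
    if use_spark is None:
        # Assumption: if `pyspark` can be imported, we are in the PySpark kernel.
        use_spark = importlib.util.find_spec("pyspark") is not None
    if use_spark:
        return dataset.get_pyspark_df()
    return dataset.get_pandas_df()
```

The helper only relies on the two methods documented above, so it accepts any object that provides get_pandas_df() and get_pyspark_df().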
Variable
You can access Variables from the kernel environment using the Variable class.
from datazone import Variable
variable = Variable(key="<variable-name>")
Example
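from datazone import Variable
from openai import OpenAI
# Read the API key from the kernel environment and pass it to the client as a string
api_key = Variable(key="OPENAI_API_KEY")
client = OpenAI(api_key=str(api_key))
...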
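When the same code also needs to run outside a Datazone notebook, for example during local development, a fallback can be useful. The following is a rough sketch under stated assumptions: read_variable is a hypothetical helper, the value is obtained with str(...) as in the example above, and the fallback assumes that the same key is set as an operating system environment variable when the datazone package is not installed.

```python
import os
from typing import Any, Optional

try:
    from datazone import Variable  # available inside Datazone notebooks
except ImportError:
    Variable = None  # running outside Datazone


def read_variable(key: str, variable_cls: Optional[Any] = Variable) -> str:
    """Return the value stored under `key` as a string (hypothetical helper)."""
    if variable_cls is not None:
        # Same pattern as the example above: wrap the key, then convert to str.
        return str(variable_cls(key=key))
    # Assumption: outside Datazone, the key is set as an OS environment variable.
    value = os.environ.get(key)
    if value is None:
        raise KeyError(f"Variable {key!r} is not set")
    return value
```

A call such as read_variable("OPENAI_API_KEY") would then work in both settings, provided the key exists in whichever source is used.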
You can also check the Variables documentation.