State

The context object provides a state attribute that allows you to read and write key-value pairs. The state is stored in a database and can be accessed from any transform function in the pipeline. You can also access the state in subsequent executions of the pipeline.

from datazone import ContextType, transform

@transform
def my_transform(context: ContextType):
    context.state.write(key="my_key", value="my_value")


@transform(depends=[my_transform])
def my_other_transform(context: ContextType):
    value = context.state.read(key="my_key")
    print(value)  # prints "my_value"
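
Because state persists in the database, a transform can also pick up a value written during a previous pipeline run, which is useful for incremental processing. Below is a minimal sketch; it assumes that state.read returns None when a key has never been written, and the key name last_processed_id is purely illustrative.

from datazone import ContextType, transform

@transform
def incremental_transform(context: ContextType):
    # Read the watermark left by the previous execution
    # (assumed to be None on the very first run).
    last_processed_id = context.state.read(key="last_processed_id")

    if last_processed_id is None:
        print("First execution: processing all records")
    else:
        print(f"Resuming after id {last_processed_id}")

    # Persist a new watermark for the next execution.
    context.state.write(key="last_processed_id", value="42")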

State Callbacks

You can make a state write conditional with the on parameter, which accepts values such as success, failure, and now.

from datazone import ContextType, transform

@transform
def my_transform(context: ContextType):
    context.state.write(key="my_key", value="my_value", on="success")
    raise Exception("Error")

In this example, the state is not written because the transform function raises an exception, so the success condition is never met.
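
The other conditions work the same way. Here is a short sketch, assuming the on values failure and now behave as their names suggest (now writes immediately, failure writes only if the transform raises); the key names are illustrative:

from datazone import ContextType, transform

@transform
def my_transform(context: ContextType):
    # Written immediately, regardless of how the transform finishes.
    context.state.write(key="started", value="true", on="now")

    # Written only if the transform raises an exception.
    context.state.write(key="failed", value="true", on="failure")

    raise Exception("Error")  # "started" and "failed" are written; a "success" write would not be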

Resources

PySpark Session

You can access the PySpark session through the pyspark entry of the context's resources object.

from datazone import ContextType, transform

@transform
def my_transform(context: ContextType):
    spark = context.resources["pyspark"].spark

    # Use the PySpark session
    df = spark.createDataFrame(
        [("Alice", 25), ("Bob", 30)],
        schema=["name", "age"],
    )

    return df
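
Once you have the session, the full PySpark DataFrame API is available inside the transform. The sketch below uses only standard PySpark calls; the transform name is illustrative:

from datazone import ContextType, transform

@transform
def filter_adults(context: ContextType):
    spark = context.resources["pyspark"].spark

    df = spark.createDataFrame(
        [("Alice", 25), ("Bob", 30)],
        schema=["name", "age"],
    )

    # Standard PySpark operations work as usual.
    return df.filter(df.age >= 30)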