DuckDB + PySpark (dagster-duckdb-pyspark)

This library provides an integration between the DuckDB database and the PySpark data processing library.

class dagster_duckdb_pyspark.DuckDBPySparkTypeHandler(*args, **kwds)

Stores PySpark DataFrames in DuckDB.

Note: This type handler can only store outputs. It cannot currently load inputs.
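To illustrate what a store-only handler means in practice, here is a minimal, hypothetical sketch. It is not the dagster API. The class name StoreOnlyTableHandler and its method signatures are invented for illustration, sqlite3 stands in for DuckDB, and a list of row tuples stands in for a PySpark DataFrame so the example runs with only the standard library. Writing an output succeeds, while reading an input raises an error.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: a list of row tuples.
Rows = List[Tuple[Any, ...]]


class StoreOnlyTableHandler:
    """Toy handler that writes outputs to a table but cannot read them back.

    sqlite3 stands in for DuckDB; this is NOT the dagster type-handler API.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def handle_output(self, table: str, columns: Sequence[str], rows: Rows) -> None:
        # Replace the table on every write, then insert all rows.
        cols = ", ".join(columns)
        placeholders = ", ".join("?" for _ in columns)
        self.conn.execute(f"DROP TABLE IF EXISTS {table}")
        self.conn.execute(f"CREATE TABLE {table} ({cols})")
        self.conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        self.conn.commit()

    def load_input(self, table: str) -> Rows:
        # Loading is deliberately unsupported, mirroring the store-only limitation.
        raise NotImplementedError("this handler can only store outputs")


conn = sqlite3.connect(":memory:")
handler = StoreOnlyTableHandler(conn)
handler.handle_output("people", ["name", "age"], [("alice", 30), ("bob", 25)])
print(conn.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # 2
```

In this sketch, a downstream step that needs the stored data would have to query the database directly rather than receive it through the handler.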

To use this type handler, pass it in a list to build_duckdb_io_manager:

Example

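from dagster import job
from dagster_duckdb import build_duckdb_io_manager
from dagster_duckdb_pyspark import DuckDBPySparkTypeHandler

# Build an IO manager that stores PySpark DataFrame outputs in DuckDB.
duckdb_io_manager = build_duckdb_io_manager([DuckDBPySparkTypeHandler()])

@job(resource_defs={'io_manager': duckdb_io_manager})
def my_job():
    ...  # invoke ops that return PySpark DataFrames here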
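Because build_duckdb_io_manager takes a list of type handlers, one plausible reading is that the resulting IO manager picks a handler based on the type of each output. The following stdlib-only sketch illustrates that routing idea under this assumption. RecordingHandler, ToyIOManager, and supported_types as used here are hypothetical names, not dagster classes, and a plain Python list plays the role of a DataFrame.

```python
from typing import Any, Dict, Sequence, Tuple, Type


class RecordingHandler:
    """Hypothetical handler that records which objects it was asked to store."""

    def __init__(self, supported: Tuple[Type[Any], ...]) -> None:
        self.supported_types = supported
        self.stored: Dict[str, Any] = {}

    def handle_output(self, name: str, obj: Any) -> None:
        self.stored[name] = obj


class ToyIOManager:
    """Routes each output to the first handler whose supported types match."""

    def __init__(self, handlers: Sequence[RecordingHandler]) -> None:
        self.handlers = list(handlers)

    def handle_output(self, name: str, obj: Any) -> None:
        for handler in self.handlers:
            if isinstance(obj, handler.supported_types):
                handler.handle_output(name, obj)
                return
        # No registered handler accepts this type of object.
        raise TypeError(f"no type handler registered for {type(obj).__name__}")


list_handler = RecordingHandler((list,))
manager = ToyIOManager([list_handler])
manager.handle_output("my_table", [1, 2, 3])
print(list_handler.stored)  # {'my_table': [1, 2, 3]}
```

Under this design, supporting another DataFrame library would mean adding another handler to the list rather than changing the IO manager itself.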