Run Requests

class dagster.RunRequest(run_key=None, run_config=None, tags=None, job_name=None, asset_selection=None)

Represents all the information required to launch a single run. Must be returned by a SensorDefinition or ScheduleDefinition’s evaluation function for a run to be launched.

To build a run request for a particular partition, use run_request_for_partition().

run_key

A string key to identify this launched run. For sensors, ensures that only one run is created per run key across all sensor evaluations. For schedules, ensures that one run is created per tick, across failure recoveries. Passing in a None value means that a run will always be launched per evaluation.

Type:

Optional[str]

run_config

The config that parameterizes the run execution to be launched, as a dict.

Type:

Optional[Dict]

tags

A dictionary of tags (string key-value pairs) to attach to the launched run.

Type:

Optional[Dict[str, str]]

job_name

(Experimental) The name of the job this run request will launch. Required for sensors that target multiple jobs.

Type:

Optional[str]

asset_selection

A sequence of AssetKeys that should be launched with this run.

Type:

Optional[Sequence[AssetKey]]

class dagster.SkipReason(skip_message=None)

Represents a skipped evaluation, where no runs are requested. May contain a message to indicate why no runs were requested.

skip_message

A message displayed in Dagit explaining why this evaluation resulted in no requested runs.

Type:

Optional[str]

Schedules

@dagster.schedule(cron_schedule, *, job_name=None, name=None, tags=None, tags_fn=None, should_execute=None, environment_vars=None, execution_timezone=None, description=None, job=None, default_status=DefaultScheduleStatus.STOPPED)

Creates a schedule following the provided cron schedule and requests runs for the provided job.

The decorated function takes in a ScheduleEvaluationContext as its only argument, and does one of the following:

  1. Return a RunRequest object.

  2. Return a list of RunRequest objects.

  3. Return a SkipReason object, providing a descriptive message of why no runs were requested.

  4. Return nothing (skipping without providing a reason).

  5. Return a run config dictionary.

  6. Yield a SkipReason or yield one or more RunRequest objects.

Returns a ScheduleDefinition.

Parameters:
  • cron_schedule (Union[str, Sequence[str]]) – A valid cron string or sequence of cron strings specifying when the schedule will run, e.g., '45 23 * * 6' for a schedule that runs at 11:45 PM every Saturday. If a sequence is provided, then the schedule will run for the union of all execution times for the provided cron strings, e.g., ['45 23 * * 6', '30 9 * * 0'] for a schedule that runs at 11:45 PM every Saturday and 9:30 AM every Sunday.

  • name (Optional[str]) – The name of the schedule to create.

  • tags (Optional[Dict[str, str]]) – A dictionary of tags (string key-value pairs) to attach to the scheduled runs.

  • tags_fn (Optional[Callable[[ScheduleEvaluationContext], Optional[Dict[str, str]]]]) – A function that generates tags to attach to the schedule's runs. Takes a ScheduleEvaluationContext and returns a dictionary of tags (string key-value pairs). You may set only one of tags and tags_fn.

  • should_execute (Optional[Callable[[ScheduleEvaluationContext], bool]]) – A function that runs at schedule execution time to determine whether a schedule should execute or skip. Takes a ScheduleEvaluationContext and returns a boolean (True if the schedule should execute). Defaults to a function that always returns True.

  • environment_vars (Optional[Dict[str, str]]) – Any environment variables to set when executing the schedule.

  • execution_timezone (Optional[str]) – Timezone in which the schedule should run. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.

  • description (Optional[str]) – A human-readable description of the schedule.

  • job (Optional[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]) – The job that should execute when this schedule runs.

  • default_status (DefaultScheduleStatus) – Whether the schedule starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

class dagster.ScheduleDefinition(name=None, *, cron_schedule=None, job_name=None, run_config=None, run_config_fn=None, tags=None, tags_fn=None, should_execute=None, environment_vars=None, execution_timezone=None, execution_fn=None, description=None, job=None, default_status=DefaultScheduleStatus.STOPPED)

Define a schedule that targets a job.

Parameters:
  • name (Optional[str]) – The name of the schedule to create. Defaults to the job name plus “_schedule”.

  • cron_schedule (Union[str, Sequence[str]]) – A valid cron string or sequence of cron strings specifying when the schedule will run, e.g., '45 23 * * 6' for a schedule that runs at 11:45 PM every Saturday. If a sequence is provided, then the schedule will run for the union of all execution times for the provided cron strings, e.g., ['45 23 * * 6', '30 9 * * 0'] for a schedule that runs at 11:45 PM every Saturday and 9:30 AM every Sunday.

  • execution_fn (Callable[ScheduleEvaluationContext]) –

    The core evaluation function for the schedule, which is run at an interval to determine whether a run should be launched or not. Takes a ScheduleEvaluationContext.

    This function must return a generator, which must yield either a single SkipReason or one or more RunRequest objects.

  • run_config (Optional[Mapping]) – The config that parameterizes this execution, as a dict.

  • run_config_fn (Optional[Callable[[ScheduleEvaluationContext], Mapping]]) – A function that takes a ScheduleEvaluationContext object and returns the run configuration that parameterizes this execution, as a dict. You may set only one of run_config, run_config_fn, and execution_fn.

  • tags (Optional[Mapping[str, str]]) – A dictionary of tags (string key-value pairs) to attach to the scheduled runs.

  • tags_fn (Optional[Callable[[ScheduleEvaluationContext], Optional[Mapping[str, str]]]]) – A function that generates tags to attach to the schedule's runs. Takes a ScheduleEvaluationContext and returns a dictionary of tags (string key-value pairs). You may set only one of tags, tags_fn, and execution_fn.

  • should_execute (Optional[Callable[[ScheduleEvaluationContext], bool]]) – A function that runs at schedule execution time to determine whether a schedule should execute or skip. Takes a ScheduleEvaluationContext and returns a boolean (True if the schedule should execute). Defaults to a function that always returns True.

  • environment_vars (Optional[dict[str, str]]) – The environment variables to set for the schedule.

  • execution_timezone (Optional[str]) – Timezone in which the schedule should run. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.

  • description (Optional[str]) – A human-readable description of the schedule.

  • job (Optional[Union[GraphDefinition, JobDefinition]]) – The job that should execute when this schedule runs.

  • default_status (DefaultScheduleStatus) – Whether the schedule starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

class dagster.ScheduleEvaluationContext(instance_ref, scheduled_execution_time)

The context object available as the first argument to various functions defined on a dagster.ScheduleDefinition.

A ScheduleEvaluationContext object is passed as the first argument to run_config_fn, tags_fn, and should_execute.

Users should not instantiate this object directly. To construct a ScheduleEvaluationContext for testing purposes, use dagster.build_schedule_context().

instance_ref

The serialized instance configured to run the schedule.

Type:

Optional[InstanceRef]

scheduled_execution_time

The time at which the execution was scheduled to happen. May differ slightly from both the actual execution time and the time at which the run config is computed. Not available in all schedulers; currently only set in deployments using DagsterDaemonScheduler.

Type:

datetime

Example:

from dagster import schedule, ScheduleEvaluationContext

# my_job is assumed to be a job defined elsewhere in your code
@schedule(cron_schedule="0 0 * * *", job=my_job)
def the_schedule(context: ScheduleEvaluationContext):
    ...

dagster.build_schedule_context(instance=None, scheduled_execution_time=None)

Builds schedule execution context using the provided parameters.

The instance provided to build_schedule_context must be persistent; DagsterInstance.ephemeral() will result in an error.

Parameters:
  • instance (Optional[DagsterInstance]) – The dagster instance configured to run the schedule.

  • scheduled_execution_time (Optional[datetime]) – The time at which the execution was scheduled to happen. May differ slightly from both the actual execution time and the time at which the run config is computed.

Examples

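from dagster import build_schedule_context
# instance: a persistent DagsterInstance; daily_schedule: a ScheduleDefinition
context = build_schedule_context(instance)
daily_schedule.evaluate_tick(context)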

dagster._core.scheduler.DagsterDaemonScheduler Scheduler

Config Schema:
max_catchup_runs (dagster.IntSource, optional):

For partitioned schedules, controls the maximum number of past partitions for each schedule that will be considered when looking for missing runs. Generally this parameter will only come into play if the scheduler falls behind or launches after experiencing downtime. This parameter will not be checked for schedules without partition sets (for example, schedules created using the @schedule decorator); only the most recent execution time will be considered for those schedules.

Note that no matter what this value is, the scheduler will never launch a run from a time before the schedule was turned on (even if the start_date on the schedule is earlier) - if you want to launch runs for earlier partitions, launch a backfill.

Default Value: 5

max_tick_retries (dagster.IntSource, optional):

For each schedule tick that raises an error, how many times to retry that tick.

Default Value: 0

Default scheduler implementation that submits runs from the dagster-daemon long-lived process. Periodically checks each running schedule for execution times that don’t have runs yet and launches them.
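To make the interaction between max_catchup_runs and the schedule's start time concrete, here is a toy model in plain Python. It is not the daemon's implementation: times_to_catch_up is a hypothetical name, and the model simply keeps the most recent max_catchup_runs missed execution times that fall at or after the moment the schedule was turned on.

```python
from datetime import datetime
from typing import List


def times_to_catch_up(
    missed: List[datetime], turned_on_at: datetime, max_catchup_runs: int = 5
) -> List[datetime]:
    """Toy model of the catch-up rule; not the DagsterDaemonScheduler's real code."""
    # Runs are never launched for execution times before the schedule was turned on.
    eligible = [t for t in sorted(missed) if t >= turned_on_at]
    # Only the most recent `max_catchup_runs` missed times are considered.
    # (Guard against 0, since eligible[-0:] would return the whole list.)
    return eligible[-max_catchup_runs:] if max_catchup_runs > 0 else []
```

For example, with daily misses from March 1 to March 10, 2022 and a schedule turned on at midnight on March 3, the model keeps March 6 through March 10: March 1 and 2 predate the schedule, and only five of the remaining eight times fit under the default limit.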

Partitioned Schedules

dagster.build_schedule_from_partitioned_job(job, description=None, name=None, minute_of_hour=None, hour_of_day=None, day_of_week=None, day_of_month=None, default_status=DefaultScheduleStatus.STOPPED, tags=None)

Creates a schedule from a time window-partitioned job.

The schedule executes at the cadence specified by the partitioning of the given job.

class dagster.PartitionScheduleDefinition(name, cron_schedule, pipeline_name, tags_fn, should_execute, partition_set, environment_vars=None, run_config_fn=None, execution_timezone=None, execution_fn=None, description=None, decorated_fn=None, job=None, default_status=DefaultScheduleStatus.STOPPED)

@dagster.hourly_partitioned_config(start_date, minute_offset=0, timezone=None, fmt=None, end_offset=0, tags_for_partition_fn=None)

Defines run config over a set of hourly partitions.

The decorated function should accept a start datetime and end datetime, which represent the date partition the config should delineate.

The decorated function should return a run config dictionary.

The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at the start_date at midnight. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If minute_offset is provided, the start and end times of each partition will be minute_offset past the hour.

Parameters:
  • start_date (Union[datetime.datetime, str]) – The first date in the set of partitions. Can be provided in either a datetime or string format.

  • minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.

  • fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.

  • timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.

  • end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.

@hourly_partitioned_config(start_date=datetime(2022, 3, 12))
# creates partitions (2022-03-12-00:00, 2022-03-12-01:00), (2022-03-12-01:00, 2022-03-12-02:00), ...

@hourly_partitioned_config(start_date=datetime(2022, 3, 12), minute_offset=15)
# creates partitions (2022-03-12-00:15, 2022-03-12-01:15), (2022-03-12-01:15, 2022-03-12-02:15), ...

@dagster.daily_partitioned_config(start_date, minute_offset=0, hour_offset=0, timezone=None, fmt=None, end_offset=0, tags_for_partition_fn=None)

Defines run config over a set of daily partitions.

The decorated function should accept a start datetime and end datetime, which represent the bounds of the date partition the config should delineate.

The decorated function should return a run config dictionary.

The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at the start_date at midnight. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If minute_offset and/or hour_offset are used, the start and end times of each partition will be hour_offset:minute_offset of each day.

Parameters:
  • start_date (Union[datetime.datetime, str]) – The first date in the set of partitions. Can be provided in either a datetime or string format.

  • minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.

  • hour_offset (int) – Number of hours past 00:00 to “split” the partition. Defaults to 0.

  • timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.

  • fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.

  • end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.

@daily_partitioned_config(start_date="2022-03-12")
# creates partitions (2022-03-12-00:00, 2022-03-13-00:00), (2022-03-13-00:00, 2022-03-14-00:00), ...

@daily_partitioned_config(start_date="2022-03-12", minute_offset=15, hour_offset=16)
# creates partitions (2022-03-12-16:15, 2022-03-13-16:15), (2022-03-13-16:15, 2022-03-14-16:15), ...
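The window arithmetic behind these examples can be modeled in a few lines of plain Python. This is a simplified sketch, not Dagster's implementation: daily_windows is a hypothetical helper that works on naive datetimes, ignores timezones, daylight-saving transitions, and end_offset, parses string start dates with fmt, and returns only the first few windows.

```python
from datetime import datetime, timedelta
from typing import List, Tuple, Union


def daily_windows(
    start_date: Union[datetime, str],
    minute_offset: int = 0,
    hour_offset: int = 0,
    fmt: str = "%Y-%m-%d",
    count: int = 2,
) -> List[Tuple[datetime, datetime]]:
    """Simplified model of daily partition windows (naive datetimes, no end_offset)."""
    if isinstance(start_date, str):
        start_date = datetime.strptime(start_date, fmt)
    # The first boundary is hour_offset:minute_offset on or after start_date.
    first = start_date.replace(hour=hour_offset, minute=minute_offset, second=0, microsecond=0)
    if first < start_date:
        first += timedelta(days=1)
    return [(first + timedelta(days=i), first + timedelta(days=i + 1)) for i in range(count)]
```

Called as daily_windows("2022-03-12", minute_offset=15, hour_offset=16), it reproduces the second example above: the first window runs from 2022-03-12 16:15 to 2022-03-13 16:15.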

@dagster.weekly_partitioned_config(start_date, minute_offset=0, hour_offset=0, day_offset=0, timezone=None, fmt=None, end_offset=0, tags_for_partition_fn=None)

Defines run config over a set of weekly partitions.

The decorated function should accept a start datetime and end datetime, which represent the date partition the config should delineate.

The decorated function should return a run config dictionary.

The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at the start_date. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If day_offset is provided, the start and end date of each partition will fall on the day of the week corresponding to day_offset (0-indexed, with Sunday as the start of the week). If minute_offset and/or hour_offset are used, the start and end times of each partition will be hour_offset:minute_offset of each day.

Parameters:
  • start_date (Union[datetime.datetime, str]) – The first date in the set of partitions will be Sunday at midnight following start_date. Can be provided in either a datetime or string format.

  • minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.

  • hour_offset (int) – Number of hours past 00:00 to “split” the partition. Defaults to 0.

  • day_offset (int) – Day of the week to “split” the partition. Defaults to 0 (Sunday).

  • timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.

  • fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.

  • end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.

@weekly_partitioned_config(start_date="2022-03-12")
# creates partitions (2022-03-13-00:00, 2022-03-20-00:00), (2022-03-20-00:00, 2022-03-27-00:00), ...

@weekly_partitioned_config(start_date="2022-03-12", minute_offset=15, hour_offset=3, day_offset=6)
# creates partitions (2022-03-12-03:15, 2022-03-19-03:15), (2022-03-19-03:15, 2022-03-26-03:15), ...
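A similar plain-Python sketch for weekly windows, under the same simplifying assumptions (naive datetimes, no timezones, no end_offset). The one subtlety is day numbering: day_offset counts from Sunday = 0, while Python's datetime.weekday() counts from Monday = 0, so the hypothetical helper weekly_windows converts between the two.

```python
from datetime import datetime, timedelta
from typing import List, Tuple, Union


def weekly_windows(
    start_date: Union[datetime, str],
    minute_offset: int = 0,
    hour_offset: int = 0,
    day_offset: int = 0,
    fmt: str = "%Y-%m-%d",
    count: int = 2,
) -> List[Tuple[datetime, datetime]]:
    """Simplified model of weekly partition windows (naive datetimes, no end_offset)."""
    if isinstance(start_date, str):
        start_date = datetime.strptime(start_date, fmt)
    # Convert Python's Monday=0..Sunday=6 numbering to Sunday=0..Saturday=6.
    current_day = (start_date.weekday() + 1) % 7
    days_ahead = (day_offset - current_day) % 7
    first = (start_date + timedelta(days=days_ahead)).replace(
        hour=hour_offset, minute=minute_offset, second=0, microsecond=0
    )
    # If the boundary on that day is earlier than start_date, use the next week's.
    if first < start_date:
        first += timedelta(weeks=1)
    return [(first + timedelta(weeks=i), first + timedelta(weeks=i + 1)) for i in range(count)]
```

Since 2022-03-12 was a Saturday, the default day_offset=0 moves the first boundary to Sunday 2022-03-13, while day_offset=6 keeps it on 2022-03-12, matching both examples above.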

@dagster.monthly_partitioned_config(start_date, minute_offset=0, hour_offset=0, day_offset=1, timezone=None, fmt=None, end_offset=0, tags_for_partition_fn=None)

Defines run config over a set of monthly partitions.

The decorated function should accept a start datetime and end datetime, which represent the date partition the config should delineate.

The decorated function should return a run config dictionary.

The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at midnight on the soonest first of the month after start_date. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If day_offset is provided, the start and end date of each partition will fall on that day of the month. If minute_offset and/or hour_offset are used, the start and end times of each partition will be hour_offset:minute_offset of each day.

Parameters:
  • start_date (Union[datetime.datetime, str]) – The first date in the set of partitions will be midnight on the soonest first of the month following start_date. Can be provided in either a datetime or string format.

  • minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.

  • hour_offset (int) – Number of hours past 00:00 to “split” the partition. Defaults to 0.

  • day_offset (int) – Day of the month to “split” the partition. Defaults to 1.

  • timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.

  • fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.

  • end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.

@monthly_partitioned_config(start_date="2022-03-12")
# creates partitions (2022-04-01-00:00, 2022-05-01-00:00), (2022-05-01-00:00, 2022-06-01-00:00), ...

@monthly_partitioned_config(start_date="2022-03-12", minute_offset=15, hour_offset=3, day_offset=5)
# creates partitions (2022-04-05-03:15, 2022-05-05-03:15), (2022-05-05-03:15, 2022-06-05-03:15), ...
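And for monthly windows, again as a simplified, hypothetical helper (monthly_windows): the first boundary is day_offset of the start month at hour_offset:minute_offset, moved to the following month if it falls before start_date. It assumes day_offset is at most 28 so that every month has that day, and it ignores timezones and end_offset.

```python
from datetime import datetime
from typing import List, Tuple, Union


def monthly_windows(
    start_date: Union[datetime, str],
    minute_offset: int = 0,
    hour_offset: int = 0,
    day_offset: int = 1,
    fmt: str = "%Y-%m-%d",
    count: int = 2,
) -> List[Tuple[datetime, datetime]]:
    """Simplified model of monthly partition windows (assumes 1 <= day_offset <= 28)."""
    if isinstance(start_date, str):
        start_date = datetime.strptime(start_date, fmt)

    def boundary(year: int, month: int) -> datetime:
        return datetime(year, month, day_offset, hour_offset, minute_offset)

    def next_month(year: int, month: int) -> Tuple[int, int]:
        return (year + 1, 1) if month == 12 else (year, month + 1)

    year, month = start_date.year, start_date.month
    # Skip to the next month if this month's boundary is before start_date.
    if boundary(year, month) < start_date:
        year, month = next_month(year, month)
    windows = []
    for _ in range(count):
        following = next_month(year, month)
        windows.append((boundary(year, month), boundary(*following)))
        year, month = following
    return windows
```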

Sensors

@dagster.sensor(job_name=None, *, name=None, minimum_interval_seconds=None, description=None, job=None, jobs=None, default_status=DefaultSensorStatus.STOPPED)

Creates a sensor where the decorated function is used as the sensor’s evaluation function. The decorated function may:

  1. Return a RunRequest object.

  2. Return a list of RunRequest objects.

  3. Return a SkipReason object, providing a descriptive message of why no runs were requested.

  4. Return nothing (skipping without providing a reason).

  5. Yield a SkipReason or yield one or more RunRequest objects.

Takes a SensorEvaluationContext.

Parameters:
  • name (Optional[str]) – The name of the sensor. Defaults to the name of the decorated function.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • job (Optional[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]) – The job to be executed when the sensor fires.

  • jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]]) – (experimental) A list of jobs to be executed when the sensor fires.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

class dagster.SensorDefinition(name=None, *, evaluation_fn=None, job_name=None, minimum_interval_seconds=None, description=None, job=None, jobs=None, default_status=DefaultSensorStatus.STOPPED)

Define a sensor that initiates a set of runs based on some external state.

Parameters:
  • evaluation_fn (Callable[[SensorEvaluationContext]]) –

    The core evaluation function for the sensor, which is run at an interval to determine whether a run should be launched or not. Takes a SensorEvaluationContext.

    This function must return a generator, which must yield either a single SkipReason or one or more RunRequest objects.

  • name (Optional[str]) – The name of the sensor to create. Defaults to the name of evaluation_fn.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • job (Optional[Union[GraphDefinition, JobDefinition]]) – The job to execute when this sensor fires.

  • jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition]]]) – (experimental) A list of jobs to execute when this sensor fires.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

class dagster.SensorEvaluationContext(instance_ref, last_completion_time, last_run_key, cursor, repository_name, instance=None)

The context object available as the argument to the evaluation function of a dagster.SensorDefinition.

Users should not instantiate this object directly. To construct a SensorEvaluationContext for testing purposes, use dagster.build_sensor_context().

instance_ref

The serialized instance configured to run the sensor.

Type:

Optional[InstanceRef]

cursor

The cursor, passed back from the last sensor evaluation via the cursor attribute of SkipReason and RunRequest.

Type:

Optional[str]

last_completion_time

DEPRECATED The last time that the sensor was evaluated (UTC).

Type:

float

last_run_key

DEPRECATED The run key of the RunRequest most recently created by this sensor. Use the preferred cursor attribute instead.

Type:

str

repository_name

The name of the repository that the sensor belongs to.

Type:

Optional[str]

instance

The deserialized instance can also be passed in directly (primarily useful in testing contexts).

Type:

Optional[DagsterInstance]

Example:

from dagster import sensor, SensorEvaluationContext

# my_job is assumed to be a job defined elsewhere in your code
@sensor(job=my_job)
def the_sensor(context: SensorEvaluationContext):
    ...

property cursor

The cursor value for this sensor, which was set in an earlier sensor evaluation.

update_cursor(cursor)

Updates the cursor value for this sensor, which will be provided on the context for the next sensor evaluation.

This can be used to keep track of progress and avoid duplicate work across sensor evaluations.

Parameters:

cursor (Optional[str]) – The new cursor value.

class dagster.MultiAssetSensorEvaluationContext(*args, **kwargs)

The context object available as the argument to the evaluation function of a dagster.MultiAssetSensorDefinition.

Users should not instantiate this object directly. To construct a MultiAssetSensorEvaluationContext for testing purposes, use dagster.build_multi_asset_sensor_context().

The MultiAssetSensorEvaluationContext contains a cursor object that tracks the state of consumed event logs for each monitored asset. For each asset, the cursor stores the storage ID of the latest materialization that has been marked as “consumed” (via a call to advance_cursor) in a latest_consumed_event_id field.

For each monitored asset, the cursor will store the latest unconsumed event ID for up to 25 partitions. Each event ID must be before the latest_consumed_event_id field for the asset.

Events that have not been marked as consumed via advance_cursor will continue to be returned in future ticks until they are marked as consumed.

To update the cursor to the latest materialization and clear the unconsumed events, call advance_all_cursors.

asset_keys

The asset keys that the sensor is configured to monitor.

Type:

Sequence[AssetKey]

repository_def

The repository that the sensor belongs to.

Type:

RepositoryDefinition

instance_ref

The serialized instance configured to run the sensor.

Type:

Optional[InstanceRef]

cursor

The cursor, passed back from the last sensor evaluation via the cursor attribute of SkipReason and RunRequest. Must be a dictionary of asset key strings to a stringified tuple of (latest_event_partition, latest_event_storage_id, trailing_unconsumed_partitioned_event_ids).

Type:

Optional[str]

last_completion_time

DEPRECATED The last time that the sensor was evaluated (UTC).

Type:

float

last_run_key

DEPRECATED The run key of the RunRequest most recently created by this sensor. Use the preferred cursor attribute instead.

Type:

str

repository_name

The name of the repository that the sensor belongs to.

Type:

Optional[str]

instance

The deserialized instance can also be passed in directly (primarily useful in testing contexts).

Type:

Optional[DagsterInstance]

Example:

from dagster import AssetKey, multi_asset_sensor, MultiAssetSensorEvaluationContext

@multi_asset_sensor(asset_keys=[AssetKey("asset_1"), AssetKey("asset_2")])
def the_sensor(context: MultiAssetSensorEvaluationContext):
    ...

advance_all_cursors()

Updates the cursor to the most recent materialization event for all assets monitored by the multi_asset_sensor.

Marks all materialization events as consumed by the sensor, including unconsumed events.

advance_cursor(materialization_records_by_key)

Marks the provided materialization records as having been consumed by the sensor.

At the end of the tick, the cursor will be updated to advance past all materialization records provided via advance_cursor. In the next tick, records that have been consumed will no longer be returned.

Passing a partitioned materialization record into this function will mark prior materializations with the same asset key and partition as having been consumed.

Parameters:

materialization_records_by_key (Mapping[AssetKey, Optional[EventLogRecord]]) – Mapping of AssetKeys to EventLogRecord or None. If an EventLogRecord is provided, the cursor for the AssetKey will be updated and future calls to fetch asset materialization events will not fetch this event again. If None is provided, the cursor for the AssetKey will not be updated.

all_partitions_materialized(asset_key, partitions=None)

A utility method to check whether all partitions in a provided list have been materialized for a particular asset. This method ignores the cursor and checks all materializations for the asset.

Parameters:
  • asset_key (AssetKey) – The asset to check partitions for.

  • partitions (Optional[Sequence[str]]) – A list of partitions to check. If not provided, all partitions for the asset will be checked.

Returns:

True if all selected partitions have been materialized, False otherwise.

Return type:

bool

get_cursor_partition(asset_key)

A utility method to get the current partition the cursor is on.

get_downstream_partition_keys(partition_key, from_asset_key, to_asset_key)

Converts a partition key from one asset to the corresponding partition key in a downstream asset. Uses the existing partition mapping between the upstream asset and the downstream asset if it exists, otherwise, uses the default partition mapping.

Parameters:
  • partition_key (str) – The partition key to convert.

  • from_asset_key (AssetKey) – The asset key of the upstream asset, which the provided partition key belongs to.

  • to_asset_key (AssetKey) – The asset key of the downstream asset. The provided partition key will be mapped to partitions within this asset.

Returns:

A list of the corresponding downstream partitions in to_asset_key that partition_key maps to.

Return type:

Sequence[str]

get_trailing_unconsumed_events(asset_key)

Fetches the unconsumed events for a given asset key. Returns only events before the latest consumed event ID for the given asset. To mark an event as consumed, pass the event to advance_cursor. Returns events in ascending order by storage ID.

Parameters:

asset_key (AssetKey) – The asset key to get unconsumed events for.

Returns:

The unconsumed events for the given asset key.

Return type:

Sequence[EventLogRecord]

latest_materialization_records_by_key(asset_keys=None)

Fetches the most recent materialization event record for each asset in asset_keys. Only fetches events after the latest consumed event ID for the given asset key.

Parameters:

asset_keys (Optional[Sequence[AssetKey]]) – list of asset keys to fetch events for. If not specified, the latest materialization will be fetched for all assets the multi_asset_sensor monitors.

Returns:

Mapping of AssetKey to EventLogRecord, where the EventLogRecord is the latest materialization event for the asset. If there is no materialization event for the asset, the value in the mapping will be None.

latest_materialization_records_by_partition(asset_key, after_cursor_partition=False)

Given an asset, returns a mapping of partition key to the latest materialization event for that partition. Fetches only materializations that have not been marked as “consumed” via a call to advance_cursor.

Parameters:
  • asset_key (AssetKey) – The asset to fetch events for.

  • after_cursor_partition (Optional[bool]) – If True, only materializations with partitions after the cursor’s current partition will be returned. By default, set to False.

Returns:

Mapping of partition keys to EventLogRecords, where each EventLogRecord is the most recent materialization event for that partition. The mapping preserves the order in which the materializations occurred.

Return type:

Mapping[str, EventLogRecord]

Example

from dagster import DailyPartitionsDefinition, asset, multi_asset_sensor

@asset(partitions_def=DailyPartitionsDefinition("2022-07-01"))
def july_asset():
    return 1

@multi_asset_sensor(asset_keys=[july_asset.key])
def my_sensor(context):
    context.latest_materialization_records_by_partition(july_asset.key)

# After materializing july_asset for 2022-07-05, latest_materialization_records_by_partition
# returns {"2022-07-05": EventLogRecord(...)}

latest_materialization_records_by_partition_and_asset()

Finds the most recent unconsumed materialization for each partition for each asset monitored by the sensor. Aggregates all materializations into a mapping of partition key to a mapping of asset key to the materialization event for that partition.

For example, if the sensor monitors two partitioned assets A and B that are materialized for partition_x after the cursor, this function returns:

{
    "partition_x": {asset_a.key: EventLogRecord(...), asset_b.key: EventLogRecord(...)}
}

This method can only be called when all monitored assets are partitioned and share the same partition definition.
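To make the shape of the returned mapping concrete, here is a minimal pure-Python sketch of the aggregation step. It is not Dagster's implementation: it assumes you already hold, for each monitored asset, a mapping of partition key to its latest unconsumed record, and it uses plain strings as hypothetical stand-ins for AssetKey and EventLogRecord objects. group_by_partition is an illustrative name, not a Dagster function.

```python
from typing import Dict, Mapping, TypeVar

K = TypeVar("K")  # stand-in for AssetKey
R = TypeVar("R")  # stand-in for EventLogRecord


def group_by_partition(
    records_by_asset: Mapping[K, Mapping[str, R]],
) -> Dict[str, Dict[K, R]]:
    """Invert {asset: {partition: record}} into {partition: {asset: record}}."""
    by_partition: Dict[str, Dict[K, R]] = {}
    for asset_key, records_by_partition in records_by_asset.items():
        for partition_key, record in records_by_partition.items():
            # setdefault creates the inner dict the first time a partition is seen
            by_partition.setdefault(partition_key, {})[asset_key] = record
    return by_partition


# Hypothetical input: both assets were materialized for partition_x,
# and only asset_a for partition_y.
example = {
    "asset_a": {"partition_x": "record_a_x", "partition_y": "record_a_y"},
    "asset_b": {"partition_x": "record_b_x"},
}
print(group_by_partition(example))
# {'partition_x': {'asset_a': 'record_a_x', 'asset_b': 'record_b_x'}, 'partition_y': {'asset_a': 'record_a_y'}}
```

In this sketch a partition that only some assets have materialized still appears, with only those assets in its inner mapping; whether Dagster itself keeps or filters such partitions is not stated in the description above.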

materialization_records_for_key(asset_key, limit)[source]

Fetches asset materialization event records for asset_key, with the earliest event first.

Only fetches events after the latest consumed event ID for the given asset key.

Parameters:
  • asset_key (AssetKey) – The asset to fetch materialization events for

  • limit (int) – The number of events to fetch
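The filtering rule described above (only events newer than the latest consumed event ID, earliest first, at most limit of them) can be modelled in a few lines of plain Python. This is a simplified sketch, not Dagster's code: FakeRecord and records_for_key are hypothetical names, storage_id stands in for the event ID the cursor tracks, and consumed_id=None represents an asset with no consumed events yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FakeRecord:
    """Hypothetical stand-in for an EventLogRecord: only the fields used here."""
    storage_id: int
    asset: str


def records_for_key(
    records: List[FakeRecord],
    asset: str,
    consumed_id: Optional[int],
    limit: int,
) -> List[FakeRecord]:
    """Return up to `limit` records for `asset` newer than `consumed_id`, oldest first."""
    newer = [
        r
        for r in records
        if r.asset == asset and (consumed_id is None or r.storage_id > consumed_id)
    ]
    # Earliest event first, matching the documented ordering.
    newer.sort(key=lambda r: r.storage_id)
    return newer[:limit]


log = [FakeRecord(5, "a"), FakeRecord(2, "a"), FakeRecord(3, "b"), FakeRecord(7, "a")]
print([r.storage_id for r in records_for_key(log, "a", consumed_id=2, limit=10)])
# [5, 7]
```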

dagster.build_sensor_context(instance=None, cursor=None, repository_name=None)[source]

Builds sensor execution context using the provided parameters.

This function can be used to provide a context to the invocation of a sensor definition. If provided, the dagster instance must be persistent; DagsterInstance.ephemeral() will result in an error.

Parameters:
  • instance (Optional[DagsterInstance]) – The dagster instance configured to run the sensor.

  • cursor (Optional[str]) – A cursor value to provide to the evaluation of the sensor.

  • repository_name (Optional[str]) – The name of the repository that the sensor belongs to.

Examples

from dagster import build_sensor_context

# my_sensor is assumed to be a sensor definition defined elsewhere.
context = build_sensor_context()
my_sensor(context)

dagster.build_multi_asset_sensor_context(repository_def, asset_keys=None, asset_selection=None, instance=None, cursor=None, repository_name=None, cursor_from_latest_materializations=False)[source]

Builds multi asset sensor execution context for testing purposes using the provided parameters.

This function can be used to provide a context to the invocation of a multi asset sensor definition. If provided, the dagster instance must be persistent; DagsterInstance.ephemeral() will result in an error.

Parameters:
  • repository_def (RepositoryDefinition) – The repository definition that the sensor belongs to.

  • asset_keys (Optional[Sequence[AssetKey]]) – The list of asset keys monitored by the sensor. If not provided, asset_selection argument must be provided.

  • asset_selection (Optional[AssetSelection]) – The asset selection monitored by the sensor. If not provided, asset_keys argument must be provided.

  • instance (Optional[DagsterInstance]) – The dagster instance configured to run the sensor.

  • cursor (Optional[str]) – A string cursor to provide to the evaluation of the sensor. Must be a dictionary of asset key strings to ints that has been converted to a JSON string.

  • repository_name (Optional[str]) – The name of the repository that the sensor belongs to.

  • cursor_from_latest_materializations (bool) – If True, the cursor will be set to the latest materialization for each monitored asset. By default, set to False.
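The cursor argument is a JSON-encoded dictionary rather than a Python dict. A minimal sketch of building one with the standard json module follows. Two things here are assumptions, not confirmed by the text above: that each int is the ID of the latest event already consumed for that asset, and that the user-facing key string ("asset_1") is the key form your Dagster version expects.

```python
import json

# Hypothetical cursor state: for each monitored asset, the latest event ID
# the sensor has already consumed (assumed meaning of the int values).
# Using the plain "asset_1" string as the key is an assumption; check which
# string form of AssetKey your Dagster version expects.
cursor_state = {"asset_1": 100, "asset_2": 42}

# Encode to the JSON string that the cursor parameter requires.
cursor = json.dumps(cursor_state)
print(cursor)  # {"asset_1": 100, "asset_2": 42}

# The resulting string would then be passed as
# build_multi_asset_sensor_context(..., cursor=cursor).
```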

Examples

from dagster import AssetKey, build_multi_asset_sensor_context, instance_for_test

# my_repo and my_asset_sensor are assumed to be defined elsewhere.
with instance_for_test() as instance:
    context = build_multi_asset_sensor_context(
        repository_def=my_repo,
        asset_keys=[AssetKey("asset_1"), AssetKey("asset_2")],
        instance=instance,
    )
    my_asset_sensor(context)

class dagster.AssetSensorDefinition(name, asset_key, job_name, asset_materialization_fn, minimum_interval_seconds=None, description=None, job=None, jobs=None, default_status=DefaultSensorStatus.STOPPED)[source]

Define an asset sensor that initiates a set of runs based on the materialization of a given asset.

Parameters:
  • name (str) – The name of the sensor to create.

  • asset_key (AssetKey) – The asset_key this sensor monitors.

  • asset_materialization_fn (Callable[[SensorEvaluationContext, EventLogEntry], Union[Iterator[Union[RunRequest, SkipReason]], RunRequest, SkipReason]]) –

    The core evaluation function for the sensor, which is run at an interval to determine whether a run should be launched or not. Takes a SensorEvaluationContext and an EventLogEntry corresponding to an AssetMaterialization event.

    This function must return a generator, which must yield either a single SkipReason or one or more RunRequest objects.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • job (Optional[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]) – The job object to target with this sensor.

  • jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]]) – (experimental) A list of jobs to be executed when the sensor fires.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

@dagster.asset_sensor(asset_key, *, job_name=None, name=None, minimum_interval_seconds=None, description=None, job=None, jobs=None, default_status=DefaultSensorStatus.STOPPED)[source]

Creates an asset sensor where the decorated function is used as the asset sensor’s evaluation function. The decorated function may:

  1. Return a RunRequest object.

  2. Return a list of RunRequest objects.

  3. Return a SkipReason object, providing a descriptive message of why no runs were requested.

  4. Return nothing (skipping without providing a reason).

  5. Yield a SkipReason or yield one or more RunRequest objects.

The decorated function takes a SensorEvaluationContext and an EventLogEntry corresponding to an AssetMaterialization event.
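The five permitted outcomes listed above all reduce to "zero or more run requests, or a skip". The sketch below models that reduction in plain Python so the cases can be compared side by side. StubRunRequest, StubSkipReason, and normalize are hypothetical names, and this is not how Dagster processes sensor results internally; it also does not enforce the rule that a SkipReason must not be mixed with RunRequests.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class StubRunRequest:
    """Hypothetical stand-in for dagster.RunRequest (not the real class)."""
    run_key: Optional[str] = None


@dataclass
class StubSkipReason:
    """Hypothetical stand-in for dagster.SkipReason (not the real class)."""
    skip_message: Optional[str] = None


SensorItem = Union[StubRunRequest, StubSkipReason]


def normalize(result) -> List[SensorItem]:
    """Flatten the allowed return shapes into a list (a simplified model)."""
    if result is None:
        # Case 4: returning nothing means skipping without a reason.
        return []
    if isinstance(result, (StubRunRequest, StubSkipReason)):
        # Cases 1 and 3: a single RunRequest or SkipReason.
        return [result]
    # Cases 2 and 5: a list of RunRequests, or a generator yielding items.
    return list(result)


def yielding_sensor():
    # Case 5: a generator-style evaluation function.
    yield StubRunRequest(run_key="a")
    yield StubRunRequest(run_key="b")


print(len(normalize(yielding_sensor())))  # 2
print(normalize(None))                    # []
```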

Parameters:
  • asset_key (AssetKey) – The asset_key this sensor monitors.

  • name (Optional[str]) – The name of the sensor. Defaults to the name of the decorated function.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • job (Optional[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]) – The job to be executed when the sensor fires.

  • jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]]) – (experimental) A list of jobs to be executed when the sensor fires.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

Example

from dagster import AssetKey, EventLogEntry, RunRequest, SensorEvaluationContext, asset_sensor

# my_job is assumed to be defined elsewhere and to contain an op named
# "read_materialization" that accepts an "asset_key" config field.
@asset_sensor(asset_key=AssetKey("my_table"), job=my_job)
def my_asset_sensor(context: SensorEvaluationContext, asset_event: EventLogEntry):
    return RunRequest(
        run_key=context.cursor,
        run_config={
            "ops": {
                "read_materialization": {
                    "config": {
                        "asset_key": asset_event.dagster_event.asset_key.path,
                    }
                }
            }
        },
    )

class dagster.MultiAssetSensorDefinition(*args, **kwargs)[source]

Define an asset sensor that initiates a set of runs based on the materialization of a list of assets.

Users should not instantiate this object directly. To construct a MultiAssetSensorDefinition, use dagster.multi_asset_sensor().

Parameters:
  • name (str) – The name of the sensor to create.

  • asset_keys (Sequence[AssetKey]) – The asset_keys this sensor monitors.

  • asset_materialization_fn (Callable[[MultiAssetSensorEvaluationContext], Union[Iterator[Union[RunRequest, SkipReason]], RunRequest, SkipReason]]) –

    The core evaluation function for the sensor, which is run at an interval to determine whether a run should be launched or not. Takes a MultiAssetSensorEvaluationContext.

    This function must return a generator, which must yield either a single SkipReason or one or more RunRequest objects.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • job (Optional[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]) – The job object to target with this sensor.

  • jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]]) – (experimental) A list of jobs to be executed when the sensor fires.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

@dagster.multi_asset_sensor(asset_keys=None, asset_selection=None, *, job_name=None, name=None, minimum_interval_seconds=None, description=None, job=None, jobs=None, default_status=DefaultSensorStatus.STOPPED)[source]

Creates an asset sensor that can monitor multiple assets.

The decorated function is used as the asset sensor’s evaluation function. The decorated function may:

  1. Return a RunRequest object.

  2. Return a list of RunRequest objects.

  3. Return a SkipReason object, providing a descriptive message of why no runs were requested.

  4. Return nothing (skipping without providing a reason).

  5. Yield a SkipReason or yield one or more RunRequest objects.

The decorated function takes a MultiAssetSensorEvaluationContext.

Parameters:
  • asset_keys (Optional[Sequence[AssetKey]]) – The asset keys this sensor monitors. If not provided, asset_selection argument must be provided. To monitor assets that aren’t defined in the repository that this sensor is part of, you must use asset_keys.

  • asset_selection (Optional[AssetSelection]) – The asset selection this sensor monitors. If not provided, asset_keys argument must be provided. If you use asset_selection, all assets that are part of the selection must be in the repository that this sensor is part of.

  • name (Optional[str]) – The name of the sensor. Defaults to the name of the decorated function.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • job (Optional[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]) – The job to be executed when the sensor fires.

  • jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition, UnresolvedAssetJobDefinition]]]) – (experimental) A list of jobs to be executed when the sensor fires.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

dagster.build_asset_reconciliation_sensor(asset_selection, name, wait_for_all_upstream=False, wait_for_in_progress_runs=True, minimum_interval_seconds=None, description=None, default_status=DefaultSensorStatus.STOPPED)[source]

Constructs a sensor that will monitor the parents of the provided assets and materialize an asset based on the materialization of its parents. This will keep the monitored assets up to date with the latest data available to them. By default (wait_for_all_upstream=False), the sensor materializes an asset when any of its parents have materialized; set wait_for_all_upstream=True to materialize an asset only when all of its parents have materialized.

Note: Currently, this sensor only works for non-partitioned assets.

Parameters:
  • asset_selection (AssetSelection) – The group of assets you want to keep up to date.

  • name (str) – The name to give the sensor.

  • wait_for_all_upstream (bool) – If True, the sensor will only materialize an asset when all of its parents have materialized. If False, the sensor will materialize an asset when any of its parents have materialized. Defaults to False.

  • wait_for_in_progress_runs (bool) – If True, the sensor will not materialize an asset if there is an in-progress run that will materialize any of the asset’s parents. Defaults to True.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A description for the sensor.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

Returns:

A MultiAssetSensorDefinition that will monitor the parents of the provided assets to determine when the provided assets should be materialized

Example

If you have the following asset graph:

a       b       c
 \     / \     /
    d       e
     \     /
        f

and create the sensor:

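from dagster import AssetSelection, build_asset_reconciliation_sensor

# d, e, and f are the asset definitions from the graph above.
build_asset_reconciliation_sensor(
    AssetSelection.assets(d, e, f),
    name="my_reconciliation_sensor",
    wait_for_all_upstream=True,
    wait_for_in_progress_runs=True
)

You will observe the following behavior: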
  • If a, b, and c are all materialized, then on the next sensor tick, the sensor will see that d and e can be materialized. Since d and e will be materialized, f can also be materialized. The sensor will kick off a run that will materialize d, e, and f.

  • If, on the next sensor tick, a, b, and c have not been materialized again, the sensor will not launch a run.

  • If, before the next sensor tick, only assets a and b have been materialized, the sensor will launch a run to materialize d.

  • If asset c is materialized by the next sensor tick, the sensor will see that e can be materialized (since b and c have both been materialized since the last materialization of e). The sensor will also see that f can be materialized, since d was updated in the previous sensor tick and e will be materialized by the sensor. The sensor will launch a run to materialize e and f.

  • If, by the next sensor tick, only asset b has been materialized, the sensor will not launch a run, since d and e each have a parent that has not been updated.

  • If during the next sensor tick, there is a materialization of a in progress, the sensor will not launch a run to materialize d. Once a has completed materialization, the next sensor tick will launch a run to materialize d.
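The scenarios above follow one rule: an asset in the selection is eligible when its parents (all of them, or any of them, depending on wait_for_all_upstream) have been materialized more recently than the asset itself, or will be materialized in the same run. The sketch below models that rule with logical timestamps so the scenarios can be replayed. It is a simplified illustration, not Dagster's implementation: assets_to_materialize is a hypothetical helper, it ignores wait_for_in_progress_runs and partitions, and it treats a parent with no recorded materialization (such as an external data source) as never updated.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def assets_to_materialize(
    parents: Dict[str, List[str]],
    last_materialized: Dict[str, int],
    selection: Set[str],
    wait_for_all_upstream: bool,
) -> Set[str]:
    """Simplified model of one reconciliation tick (ignores in-progress runs).

    parents: asset -> its parent assets.
    last_materialized: asset -> logical time of its latest materialization;
        assets that have never been materialized are simply absent.
    """
    to_run: Set[str] = set()
    # Visit parents before children so a child can see that a parent is
    # about to be materialized in the same run.
    for asset in TopologicalSorter(parents).static_order():
        asset_parents = parents.get(asset, [])
        if asset not in selection or not asset_parents:
            continue
        own_time = last_materialized.get(asset, -1)
        # A parent counts as updated if it is newer than this asset,
        # or if it is already scheduled in this run.
        updated = [
            p in to_run or last_materialized.get(p, -1) > own_time
            for p in asset_parents
        ]
        ready = all(updated) if wait_for_all_upstream else any(updated)
        if ready:
            to_run.add(asset)
    return to_run


graph = {"d": ["a", "b"], "e": ["b", "c"], "f": ["d", "e"]}
selection = {"d", "e", "f"}

# a, b, c materialized at time 1; d, e, f never materialized.
print(sorted(assets_to_materialize(graph, {"a": 1, "b": 1, "c": 1}, selection, True)))
# ['d', 'e', 'f']

# d, e, f last materialized at time 2; then only a and b at time 3.
times = {"a": 3, "b": 3, "c": 1, "d": 2, "e": 2, "f": 2}
print(sorted(assets_to_materialize(graph, times, selection, True)))
# ['d']
```

Running the same helper on a graph where y has parents x and an external source that never appears in last_materialized reproduces the source-asset caveat: with wait_for_all_upstream=True, y is never selected; with False, y is selected once x is materialized.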

Other considerations:

If an asset has a SourceAsset as a parent, and that source asset points to an external data source (i.e., the source asset does not point to an asset in another repository), the sensor will not know when to consider the source asset “materialized”. If you have the asset graph:

x   external_data_source
 \       /
     y

and create the sensor:

build_asset_reconciliation_sensor(
    AssetSelection.assets(y),
    name="my_reconciliation_sensor",
    wait_for_all_upstream=True,
    wait_for_in_progress_runs=True
)

y will never be updated because external_data_source is never considered “materialized”. In this case, you should create the sensor:

build_asset_reconciliation_sensor(
    AssetSelection.assets(y),
    name="my_reconciliation_sensor",
    wait_for_all_upstream=False,
    wait_for_in_progress_runs=True
)

which will cause y to be materialized when x is materialized.

class dagster.RunStatusSensorDefinition(name, run_status, run_status_sensor_fn, minimum_interval_seconds=None, description=None, monitored_jobs=None, monitor_all_repositories=False, default_status=DefaultSensorStatus.STOPPED, request_job=None, request_jobs=None)[source]

Define a sensor that reacts to a given status of pipeline execution, where run_status_sensor_fn will be evaluated when a run is at the given status.

Parameters:
  • name (str) – The name of the sensor.

  • run_status (DagsterRunStatus) – The status of a run which will be monitored by the sensor.

  • run_status_sensor_fn (Callable[[RunStatusSensorContext], Union[SkipReason, PipelineRunReaction]]) – The core evaluation function for the sensor. Takes a RunStatusSensorContext.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • monitored_jobs (Optional[List[Union[JobDefinition, GraphDefinition, UnresolvedAssetJobDefinition, JobSelector, RepositorySelector]]]) – The jobs in the current repository that will be monitored by this sensor. Defaults to None, which means the sensor will be evaluated when any job in the repository matches the monitored run_status.

  • monitor_all_repositories (bool) – If set to True, the sensor will monitor all runs in the Dagster instance. If set to True, an error will be raised if you also specify monitored_jobs or job_selection. Defaults to False.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

  • request_job (Optional[Union[GraphDefinition, JobDefinition]]) – The job a RunRequest should execute if yielded from the sensor.

  • request_jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition]]]) – (experimental) A list of jobs to be executed if RunRequests are yielded from the sensor.

class dagster.RunStatusSensorContext(sensor_name, dagster_run, dagster_event, instance)[source]

The context object available to a decorated function of run_status_sensor.

sensor_name

the name of the sensor.

Type:

str

dagster_run

the run of the job or pipeline.

Type:

DagsterRun

dagster_event

the event associated with the job or pipeline run status.

Type:

DagsterEvent

instance

the current instance.

Type:

DagsterInstance

class dagster.RunFailureSensorContext(sensor_name, dagster_run, dagster_event, instance)[source]

The context object available to a decorated function of run_failure_sensor.

sensor_name

the name of the sensor.

Type:

str

dagster_run

the failed pipeline run.

Type:

DagsterRun

failure_event

the pipeline failure event.

Type:

DagsterEvent

class dagster.JobSelector(location_name, repository_name, job_name)[source]

class dagster.RepositorySelector(location_name, repository_name)[source]

dagster.build_run_status_sensor_context(sensor_name, dagster_event, dagster_instance, dagster_run)[source]

Builds run status sensor context from provided parameters.

This function can be used to provide the context argument when directly invoking a function decorated with @run_status_sensor or @run_failure_sensor, such as when writing unit tests.

Parameters:
  • sensor_name (str) – The name of the sensor the context is being constructed for.

  • dagster_event (DagsterEvent) – A DagsterEvent with the same event type as the one that triggers the run_status_sensor.

  • dagster_instance (DagsterInstance) – The dagster instance configured for the context.

  • dagster_run (DagsterRun) – DagsterRun object from running a job.

Examples

from dagster import DagsterInstance, build_run_status_sensor_context

# my_job and run_status_sensor_to_invoke are assumed to be defined elsewhere.
instance = DagsterInstance.ephemeral()
result = my_job.execute_in_process(instance=instance)

dagster_run = result.dagster_run
dagster_event = result.get_job_success_event()  # or get_job_failure_event()

context = build_run_status_sensor_context(
    sensor_name="run_status_sensor_to_invoke",
    dagster_instance=instance,
    dagster_run=dagster_run,
    dagster_event=dagster_event,
)
run_status_sensor_to_invoke(context)

@dagster.run_status_sensor(run_status, name=None, minimum_interval_seconds=None, description=None, monitored_jobs=None, job_selection=None, monitor_all_repositories=False, default_status=DefaultSensorStatus.STOPPED, request_job=None, request_jobs=None)[source]

Creates a sensor that reacts to a given status of pipeline execution, where the decorated function will be run when a pipeline is at the given status.

The decorated function takes a RunStatusSensorContext.

Parameters:
  • run_status (DagsterRunStatus) – The status of run execution which will be monitored by the sensor.

  • name (Optional[str]) – The name of the sensor. Defaults to the name of the decorated function.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • monitored_jobs (Optional[List[Union[PipelineDefinition, GraphDefinition, UnresolvedAssetJobDefinition, RepositorySelector, JobSelector]]]) – Jobs in the current repository that will be monitored by this sensor. Defaults to None, which means the alert will be sent when any job in the repository matches the requested run_status. Jobs in external repositories can be monitored by using RepositorySelector or JobSelector.

  • monitor_all_repositories (bool) – If set to True, the sensor will monitor all runs in the Dagster instance. If set to True, an error will be raised if you also specify monitored_jobs or job_selection. Defaults to False.

  • job_selection (Optional[List[Union[PipelineDefinition, GraphDefinition, RepositorySelector, JobSelector]]]) – (deprecated in favor of monitored_jobs) Jobs in the current repository that will be monitored by this sensor. Defaults to None, which means the alert will be sent when any job in the repository matches the requested run_status.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

  • request_job (Optional[Union[GraphDefinition, JobDefinition]]) – The job that should be executed if a RunRequest is yielded from the sensor.

  • request_jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition]]]) – (experimental) A list of jobs to be executed if RunRequests are yielded from the sensor.
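The interaction between monitored_jobs, the deprecated job_selection, and monitor_all_repositories boils down to three monitoring scopes plus one invalid combination. The sketch below expresses those documented rules as a small validation function. check_monitoring_args is hypothetical and not part of Dagster, job names are plain strings here, and the precedence it gives monitored_jobs when both it and job_selection are passed is an assumption, since the text above does not say.

```python
from typing import Optional, Sequence


def check_monitoring_args(
    monitored_jobs: Optional[Sequence[str]] = None,
    job_selection: Optional[Sequence[str]] = None,
    monitor_all_repositories: bool = False,
) -> str:
    """Hypothetical check mirroring the documented rules; returns the scope."""
    if monitor_all_repositories and (monitored_jobs or job_selection):
        # Documented: combining these raises an error.
        raise ValueError(
            "monitor_all_repositories cannot be combined with "
            "monitored_jobs or job_selection"
        )
    if monitor_all_repositories:
        return "all runs in the instance"
    # job_selection is the deprecated alias; preferring monitored_jobs is an assumption.
    jobs = monitored_jobs or job_selection
    if jobs:
        return f"runs of {len(jobs)} selected job(s)"
    # Documented default: every job in the current repository.
    return "all jobs in the current repository"


print(check_monitoring_args())                         # all jobs in the current repository
print(check_monitoring_args(monitored_jobs=["etl"]))   # runs of 1 selected job(s)
```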

@dagster.run_failure_sensor(name=None, minimum_interval_seconds=None, description=None, monitored_jobs=None, job_selection=None, monitor_all_repositories=False, default_status=DefaultSensorStatus.STOPPED, request_job=None, request_jobs=None)[source]

Creates a sensor that reacts to job failure events, where the decorated function will be run when a run fails.

The decorated function takes a RunFailureSensorContext.

Parameters:
  • name (Optional[str]) – The name of the job failure sensor. Defaults to the name of the decorated function.

  • minimum_interval_seconds (Optional[int]) – The minimum number of seconds that will elapse between sensor evaluations.

  • description (Optional[str]) – A human-readable description of the sensor.

  • monitored_jobs (Optional[List[Union[JobDefinition, GraphDefinition, UnresolvedAssetJobDefinition, RepositorySelector, JobSelector]]]) – The jobs in the current repository that will be monitored by this failure sensor. Defaults to None, which means the alert will be sent when any job in the current repository fails.

  • monitor_all_repositories (bool) – If set to True, the sensor will monitor all runs in the Dagster instance. If set to True, an error will be raised if you also specify monitored_jobs or job_selection. Defaults to False.

  • job_selection (Optional[List[Union[JobDefinition, GraphDefinition, RepositorySelector, JobSelector]]]) – (deprecated in favor of monitored_jobs) The jobs in the current repository that will be monitored by this failure sensor. Defaults to None, which means the alert will be sent when any job in the repository fails.

  • default_status (DefaultSensorStatus) – Whether the sensor starts as running or not. The default status can be overridden from Dagit or via the GraphQL API.

  • request_job (Optional[Union[GraphDefinition, JobDefinition]]) – The job a RunRequest should execute if yielded from the sensor.

  • request_jobs (Optional[Sequence[Union[GraphDefinition, JobDefinition]]]) – (experimental) A list of jobs to be executed if RunRequests are yielded from the sensor.