If you are running on AWS ECS or another container-based orchestration system, you'll likely want to package Dagit using a Docker image.
A minimal skeleton `Dockerfile` that will run Dagit is shown below:
```dockerfile
FROM python:3.7-slim

RUN mkdir -p /opt/dagster/dagster_home /opt/dagster/app

RUN pip install dagit dagster-postgres

# Copy your code and workspace to /opt/dagster/app
COPY repo.py workspace.yaml /opt/dagster/app/

ENV DAGSTER_HOME=/opt/dagster/dagster_home/

# Copy dagster instance YAML to $DAGSTER_HOME
COPY dagster.yaml /opt/dagster/dagster_home/

WORKDIR /opt/dagster/app

EXPOSE 3000

ENTRYPOINT ["dagit", "-h", "0.0.0.0", "-p", "3000"]
```
You'll also need to include a `workspace.yaml` file in the same directory as the Dockerfile to configure your workspace:
```yaml
load_from:
  # References the file copied into your Dockerfile
  - python_file: repo.py
```
as well as a `dagster.yaml` file to configure your Dagster instance:
```yaml
storage:
  postgres:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: "mycorp-dagster-compute-logs"
    prefix: "dagster-test-"

local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: "/opt/dagster/local/"
```
If you're using environment variables to configure the instance, make sure these environment variables are exposed in the running Dagit container.
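For example, if you run Dagit with Docker Compose, you can pass these values through the environment of the Dagit service. This is a minimal sketch: the service name, image name, hostnames, and credential values are placeholders, and the variable names are the ones referenced in the `dagster.yaml` above.

```yaml
services:
  dagit:
    image: mycorp/dagster-dagit   # assumed name of an image built from the Dockerfile above
    ports:
      - "3000:3000"
    environment:
      # Must match the env vars referenced in dagster.yaml
      DAGSTER_PG_USERNAME: dagster
      DAGSTER_PG_PASSWORD: dagster_password
      DAGSTER_PG_HOST: postgres      # assumed hostname of your Postgres service
      DAGSTER_PG_DB: dagster_db
```

In practice you may prefer an `env_file` or your orchestrator's secrets mechanism over hardcoding credentials in the compose file.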
Dagit servers expose a health check endpoint at `/dagit_info`, which returns a JSON response like:
{ "dagit_version": "0.12.0", "dagster_graphql_version": "0.12.0", "dagster_version": "0.12.0" }
More advanced Dagster deployments will require deploying more than one container. For example, if you are using `dagster-daemon` to run schedules and sensors or manage a queue of runs, you'll likely want a separate container running the `dagster-daemon` service. This service must have access to your `dagster.yaml` and `workspace.yaml` files, just like the Dagit container. You can also configure your workspace so that your code can be updated and deployed separately in its own container running a gRPC server, without needing to redeploy the other Dagster services. To enable this setup, include a container that exposes a gRPC server on a port, and add that port to your `workspace.yaml` file.
For example, your user code container might have the following Dockerfile:
```dockerfile
FROM python:3.7-slim

# Checkout and install dagster libraries needed to run the gRPC server
# exposing your repository to dagit and dagster-daemon, and to load
# the DagsterInstance
RUN pip install \
    dagster \
    dagster-postgres \
    dagster-docker

# Set $DAGSTER_HOME and copy dagster instance there
ENV DAGSTER_HOME=/opt/dagster/dagster_home

RUN mkdir -p $DAGSTER_HOME

COPY dagster.yaml $DAGSTER_HOME

# Add repository code
WORKDIR /opt/dagster/app

COPY repo.py /opt/dagster/app

# Run dagster gRPC server on port 4000
EXPOSE 4000

# Using CMD rather than ENTRYPOINT allows the command to be overridden in
# run launchers or executors to run other commands using this image
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-f", "repo.py"]
```
and your workspace might look like:
```yaml
load_from:
  # Each entry here corresponds to a container that exposes a gRPC server.
  - grpc_server:
      host: docker_example_user_code
      port: 4000
      location_name: "example_user_code"
```
When you update your code, you can rebuild and restart your user code container without needing to redeploy other parts of the system. Dagit will automatically notice that a new server has been redeployed and prompt you to refresh your workspace.
When you add or remove a user code container, you can add or remove the corresponding entry in your `workspace.yaml` file. If this file is mounted on the Dagit and `dagster-daemon` containers as a volume, you can pick up the changes in Dagit by reloading the workspace from the Workspace tab. The `dagster-daemon` container will automatically pick up the changes by periodically reloading the workspace from the `workspace.yaml` file.
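For example, a Docker Compose sketch that mounts a shared `workspace.yaml` into both containers might look like this. The service names are placeholders, and the target path assumes both containers use `/opt/dagster/app` as their working directory, as in the Dagit Dockerfile above:

```yaml
services:
  docker_example_dagit:
    # ...
    volumes:
      # Mount the workspace from the host so edits are visible without a rebuild
      - ./workspace.yaml:/opt/dagster/app/workspace.yaml
  docker_example_daemon:
    # ...
    volumes:
      - ./workspace.yaml:/opt/dagster/app/workspace.yaml
```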
To launch each run in its own container, you can add the `DockerRunLauncher` to your `dagster.yaml` file:
```yaml
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - DAGSTER_POSTGRES_USER
      - DAGSTER_POSTGRES_PASSWORD
      - DAGSTER_POSTGRES_DB
```
This launcher will start each run in a new container, using whatever image you set in the `DAGSTER_CURRENT_IMAGE` environment variable in your user code container (usually the same image as the user code container itself).
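For example, in Docker Compose you might set the variable on the user code service to the name of its own image. The service and image names below are placeholders:

```yaml
services:
  docker_example_user_code:
    image: docker_example_user_code_image   # assumed image name
    environment:
      # Tells DockerRunLauncher which image to use for runs from this code location
      DAGSTER_CURRENT_IMAGE: docker_example_user_code_image
```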
Any container that launches runs (usually the `dagster-daemon` container if you are maintaining a run queue or launching runs from schedules or sensors) must have permission to create Docker containers in order to use this run launcher. Mounting `/var/run/docker.sock` as a volume is one way to grant it that permission.
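For example, in Docker Compose you could mount the host's Docker socket into the `dagster-daemon` service (the service name is a placeholder). Note that access to the Docker socket effectively grants full control over the host's Docker daemon:

```yaml
services:
  docker_example_daemon:
    # ...
    volumes:
      # Lets the DockerRunLauncher running in this container start run containers
      - /var/run/docker.sock:/var/run/docker.sock
```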
You can mount your code as a volume in your user code container so that you don't have to rebuild the image whenever your code changes. Even with a volume mount, you still need to restart the container whenever your code changes so that the gRPC server reloads your code.
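A sketch of such a mount in Docker Compose, using the paths from the user code Dockerfile above (the local path and service name are placeholders):

```yaml
services:
  docker_example_user_code:
    # ...
    volumes:
      # Overrides the repo.py baked into the image with the local copy
      - /absolute/path/to/local/repo.py:/opt/dagster/app/repo.py
```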
If you are mounting your code as a volume in your user code container and using the `DockerRunLauncher` to launch each run in a new container, you must specify your volume mounts in the `DockerRunLauncher` config as well. For example:
```yaml
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - DAGSTER_POSTGRES_USER
      - DAGSTER_POSTGRES_PASSWORD
      - DAGSTER_POSTGRES_DB
    container_kwargs:
      volumes:
        - /absolute/path/to/local/repo.py:/opt/dagster/app/repo.py
```
This example demonstrates a Dagster deployment using `docker-compose` that includes a Dagit container for loading and launching jobs, a `dagster-daemon` container for managing a run queue and submitting runs from schedules and sensors, a Postgres container for persistent storage, and a container with user code. The Dagster instance uses the `DockerRunLauncher` to launch each run in its own container.
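A trimmed-down sketch of such a `docker-compose.yml` is shown below. The service names, image names, build paths, and credentials are placeholders, networking and other details from the full example are omitted, and the environment variable names should match whatever your `dagster.yaml` and run launcher config reference:

```yaml
version: "3.7"

services:
  # Postgres container for persistent run, event log, and schedule storage
  docker_example_postgresql:
    image: postgres:11
    environment:
      POSTGRES_USER: postgres_user
      POSTGRES_PASSWORD: postgres_password
      POSTGRES_DB: postgres_db

  # User code container running a gRPC server on port 4000
  docker_example_user_code:
    build: ./user_code            # assumed directory containing the user code Dockerfile
    image: docker_example_user_code_image
    environment:
      DAGSTER_POSTGRES_USER: postgres_user
      DAGSTER_POSTGRES_PASSWORD: postgres_password
      DAGSTER_POSTGRES_DB: postgres_db
      # Image used by DockerRunLauncher for runs from this code location
      DAGSTER_CURRENT_IMAGE: docker_example_user_code_image

  # Dagit container for loading and launching jobs
  docker_example_dagit:
    build: ./dagit                # assumed directory containing the Dagit Dockerfile
    ports:
      - "3000:3000"
    environment:
      DAGSTER_POSTGRES_USER: postgres_user
      DAGSTER_POSTGRES_PASSWORD: postgres_password
      DAGSTER_POSTGRES_DB: postgres_db
    volumes:
      # Needed so DockerRunLauncher can start run containers
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - docker_example_postgresql
      - docker_example_user_code

  # dagster-daemon container for the run queue, schedules, and sensors
  docker_example_daemon:
    build: ./daemon               # assumed directory containing the daemon Dockerfile
    environment:
      DAGSTER_POSTGRES_USER: postgres_user
      DAGSTER_POSTGRES_PASSWORD: postgres_password
      DAGSTER_POSTGRES_DB: postgres_db
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - docker_example_postgresql
      - docker_example_user_code
```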
To start the deployment, run `docker-compose up`.