Simplifying the MLOps stack
TL;DR: We’re introducing dstack — an open-source command-line utility to provision and orchestrate ML infrastructure
Running ML workflows involves a lot of hurdles. If you've ever used remote machines to run ML workflows yourself, you know how it goes: you connect to a machine through SSH, install the CUDA driver, fetch your code, copy the data, run the script, watch the process, and so on. Finally, if the machine is a cloud instance, you stop it. If your environment setup is complex, you may also need to build and use a Docker image.
If you don't want to manage all of this manually (and don't want to build your own internal platform to automate it), you may start looking at so-called MLOps platforms that aim to automate the entire process end-to-end. But because they cover it end-to-end, their abstractions eventually become leaky and turn into a source of complexity and inflexibility.
For example, if you write your code in Python scripts, use Git to version it, and prefer working within your favorite IDE and terminal, MLOps platforms may stand in the way more than they help. On top of that, you have to rewrite your code (e.g. to use the platform's API) and learn, set up, and administer such a platform.
Imagine if you could run your ML workflows the very same way as you do it locally, but they would actually run in the cloud. And you wouldn’t need to worry about provisioning infrastructure, setting up the environment, etc.
We are excited to introduce dstack, an open-source tool that allows exactly that. dstack is a lightweight utility that allows you to define your ML workflows declaratively, and run them in a configured cloud via the CLI.
Define workflows
Say you have a Python script that works locally (e.g. mnist/train.py). To make it independent of where it runs, you may want to configure how to run it: its commands, dependencies, output artifacts, compute resources, etc.
workflows:
  - name: train
    provider: bash
    deps:
      - tag: mnist_data
    commands:
      - pip install -r requirements.txt
      - python src/train.py
    artifacts:
      - path: checkpoint
    resources:
      interruptible: true
      gpu: 1
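For context, here is a minimal sketch of what such a training script might look like. It assumes PyTorch Lightning (which the run output below suggests), a placeholder model, MNIST data available under data/, and checkpoints written to ./checkpoint, the artifact path declared above; none of these details come from the actual example repository.

# train.py (hypothetical) -- the model, data path, and hyperparameters
# are illustrative placeholders, not the exact code behind the run below.
import torch
from torch import nn
from torch.utils.data import DataLoader
import pytorch_lightning as pl
from torchvision import datasets, transforms

class LitMNIST(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                                   nn.ReLU(), nn.Linear(128, 10))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.model(x), y)
        self.log("loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

if __name__ == "__main__":
    # The mnist_data dependency provides the dataset locally; checkpoints
    # are written under ./checkpoint, the declared artifact path.
    train_ds = datasets.MNIST("data", train=True, download=False,
                              transform=transforms.ToTensor())
    trainer = pl.Trainer(max_epochs=5, default_root_dir="checkpoint")
    trainer.fit(LitMNIST(), DataLoader(train_ds, batch_size=32))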
Run workflows
Now dstack can use this information to run the workflow in your configured cloud account, provisioning the compute resources you specified.
Running the workflow takes just one command:
dstack run train
When you run this command, dstack uses the very same version of the code that you have locally, and you see the output in real time as your workflow runs.
Provisioning... It may take up to a minute. ✓
To interrupt, press Ctrl+C.
Epoch 4: 100%|██████████████| 1876/1876 [00:17<00:00, 107.85it/s, loss=0.0944, v_num=0, val_loss=0.108, val_acc=0.968]
`Trainer.fit` stopped: `max_epochs=5` reached.
Testing DataLoader 0: 100%|██████████████| 313/313 [00:00<00:00, 589.34it/s]
Test metric   DataLoader 0
val_acc       0.965399980545044
val_loss      0.10975822806358337
You don’t need to worry about setting up the environment, downloading dependencies, saving artifacts, etc. dstack does it all automatically for you.
The dstack CLI has two main commands: dstack run (runs a given workflow) and dstack ps (lists currently running or recently finished workflows).
dstack ps -a

RUN               TARGET    STATUS   ARTIFACTS   SUBMITTED     TAG
angry-elephant-1  download  Done     data        8 hours ago   mnist_data
wet-insect-1      train     Running  checkpoint  1 week ago
Other commands let you stop and restart workflows, browse their output artifacts, add tags, etc.
Host apps
ML workflows are not only about processing data and training. Sometimes they include monitoring dashboards, deploying applications, debugging, etc. That's why dstack allows ML workflows to expose ports.
workflows:
  - name: hello-fastapi
    provider: bash
    ports: 1
    commands:
      - pip install fastapi uvicorn
      - uvicorn hello_fastapi:app --host 0.0.0.0 --port $PORT_0
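For completeness, here is a minimal hello_fastapi.py that the uvicorn command above could serve; the endpoint itself is purely illustrative and not part of the original example.

# hello_fastapi.py (hypothetical) -- a minimal app matching the
# `uvicorn hello_fastapi:app` command above; the endpoint is illustrative only.
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    # Return a simple JSON payload to confirm the app is reachable
    # on the port that dstack exposed via $PORT_0.
    return {"message": "Hello from a dstack workflow!"}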
Dev environments
dstack comes with built-in providers that allow launching VS Code, JupyterLab, and Notebooks as workflows.
dstack run code
Once you run such a command, the dev environment is ready within a minute; it contains a local copy of your code, pre-installed Conda and the CUDA driver, and everything else you've configured for the workflow, including the required compute resources.
Provisioning... It may take up to a minute. ✓
To interrupt, press Ctrl+C.
Installing extensions...
Installing extension 'ms-python.python'...
Extension 'ms-python.python' v2022.14.0 was successfully installed.
Server bound to 0.0.0.0:3000 (IPv4)
Extension host agent listening on 3000
Web UI available at http://ec2-34-244-253-2.eu-west-1.compute.amazonaws.com:3000/?folder=%2Fworkflow&tkn=f1d59e3a410249eda0b5f65ef20bc26b
Why use dstack
Let's summarize what dstack is and why you'd use it.
dstack provides a command-line utility to run any ML code in the cloud. You don’t need to worry about provisioning compute resources, setting up the environment, or managing data.
Anything that runs locally can run via dstack. You just need to make sure that your cloud account credentials are configured locally and create a YAML file that describes your workflows and their dependencies.
dstack is a super lightweight alternative to MLOps platforms.
dstack is easy to use with Python scripts, Git, and IDEs.
How to get started
The entire process of installing dstack takes at most 5 minutes.
The tool can be installed with pip:
pip install dstack
Then you have to run the dstack config command:
dstack config
It will prompt you to enter the AWS region where you want dstack to provision infrastructure and the S3 bucket where you want to keep your data.
Region name (eu-west-1):
S3 bucket name (dstack-142421590066-eu-west-1):
That’s it. Now you can define workflows and run them in a configured cloud via the CLI.
dstack is free and fully open-source under the Mozilla Public License 2.0. The source code is available on GitHub.
You’re very welcome to give dstack a try right away.
Finally, help spread the word and let more people know about dstack by giving our repo a star.
Thanks to Wah Loon Keng for his helpful input on the article and Storyset for providing an illustration.