Getting Started with Docker Model Runner
This is the first out-of-sequence tutorial, dedicated to Docker Model Runner.
This edition focuses on understanding what Docker Model Runner is and how to set it up. By the end of this tutorial, you will be able to:
Enable Docker Model Runner
Pull LLMs from Docker Hub
Run LLMs locally using both the CLI and Docker Desktop UI
Tutorial level: Beginner
In the coming editions, we will focus on working with Docker Model Runner from Python using the OpenAI Python SDK, and review how to pull models from Hugging Face.
Let’s get started!
What Docker Model Runner Is
Docker Model Runner (DMR) is a new Docker Desktop feature that enables running Large Language Models (LLMs) natively on your machine. It follows the common Docker workflow:
Pull models from registries (e.g., Docker Hub)
Run models locally with GPU acceleration
Integrate models into the development workflows
Of course, the performance of an LLM depends on the model size and the resources available locally.
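To make this concrete, here is roughly what that workflow looks like with the docker model CLI (both commands are covered step by step later in this tutorial):
docker model pull ai/llama3.2
docker model run ai/llama3.2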
This feature is currently in Beta and requires Docker Engine (Linux), or Docker Desktop 4.40 and above for macOS and 4.41 and above for Windows. For hardware requirements, please check the Docker Model Runner documentation.
The goal of this feature is to enable developers to test and run AI models locally seamlessly using familiar Docker CLI commands and tools.
Key Features:
Run LLMs locally, with full privacy control
Supports a curated catalog of open-source LLMs on Docker Hub, as well as models from Hugging Face
Native GPU acceleration, fully utilizing and optimizing local resources
Fully integrates with the Docker workflow and tools
Following the previous point, you can package and share models using the Docker registry framework, such as Docker Hub
Manage local models and display logs
The Docker Model Runner runs AI models using a host-installed inference server. It uses the llama.cpp framework, which runs LLM inference in C/C++ and talks directly to the hardware, with GPU acceleration on Apple Silicon (macOS) and NVIDIA GPUs (Windows). This enables better optimization of local resources and faster deployment than running LLMs inside a container.
Models
Working with LLMs with the Model Runner is fairly similar to working with containers. The core supported LLMs are available on Docker Hub, including models such as Meta’s Llama 3, Microsoft Phi 4, Google Gemma 3, and Mistral Instruct.
In addition to the core supported models on Docker Hub, DMR supports downloading models directly from Hugging Face via the CLI. We will review this functionality in detail in a future issue.
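As a quick preview, pulling a model from Hugging Face uses the hf.co/ prefix in front of the repository name; the repository below is only an illustrative example:
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF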
Getting Started with Docker Model Runner
To run LLMs locally with the DMR, you will need the following:
Docker Engine (Linux), or Docker Desktop version 4.40.0 (macOS) or version 4.41.0 (Windows) and above
Docker Hub account (required to access LLMs)
Optional - Hugging Face account
There are two methods to interact with DMR:
Docker Desktop
CLI
Both methods are straightforward to use, and they include the following steps:
Enabling the feature (required as long as it is under Beta)
Pulling a model
Launching the model
Let’s review both methods.
Running LLMs with DMR Using Docker Desktop
Let’s start by enabling the feature. On the Docker Desktop settings menu, go to the Beta features option (marked in purple) and select the Enable Docker Model Runner option (marked in yellow):
The host-side TCP support option (marked in white) enables you to expose LLMs using TCP (Transmission Control Protocol) to a port. This is something we will review in the next tutorial.
Next, let’s use the Docker Desktop Dashboard to pull an LLM from Docker Hub using the Models menu (marked in purple). If this is your first time using this feature, or you don’t have any models locally, you should see the following view:
To get a list of available models on Docker Hub, click either the Docker Hub tab or the Add models button (both marked in yellow). This will lead you to the following screen, providing the available LLMs to pull:
If you click the model box, you can get the model description, which includes versions, number of parameters, context window, size, format, etc. For example, let’s review the properties of the Llama 3.2 model and available versions:
We will download the model’s latest version using the Pull llama3.2 button (marked in purple). You can select a different version of the model using the drop-down selector to the right (marked in yellow).
The model should now be available under the Local tab:
Although the model is not stored as a Docker image, it operates like one. This means that, similar to a Docker image, you need to launch (or run) the model first in order to interact with it.
You can delete a model using the trash icon under the Actions column.
Let’s click the Play button under the Actions column to launch the model:
The model is now running in chat mode, and you can send your questions via the chat box.
Last but not least, you can use the Logs tab to review the model logs:
Running LLMs with DMR via the CLI
Let’s now repeat the same steps, this time using the CLI. Let’s start by enabling the DMR feature:
docker desktop enable model-runner
If you want to expose the model over TCP, you can add the --tcp argument to set the port.
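For example, assuming the default port of 12434 used in the Docker documentation:
docker desktop enable model-runner --tcp 12434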
The interaction with the DMR on the CLI is done with the docker model command. You can see its arguments using the --help flag. To validate that the feature is available and enabled, use the docker model status command:
docker model status
You should expect the following output:
Docker Model Runner is running
Next, let’s review what models are available locally using the list command:
docker model list
This returns the following output:
MODEL NAME           PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED       SIZE
ai/llama3.2:latest   3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  3 months ago  1.87 GiB
Notice that the model we pulled earlier using the Docker Desktop UI is listed in the output.
Like regular images, we can use the pull command to pull models from Docker Hub. For example, let’s pull the latest version of the gemma3n model:
docker model pull ai/gemma3n:latest
We can confirm that the model is available using the list command again. This will return the following output:
MODEL NAME           PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED       SIZE
ai/llama3.2:latest   3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  3 months ago  1.87 GiB
ai/gemma3n:latest    6.87 B      IQ2_XXS/Q4_K_M  gemma3n       800c2ac86449  3 weeks ago   3.94 GiB
The inspect command provides the model metadata:
docker model inspect ai/gemma3n
This returns the metadata in JSON format:
{
"id": "sha256:800c2ac86449991e1f69edb098e27ff51a73a7726fa39ddd9e082f7ebe64cd07",
"tags": [
"ai/gemma3n:latest"
],
"created": 1751012329,
"config": {
"format": "gguf",
"quantization": "IQ2_XXS/Q4_K_M",
"parameters": "6.87 B",
"architecture": "gemma3n",
"size": "3.94 GiB",
.
.
.
"tokenizer.ggml.add_bos_token": "true",
"tokenizer.ggml.add_eos_token": "false",
"tokenizer.ggml.add_sep_token": "false",
"tokenizer.ggml.add_space_prefix": "false",
"tokenizer.ggml.bos_token_id": "2",
"tokenizer.ggml.eos_token_id": "1",
"tokenizer.ggml.model": "llama",
"tokenizer.ggml.padding_token_id": "0",
"tokenizer.ggml.pre": "default",
"tokenizer.ggml.unknown_token_id": "3"
}
}
}
Let’s now launch the gemma3n model using the run command. This will launch the LLM in chat mode:
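docker model run ai/gemma3n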
Last but not least, you can access the DMR logs using the logs command:
docker model logs
Summary
The Docker Model Runner is a new feature of Docker Desktop that enables running LLMs locally using a host-installed inference server. It makes efficient use of local resources and hardware and integrates seamlessly with the containerized workflow, enabling developers to bring LLMs into the development life cycle with full privacy control.
In this tutorial, we covered how to set it up and how to pull and launch a model using both Docker Desktop and the CLI. DMR is fully compatible with the OpenAI API SDKs, so you can use it with minimal changes to your code. In the next tutorial, we will focus on leveraging the OpenAI Python SDK to interact with LLMs from Python.
Resources
Docker Desktop documentation - https://docs.docker.com/desktop/
Docker Model Runner Documentation - https://docs.docker.com/ai/model-runner/
Available LLMs on Docker Hub - https://hub.docker.com/u/ai