Docker Model Runner - Pull LLMs from Hugging Face
This is the third tutorial in the Docker Model Runner sequence
In previous issues, we demonstrated how to pull LLMs from Docker Hub and run them locally using Docker Model Runner (DMR). This issue focuses on running LLMs from Hugging Face with DMR.
By the end of this tutorial, you will be able to: identify models that support DMR on Hugging Face, determine the model name and tag, and pull and run the model locally with DMR.
Tutorial level: Beginner
So far in the sequence:
✅ Getting Started with Docker Model Runner
✅ Running LLMs Locally with Docker Model Runner and Python
📌 Docker Model Runner - Pull LLMs from Hugging Face
Let’s get started!
As we mentioned in previous issues, Docker Model Runner (DMR) makes it easy to pull, run, and serve LLMs and other AI models directly from Docker Hub. It also works with any Open Container Initiative (OCI)–compatible registry and supports the GGUF file format for packaging models as OCI Artifacts. In practice, this means you can pull models not only from Docker Hub but also from the largest AI model registry — Hugging Face.
LLMs are stored and handled much like container images:
They live in a container registry
You pull them with a dedicated command
They follow a similar naming convention:
REGISTRY_NAME/LLM_NAME:TAG
For example, the Llama 3.2 model on Docker Hub would be labeled as:
ai/llama3.2:latest
Here, ai is the registry namespace (Docker Hub in this case), llama3.2 is the model name, and latest is the tag.
By following this familiar container workflow, DMR lets you treat models just like images, making it easier to manage, share, and deploy them locally.
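To make the convention concrete, here is a small Python helper that splits a model reference into its three parts. This is purely illustrative (the parse_model_ref function is hypothetical, not part of DMR or any library):

def parse_model_ref(ref: str) -> dict:
    """Split an OCI-style model reference into registry namespace, name, and tag.

    Illustrative helper only -- not part of Docker Model Runner.
    """
    # The tag defaults to "latest" when omitted, as with container images
    name_part, _, tag = ref.partition(":")
    # The registry namespace is everything before the first "/"
    registry, _, name = name_part.partition("/")
    return {"registry": registry, "name": name, "tag": tag or "latest"}

print(parse_model_ref("ai/llama3.2:latest"))
# {'registry': 'ai', 'name': 'llama3.2', 'tag': 'latest'}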
In the following section, we will review how to find models with support for DMR and pull them to run locally.
Pulling Models from Hugging Face
Hugging Face is often called the “GitHub of AI models.” It hosts over 1.7 million models for a wide range of use cases, including text generation, image-to-text, text-to-image, text-to-speech, and more. The site includes powerful search and filtering tools that let you find models by functionality, number of parameters, supported platforms, and other criteria, making it easy to locate models that work with DMR.
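You can also search Hugging Face programmatically with the huggingface_hub library. The sketch below lists popular text-generation models packaged as GGUF files; it assumes that filtering on the gguf and text-generation tags is a reasonable proxy for DMR compatibility, since DMR consumes the GGUF format:

# pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()

# Filter by the "gguf" and "text-generation" tags, sorted by downloads.
# Assumption: GGUF-tagged models approximate the DMR-compatible set.
models = api.list_models(
    filter=["gguf", "text-generation"],
    sort="downloads",
    limit=10,
)

for model in models:
    print(model.id)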
Models with DMR Support
To find models with DMR support:
Go to the Hugging Face website.
Select the Models tab (marked in purple).
In the Apps section of the left-hand menu (marked in yellow), click Expand (marked in light blue).
From the list of supported apps, choose Docker Model Runner (marked in purple):
The models displayed on the right now all support DMR.
Other useful filters include:
Tasks — choose the type of model (e.g., Text Generation)
Parameters — set a range for the number of model parameters (marked in green)
For this example, we’ll select the Text Generation task (marked in purple) and limit the parameters to 1–6 billion.
Finally, for our demo, we'll pick Microsoft's Phi-3-mini-4k-instruct model (marked in yellow) and open its model page:
The Use this model drop-down button (marked in purple) lists the supported platforms; we will select the Docker Model Runner option (marked in yellow).
This will prompt the DMR command to run the model locally:
Note that it uses the run command and not the pull command. This works like the docker run command: it looks for the model locally and, if it is not available, pulls it from the registry. We will use the pull command first and then run the model:
docker model pull hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
If it works, you should see the following output:
Downloaded: 0.00 MB
Model pulled successfully
We will use the list command to show all the locally available models:
docker model list
This will return the following table, where you can see that the Microsoft model is listed among the other models available on my machine:
MODEL NAME                                     PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED        SIZE
ai/llama3.2:latest                             3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  4 months ago   1.87 GiB
ai/gemma3n:latest                              6.87 B      IQ2_XXS/Q4_K_M  gemma3n       800c2ac86449  5 weeks ago    3.94 GiB
hf.co/bartowski/llama-3.2-1b-instruct-gguf     1.24 B                      llama         7ca6390d8288  10 months ago  808M
hf.co/microsoft/bitnet-b1.58-2b-4t-gguf        2.41 B                      bitnet-b1.58  a4b724af32b7  3 months ago   1.19B
hf.co/microsoft/phi-3-mini-4k-instruct-gguf    3.82 B                      phi3          3e95604429e2  15 months ago  7.64B
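Since DMR exposes an OpenAI-compatible API (as covered in the previous issue), you can also list the locally available models from code. Here is a minimal sketch, assuming the container-internal base URL used later in this issue and that DMR serves the standard /models endpoint:

import openai

client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/v1",
    api_key="docker"  # placeholder; the OpenAI client requires a value
)

# /models is part of the OpenAI-compatible API surface
for model in client.models.list():
    print(model.id)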
Last but not least, we will launch the model using the docker model run command:
docker model run hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
We can now interact with the model from the CLI. For example, let's ask the following question:
What is the capital of the United States of America?
And here is the output:
Interactive chat mode started. Type '/bye' to exit.
> What is the capital of the United States of America?
The capital of the United States of America is Washington, D.C.
Running a Hugging Face Model with Python
Running LLMs from Hugging Face with DMR and Python follows exactly the same workflow shown in the previous issue. Here is how to run the above workflow with Python:
import openai

# Setting the model URL for running within a container
base_url = "http://model-runner.docker.internal/engines/v1"

# Setting the client
client = openai.OpenAI(
    base_url=base_url,
    api_key="docker"
)

# Calling the model
completion = client.chat.completions.create(
    model="hf.co/microsoft/phi-3-mini-4k-instruct-gguf",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is the capital of the United States of America?"}
    ],
)

# Parsing the output
print(completion.choices[0].message.content)
As expected, this will print the following text:
The capital of the United States of America is Washington, D.C.
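Because the endpoint is OpenAI-compatible, you can also stream the response token by token instead of waiting for the full completion. Here is a minimal sketch under the same base URL and model-name assumptions as above:

import openai

client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/v1",
    api_key="docker"
)

# stream=True yields incremental chunks instead of a single response
stream = client.chat.completions.create(
    model="hf.co/microsoft/phi-3-mini-4k-instruct-gguf",
    messages=[
        {"role": "user", "content": "What is the capital of the United States of America?"}
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries the next piece of generated text (may be empty)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)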
Summary
Docker Model Runner follows the Open Container Initiative (OCI) standards for model storage using the GGUF format. This gives users access to the largest AI model hub — Hugging Face. In this issue, we walked through how, in just a few simple steps, you can filter and identify an AI model on Hugging Face, then pull and run it locally with DMR.
Resources
Getting Started with Docker Model Runner - link
Running LLMs Locally with Docker Model Runner and Python - link
Hugging Face models - https://huggingface.co/models
Docker Desktop documentation - https://docs.docker.com/desktop/
Docker Model Runner Documentation - https://docs.docker.com/ai/model-runner/
Available LLMs on Docker Hub - https://hub.docker.com/u/ai