A Quick Guide to Containerizing Llamafile with Docker for AI Applications

This post was contributed by Sophia Parafina.

Keeping pace with the rapid advancements in artificial intelligence can be overwhelming. Every week, new Large Language Models (LLMs), vector databases, and innovative techniques emerge, potentially transforming the landscape of AI/ML development. Our extensive collaboration with developers has uncovered numerous creative and effective strategies to harness Docker in AI development. 

This quick guide shows how to use Docker to containerize llamafile, an executable that bundles everything needed to run an LLM chatbot into a single file. It walks you through containerizing llamafile and getting a functioning chatbot running for experimentation.

Llamafile’s concept of bringing together LLMs and local execution has sparked a high level of interest in the GenAI space, as it aims to simplify the process of getting a functioning LLM chatbot running locally. 


Containerize llamafile

Llamafile is a Mozilla project that runs open source LLMs, such as Llama-2-7B, Mistral 7B, or any other model in the GGUF format. The Dockerfile below builds llamafile from source and runs it in server mode. It uses debian:trixie as the base image for the build stage (which provides gcc 13) and debian:stable as the base image for the final image.

To get started, copy, paste, and save the following in a file named Dockerfile.

# Use debian trixie for gcc13
FROM debian:trixie AS builder

# Set work directory
WORKDIR /download

# Configure build container and build llamafile
RUN mkdir out && \
    apt-get update && \
    apt-get install -y curl git gcc make && \
    git clone https://github.com/Mozilla-Ocho/llamafile.git  && \
    curl -L -o ./unzip https://cosmo.zip/pub/cosmos/bin/unzip && \
    chmod 755 unzip && mv unzip /usr/local/bin && \
    cd llamafile && make -j8 LLAMA_DISABLE_LOGS=1 && \
    make install PREFIX=/download/out

# Create the final image
FROM debian:stable AS out

# Create a non-root user
RUN addgroup --gid 1000 user && \
    adduser --uid 1000 --gid 1000 --disabled-password --gecos "" user

# Switch to user
USER user

# Set working directory
WORKDIR /usr/local

# Copy llamafile and man pages
COPY --from=builder /download/out/bin ./bin
COPY --from=builder /download/out/share/man ./share/man

# Expose port 8080.
EXPOSE 8080

# Set entrypoint.
ENTRYPOINT ["/bin/sh", "/usr/local/bin/llamafile"]

# Set default command.
CMD ["--server", "--host", "0.0.0.0", "-m", "/model"]

To build the container, run:

docker build -t llamafile .

Running the llamafile container

To run the container, first download a model such as Mistral-7B-v0.1. The example below expects the model file in the local model directory and bind-mounts it into the container as /model.
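If you don't already have a model file locally, you can download one first. For example, the following fetches a quantized Mistral 7B build from Hugging Face (the repository URL and filename are assumptions; any model in GGUF format will work):

mkdir -p model
curl -L -o ./model/mistral-7b-v0.1.Q5_K_M.gguf \
  https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q5_K_M.gguf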

$ docker run -d -v ./model/mistral-7b-v0.1.Q5_K_M.gguf:/model -p 8080:8080 llamafile

Once the container is running, open your browser to http://localhost:8080 to see the llama.cpp interface (Figure 1).

Figure 1: Llama.cpp is a C/C++ port of Facebook’s LLaMA model by Georgi Gerganov, optimized for efficient LLM inference across various devices, including Apple silicon, with a straightforward setup and advanced performance tuning features.
You can also query the model through its OpenAI-compatible chat completions endpoint. For example:

$ curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming."
    }
  ]
}' | python3 -c '
import json
import sys
json.dump(json.load(sys.stdin), sys.stdout, indent=2)
print()
'

Llamafile has many parameters for tuning the model. You can see them with man llamafile or llamafile --help. Parameters can be set in the Dockerfile CMD directive.
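For example, to raise the context window without rebuilding the image, you can override the default command at run time. This is a sketch: -c is llama.cpp's context-size flag, and the value shown is illustrative.

$ docker run -d -v ./model/mistral-7b-v0.1.Q5_K_M.gguf:/model -p 8080:8080 \
    llamafile --server --host 0.0.0.0 -m /model -c 4096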

Now that you have a containerized llamafile, you can run the container with the LLM of your choice and begin your testing and development journey. 

What’s next?

To continue your AI development journey, read the Docker GenAI guide, review the additional AI content on the blog, and check out our resources.

Learn more

Why Are There More Than 100 Million Pull Requests for AI/ML Images on Docker Hub?

A quick look at pull requests of well-known AI/ML-related images on Docker Hub shows more than 100 million pull requests. What is driving this level of demand in the AI/ML space? The same things that drive developers to use Docker for any project: accelerating development, streamlining collaboration, and ensuring consistency within projects.

In this article, we’ll look more closely at how Docker provides a powerful tool for AI/ML development.

Graphic showing "AI/ML Images" text in web search box along with the words "hot topic" and a small fire emoji.

As we interact with more development teams who use Docker as part of their AI/ML efforts, we are learning about new and exciting use cases and hearing first-hand how using Docker has helped simplify the process of sharing AI/ML solutions with their teams and other AI/ML practitioners.

Why is Docker the deployment choice for millions of developers when working with AI/ML?

AI/ML development involves managing complex dependencies, libraries, and configurations, which can be challenging and time-consuming. Although these complexities are not limited to AI/ML development, with AI/ML, they can be more taxing on developers. Docker, however, has been helping developers address such issues for 10 years now.

Consistency across environments

Docker allows you to create a containerized environment that includes all the dependencies required for your AI/ML project, including libraries, tools, and frameworks. This environment can be easily shared and replicated across different machines and operating systems, ensuring consistency and reproducibility. Docker images can also be version-controlled and shared via container registries such as Docker Hub, thus enabling seamless collaboration and continuous integration and delivery.
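As a sketch, sharing a version-pinned environment image through Docker Hub might look like this (the image name and tag are illustrative):

# Build, tag, and publish the team's environment image
docker build -t myorg/ml-env:1.2.0 .
docker push myorg/ml-env:1.2.0

# Teammates pull the exact same environment
docker pull myorg/ml-env:1.2.0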

Scalability

Docker provides a lightweight and efficient way to scale AI/ML applications. With Docker, you can run multiple containers on the same machine or across different machines in a cluster, enabling horizontal scaling. This approach can help you handle large datasets, run multiple experiments in parallel, and increase the overall performance of your applications.
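For instance, Docker Compose can scale a service horizontally with a single flag (this assumes a Compose file defining a service named worker, perhaps one that processes a slice of a dataset or runs one experiment):

# Run four replicas of the worker service in parallel
docker compose up --scale worker=4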

Portability

Docker provides portability, allowing you to run your AI/ML applications on any platform that supports Docker, including local machines, cloud-based infrastructures, and edge devices. Docker images can be built once and deployed anywhere, eliminating compatibility issues and reducing the need for complex configurations. This can help you streamline the deployment process and focus on the development of your models.

Reproducibility

Docker enables reproducibility by providing a way to package the entire AI/ML application and its dependencies into a container. This container can be easily shared and replicated, ensuring that experiments are reproducible, regardless of the environment they are run in. Docker provides a way to specify the exact versions of dependencies and configurations needed to reproduce results, which can help validate experiments and ensure reliability and repeatability.
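A minimal Dockerfile sketch of this idea, with the base image and every dependency pinned (the tag and package versions are illustrative):

# Pin the base image to an exact version
FROM python:3.11.4-slim

# requirements.txt pins exact versions, e.g., torch==2.0.1 and numpy==1.25.0
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt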

Easy collaboration

Docker makes it easy to collaborate on AI/ML projects with team members or colleagues. Docker images or containers can be easily shared and distributed, ensuring that everyone has access to the same environment and dependencies. This collaboration can help streamline the development process and reduce the time and effort required to set up development environments.

Conclusion

Docker provides a powerful tool for AI/ML development, providing consistency, scalability, portability, reproducibility, and collaboration. By using Docker to package and distribute AI/ML applications and their dependencies, developers can simplify the development process and focus on building and improving their models. 

Check out the Accelerated AI/ML Development page to learn more about how Docker fits into the AI/ML development process.

If you have an interesting use case or story about Docker in your AI/ML workflow, we would love to hear from you and maybe even share your story.

Learn more

Full-Stack Reproducibility for AI/ML with Docker and Kaskada

Docker is used by millions of developers to optimize the setup and deployment of development environments and application stacks. As artificial intelligence (AI) and machine learning (ML) become key components of many applications, the core benefits of Docker are now doing more heavy lifting and accelerating the development cycle.

Gartner predicts that “by 2027, over 90% of new software applications that are developed in the business will contain ML models or services as enterprises utilize the massive amounts of data available to the business.”

This article, written by our partner DataStax, outlines how Kaskada, open source, and Docker are helping developers optimize their AI/ML efforts.


Introduction

As a data scientist or machine learning practitioner, your work is all about experimentation. You start with a hunch about the story your data will tell, but often you’ll only find an answer after false starts and failed experiments. The faster you can iterate and try things, the faster you’ll get to answers. In many cases, the insights gained from solving one problem are applicable to other related problems. Experimentation can lead to results much faster when you’re able to build on the prior work of your colleagues.

But there are roadblocks to this kind of collaboration. Without the right tools, data scientists waste time managing code dependencies, resolving version conflicts, and repeatedly going through complex installation processes. Building on the work of colleagues can be hard due to incompatible environments — the dreaded “it works for me” syndrome.

Enter Docker and Kaskada, which take a similar approach to these different problems: each provides a declarative language designed specifically for the problem at hand, along with an ecosystem of supporting tools (Figure 1).

Figure 1: A Dockerfile defines the exact steps needed to build a reproducible development environment.

For Docker, that language is the Dockerfile format, which describes the exact steps needed to build a reproducible development environment; the supporting ecosystem includes Docker Hub, Docker Desktop, Kubernetes, and other tools for working with containers. With Docker, data scientists can package their code and dependencies into an image that runs as a container on any machine, eliminating the need for complex installation processes and ensuring that colleagues can work with the exact same development environment.

With Kaskada, data scientists can compute and share features as code and use those throughout the ML lifecycle — from training models locally to maintaining real-time features in production. The computations required to produce these datasets are often complex and expensive because standard tools like Spark have difficulty reconstructing the temporal context required for training real-time ML models.

Kaskada solves this problem by providing a way to compute features — especially those that require reasoning about time — and sharing feature definitions as code. This approach allows data scientists to collaborate with each other and with machine learning engineers on feature engineering and reuse code across projects. Increased reproducibility dramatically speeds cycle times to get models into production, increases model accuracy, and ultimately improves machine learning results.

Example walkthrough

Let’s see how Docker and Kaskada improve the machine learning lifecycle by walking through a simplified example. Imagine you’re trying to build a real-time model for a mobile game and want to predict an outcome, for example, whether a user will pay for an upgrade.

Setting up your experimentation environment

To begin, start a Docker container that comes preinstalled with Jupyter and Kaskada:

docker run --rm -p 8888:8888 kaskadaio/jupyter
open <jupyter URL from logs> 

This step instantly gives you a reproducible development environment to work in, but you might want to customize this environment. Additional development tools can be added by creating a new Dockerfile using this image as the “base image”:

# Dockerfile
FROM kaskadaio/jupyter

COPY requirements.txt .
RUN pip install -r requirements.txt

In this example, you started with Jupyter and Kaskada, copied over a requirements file and installed all the dependencies in it. You now have a new Docker image that you can use as a data science workbench and share across your organization: Anyone in your organization with this Dockerfile can reproduce the same environment you’re using by building and running your Dockerfile.

docker build -t experimentation_env .
docker run --rm -p 8888:8888 experimentation_env
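For example, the requirements.txt copied into the image might pin the libraries used later in this walkthrough (the versions are illustrative):

# requirements.txt
scikit-learn==1.3.0
pandas==2.0.3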

The power of Docker comes from the fact that you’ve created a file describing your environment, and you can now share that file with others.

Training your model

Inside a new Jupyter notebook, you can begin the process of exploring solutions to the problem — predicting purchase behavior. To begin, you’ll create tables to organize the different types of events produced by the imaginary game.

%pip install kaskada
%load_ext fenlmagic

# Imports assumed from the Kaskada Python client
from kaskada.api.session import LocalBuilder
from kaskada import table

session = LocalBuilder().build()

table.create_table(
    table_name="GamePlay",
    time_column_name="timestamp",
    entity_key_column_name="user_id",
)
table.create_table(
    table_name="Purchase",
    time_column_name="timestamp",
    entity_key_column_name="user_id",
)

table.load(
    table_name="GamePlay",
    file="historical_game_play_events.parquet",
)
table.load(
    table_name="Purchase",
    file="historical_purchase_events.parquet",
)

Kaskada is easy to install and use locally. After installing, you’re ready to start creating tables and loading event data into them. Kaskada’s vectorized engine is built for high-performance local execution, and, in many cases, you can start experimenting on your data locally, without the complexity of managing distributed compute clusters.

Kaskada’s query language was designed to make it easy for data scientists to build features and training examples directly from raw event data. A single query can replace complex ETL and pre-aggregation pipelines, and Kaskada’s unique temporal operations unlock native time travel for building training examples “as of” important times in the past.

%%fenl --var training

# Create views derived from the source tables
let GameVictory = GamePlay | when(GamePlay.won)
let GameDefeat = GamePlay | when(not GamePlay.won)

# Compute some features as inputs to our model
let features = {
  loss_duration: sum(GameDefeat.duration),
  purchase_count: count(Purchase),
}

# Observe our features at the time of a player's second defeat since their last victory
let example = features
  | when(count(GameDefeat, window=since(GameVictory)) == 2)
  | shift_by(hours(1))

# Compute a target value
# In this case comparing purchase count at prediction and label time
let target = count(Purchase) > example.purchase_count

# Combine feature and target values computed at the different times
in extend(example, {target})

In the preceding example, you first apply filtering to the events, build simple features, observe them at the points in time when your model will be used to make predictions, and then combine the features with the value you want to predict, computed an hour later. Kaskada lets you describe all these operations “from scratch,” starting with raw events and ending with an ML training dataset.

from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

X = training.dataframe[['loss_duration']]
y = training.dataframe['target']

scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

model = LogisticRegression(max_iter=1000)
model.fit(X_scaled, y)
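Once the model is fit, a quick sanity check takes only a couple of lines (a minimal sketch; in practice you would evaluate on a held-out test set):

# Mean accuracy on the (scaled) training data
print(model.score(X_scaled, y))

# Predicted labels for the first five training examples
print(model.predict(X_scaled[:5]))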

Kaskada’s query language makes it easy to write an end-to-end transformation from raw events to a training dataset.

Conclusion

Docker and Kaskada enable data scientists and ML engineers to solve real-time ML problems quickly and reproducibly. With Docker, you can manage your development environment with ease, ensuring that your code runs the same way on every machine. With Kaskada, you can collaborate with colleagues on feature engineering and reuse queries as code across projects. Whether you’re working independently or as part of a team, these tools can help you get answers faster and more efficiently than ever before.

Get started with Kaskada’s official images on Docker Hub.

Docker Init: Initialize Dockerfiles and Compose files with a single CLI command

Docker has revolutionized the way developers build, package, and deploy their applications. Docker containers provide a lightweight, portable, and consistent runtime environment that can run on any infrastructure. And now, the Docker team has developed docker init, a new command-line interface (CLI) command introduced as a beta feature that simplifies the process of adding Docker to a project (Figure 1).


Note: Docker Init should not be confused with the internally used docker-init executable, which Docker invokes when the --init flag is passed to the docker run command.

Figure 1: With one command, all required Docker files are created and added to your project.

Create assets automatically

The new docker init command automates the creation of necessary Docker assets, such as Dockerfiles, Compose files, and .dockerignore files, based on the characteristics of the project. By executing the docker init command, developers can quickly containerize their projects. Docker init is a valuable tool for developers who want to experiment with Docker, learn about containerization, or integrate Docker into their existing projects.

To use docker init, developers need to upgrade to version 4.19.0 or later of Docker Desktop and execute the command in the target project folder. Docker init will detect the project definition and automatically generate the files needed to run the project in Docker.
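A typical first run might look like the following sketch (the project path is illustrative, and the exact files generated depend on your answers to the prompts):

cd path/to/your/project
docker init                   # answer the interactive prompts
docker compose up --build     # build and run using the generated files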

The current Beta release of docker init supports Go, Node, and Python, and our development team is actively working to extend support for additional languages and frameworks, including Java, Rust, and .NET. If there is a language or stack that you would like to see added or if you have other feedback about docker init, let us know through our Google form.

In conclusion, docker init is a valuable tool for developers who want to simplify the process of adding Docker support to their projects. It automates the creation of necessary Docker assets and can help standardize their creation across different projects. By enabling developers to focus on developing their applications and reducing the risk of errors and inconsistencies, docker init can help accelerate the adoption of Docker and containerization.

See Docker Init in action

To see docker init in action, check out the following overview video by Francesco Ciulla, which demonstrates adding the required Docker assets to a project.


Check out the documentation to learn more.
