A Promising Methodology for Testing GenAI Applications in Java https://www.docker.com/blog/testing-genai-applications-in-java/ Wed, 24 Apr 2024 16:03:14 +0000 https://www.docker.com/?p=54150 In the vast universe of programming, the era of generative artificial intelligence (GenAI) has marked a turning point, opening up a plethora of possibilities for developers.

Tools such as LangChain4j and Spring AI have democratized access to the creation of GenAI applications in Java, allowing Java developers to dive into this fascinating world. With LangChain4j, for instance, setting up and interacting with large language models (LLMs) has become exceptionally straightforward. Consider the following Java code snippet:

import dev.langchain4j.model.openai.OpenAiChatModel;

public static void main(String[] args) {
    // Configure the model with an API key and a model name, then send a prompt
    var llm = OpenAiChatModel.builder()
            .apiKey("demo")
            .modelName("gpt-3.5-turbo")
            .build();
    System.out.println(llm.generate("Hello, how are you?"));
}

This example illustrates how a developer can quickly instantiate an LLM within a Java application. By simply configuring the model with an API key and specifying the model name, developers can begin generating text responses immediately. This accessibility is pivotal for fostering innovation and exploration within the Java community. More than that, we have a wide range of models that can be run locally, and various vector databases for storing embeddings and performing semantic searches, among other technological marvels.

Despite this progress, however, we are faced with a persistent challenge: the difficulty of testing applications that incorporate artificial intelligence. This aspect seems to be a field where there is still much to explore and develop.

In this article, I will share a methodology that I find promising for testing GenAI applications.


Project overview

The example project focuses on an application that provides an API for interacting with two AI agents capable of answering questions. 

An AI agent is a software entity designed to perform tasks autonomously, using artificial intelligence to simulate human-like interactions and responses. 

In this project, one agent uses direct knowledge already contained within the LLM, while the other leverages internal documentation to enrich the LLM through retrieval-augmented generation (RAG). This approach allows the agents to provide precise and contextually relevant answers based on the input they receive.

I prefer to omit the technical details about RAG, as ample information is available elsewhere. I’ll simply note that this example employs a particular variant of RAG, which simplifies the traditional process of generating and storing embeddings for information retrieval.

Instead of dividing documents into chunks and making embeddings of those chunks, in this project, we will use an LLM to generate a summary of the documents. The embedding is generated based on that summary.

When the user writes a question, an embedding of the question will be generated and a semantic search will be performed against the embeddings of the summaries. If a match is found, the user’s message will be augmented with the original document.

This way, there’s no need to deal with the configuration of document chunks, set the number of chunks to retrieve, or worry about whether the way of augmenting the user’s message makes sense. If there is a document that talks about what the user is asking, it will be included in the message sent to the LLM.
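To make the flow concrete, here is a minimal sketch of this summary-based ingestion and retrieval written against LangChain4j primitives (a chat model, an embedding model, and an in-memory embedding store). The class and method names are illustrative rather than taken from the example project, and exact LangChain4j signatures vary slightly between versions:

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

// Sketch only: names are illustrative and method signatures may differ between LangChain4j versions.
class SummaryRag {

    private final ChatLanguageModel llm;                 // e.g., the OpenAiChatModel shown earlier
    private final EmbeddingModel embeddingModel;         // any embedding model supported by LangChain4j
    private final InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

    SummaryRag(ChatLanguageModel llm, EmbeddingModel embeddingModel) {
        this.llm = llm;
        this.embeddingModel = embeddingModel;
    }

    // Ingestion: summarize the document, embed the summary, keep the full document as the payload.
    void ingest(String documentText) {
        String summary = llm.generate("Summarize the following document:\n" + documentText);
        Embedding summaryEmbedding = embeddingModel.embed(summary).content();
        store.add(summaryEmbedding, TextSegment.from(documentText));
    }

    // Retrieval: embed the question, search the summary embeddings, and if a match is found,
    // augment the user's message with the original document before calling the LLM.
    String answer(String question) {
        Embedding questionEmbedding = embeddingModel.embed(question).content();
        List<EmbeddingMatch<TextSegment>> matches = store.findRelevant(questionEmbedding, 1, 0.7);
        String prompt = matches.isEmpty()
                ? question
                : question + "\n\nUse this document to answer:\n" + matches.get(0).embedded().text();
        return llm.generate(prompt);
    }
}

The important detail is that the summary is what gets embedded and searched, while the full document is what ends up in the prompt.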

Technical stack

The project is developed in Java and utilizes a Spring Boot application with Testcontainers and LangChain4j.

For setting up the project, I followed the steps outlined in Local Development Environment with Testcontainers and Spring Boot Application Testing and Development with Testcontainers.

I also use Testcontainers Desktop to facilitate database access, verify the generated embeddings, and review the container logs.

The challenge of testing

The real challenge arises when trying to test the responses generated by language models. Traditionally, we could settle for verifying that the response includes certain keywords, which is insufficient and prone to errors.

static String question = "How I can install Testcontainers Desktop?";

@Test
void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    assertThat(answer).contains("https://testcontainers.com/desktop/");
}

This approach is not only fragile but also lacks the ability to assess the relevance or coherence of the response.

An alternative is to employ cosine similarity to compare the embeddings of a “reference” response and the actual response, providing a more semantic form of evaluation. 

This method measures the similarity between two vectors/embeddings by calculating the cosine of the angle between them. If both vectors point in the same direction, it means the “reference” response is semantically the same as the actual response.

static String question = "How I can install Testcontainers Desktop?";
static String reference = """
        - Answer must indicate to download Testcontainers Desktop from https://testcontainers.com/desktop/
        - Answer must indicate to use brew to install Testcontainers Desktop in MacOS
        - Answer must be less than 5 sentences
        """;

@Test
void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    double cosineSimilarity = getCosineSimilarity(reference, answer);
    assertThat(cosineSimilarity).isGreaterThan(0.8);
}
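For reference, the getCosineSimilarity helper used above could be implemented roughly as follows. This is a sketch rather than the project’s actual code, and it assumes an embeddingModel field is available to turn both strings into vectors:

double getCosineSimilarity(String reference, String answer) {
    // Embed both texts; in recent LangChain4j versions embed(...) returns a Response<Embedding>
    float[] a = embeddingModel.embed(reference).content().vector();
    float[] b = embeddingModel.embed(answer).content().vector();
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Cosine of the angle between the two embedding vectors
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}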

However, this method introduces the problem of selecting an appropriate threshold to determine the acceptability of the response, in addition to the opacity of the evaluation process.

Toward a more effective method

The real problem here arises from the fact that answers provided by the LLM are in natural language and non-deterministic. Because of this, using current testing methods to verify them is difficult, as these methods are better suited to testing predictable values. 

However, we already have a great tool for understanding non-deterministic answers in natural language: LLMs themselves. Thus, the key may lie in using one LLM to evaluate the adequacy of responses generated by another LLM. 

This proposal involves defining detailed validation criteria and using an LLM as a “Validator Agent” to determine if the responses meet the specified requirements. This approach can be applied to validate answers to specific questions, drawing on both general knowledge and specialized information.

By incorporating detailed instructions and examples, the Validator Agent can provide accurate and justified evaluations, offering clarity on why a response is considered correct or incorrect.

static String question = "How I can install Testcontainers Desktop?";
static String reference = """
        - Answer must indicate to download Testcontainers Desktop from https://testcontainers.com/desktop/
        - Answer must indicate to use brew to install Testcontainers Desktop in MacOS
        - Answer must be less than 5 sentences
        """;

@Test
void verifyStraightAgentFailsToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/straight?question={question}", ChatController.ChatResponse.class, question).message();
    ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
    assertThat(validate.response()).isEqualTo("no");
}

@Test
void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
    assertThat(validate.response()).isEqualTo("yes");
}

We can even test more complex responses where the LLM should suggest a better alternative to the user’s question.

static String question = "How I can find the random port of a Testcontainer to connect to it?";
static String reference = """
        - Answer must not mention using getMappedPort() method to find the random port of a Testcontainer
        - Answer must mention that you don't need to find the random port of a Testcontainer to connect to it
        - Answer must indicate that you can use the Testcontainers Desktop app to configure fixed port
        - Answer must be less than 5 sentences
        """;

@Test
void verifyRaggedAgentSucceedToAnswerHowToDebugWithTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
    assertThat(validate.response()).isEqualTo("yes");
}

Validator Agent

The configuration for the Validator Agent doesn’t differ from that of other agents. It is built using the LangChain4j AI Service and a list of specific instructions:

public interface ValidatorAgent {
    @SystemMessage("""
                ### Instructions
                You are a strict validator.
                You will be provided with a question, an answer, and a reference.
                Your task is to validate whether the answer is correct for the given question, based on the reference.
                
                Follow these instructions:
                - Respond only 'yes', 'no' or 'unsure' and always include the reason for your response
                - Respond with 'yes' if the answer is correct
                - Respond with 'no' if the answer is incorrect
                - If you are unsure, simply respond with 'unsure'
                - Respond with 'no' if the answer is not clear or concise
                - Respond with 'no' if the answer is not based on the reference
                
                Your response must be a json object with the following structure:
                {
                    "response": "yes",
                    "reason": "The answer is correct because it is based on the reference provided."
                }
                
                ### Example
                Question: Is Madrid the capital of Spain?
                Answer: No, it's Barcelona.
                Reference: The capital of Spain is Madrid
                ###
                Response: {
                    "response": "no",
                    "reason": "The answer is incorrect because the reference states that the capital of Spain is Madrid."
                }
                """)
    @UserMessage("""
            ###
            Question: {{question}}
            ###
            Answer: {{answer}}
            ###
            Reference: {{reference}}
            ###
            """)
    ValidatorResponse validate(@V("question") String question, @V("answer") String answer, @V("reference") String reference);

    record ValidatorResponse(String response, String reason) {}
}

As you can see, I’m using Few-Shot Prompting to guide the LLM on the expected responses. I also request a JSON format for responses to facilitate parsing them into objects, and I specify that the reason for the answer must be included, to better understand the basis of its verdict.
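For completeness, wiring such an agent up is typically a one-liner with LangChain4j’s AiServices. The snippet below is a sketch; chatModel stands for whatever ChatLanguageModel the project configures and is not taken from the example code:

import dev.langchain4j.service.AiServices;

// AiServices generates an implementation of the interface, fills the prompt templates,
// and maps the JSON reply onto the ValidatorResponse record.
ValidatorAgent validatorAgent = AiServices.create(ValidatorAgent.class, chatModel);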

Conclusion

The evolution of GenAI applications brings with it the challenge of developing testing methods that can effectively evaluate the complexity and subtlety of responses generated by advanced artificial intelligences. 

The proposal to use an LLM as a Validator Agent represents a promising approach, paving the way towards a new era of software development and evaluation in the field of artificial intelligence. Over time, we hope to see more innovations that allow us to overcome the current challenges and maximize the potential of these transformative technologies.

Get Started with the Latest Updates for Dockerfile Syntax (v1.7.0) https://www.docker.com/blog/new-dockerfile-capabilities-v1-7-0/ Tue, 09 Apr 2024 15:16:53 +0000 https://www.docker.com/?p=53427
Dockerfiles are fundamental tools for developers working with Docker, serving as a blueprint for creating Docker images. These text documents contain all the commands a user could call on the command line to assemble an image. Understanding and effectively utilizing Dockerfiles can significantly streamline the development process, allowing for the automation of image creation and ensuring consistent environments across different stages of development. Dockerfiles are pivotal in defining project environments, dependencies, and the configuration of applications within Docker containers.

With new versions of the BuildKit builder toolkit, Docker Buildx CLI, and Dockerfile frontend for BuildKit (v1.7.0), developers now have access to enhanced Dockerfile capabilities. This blog post delves into these new Dockerfile capabilities and explains how you can leverage them in your projects to further optimize your Docker workflows.


Versioning

Before we get started, here’s a quick reminder of how Dockerfile is versioned and what you should do to update it. 

Although most projects use Dockerfiles to build images, BuildKit is not limited only to that format. BuildKit supports multiple different frontends for defining the build steps for BuildKit to process. Anyone can create these frontends, package them as regular container images, and load them from a registry when you invoke the build.

With the new release, we have published two such images to Docker Hub: docker/dockerfile:1.7.0 and docker/dockerfile:1.7.0-labs.

To use these frontends, you need to specify a #syntax directive at the beginning of the file to tell BuildKit which frontend image to use for the build. Here we have set it to use the latest of the 1.x.x major version. For example:

#syntax=docker/dockerfile:1

FROM alpine
...

This means that BuildKit is decoupled from the Dockerfile frontend syntax. You can start using new Dockerfile features right away without worrying about which BuildKit version you’re using. All the examples described in this article will work with any version of Docker that supports BuildKit (the default builder as of Docker 23), as long as you define the correct #syntax directive on the top of your Dockerfile.

You can learn more about Dockerfile frontend versions in the documentation. 

Variable expansions

When you write Dockerfiles, build steps can contain variables that are defined using the build arguments (ARG) and environment variables (ENV) instructions. The difference between build arguments and environment variables is that environment variables are kept in the resulting image and persist when a container is created from it.

When you use such variables, you most likely use ${NAME} or, more simply, $NAME in COPY, RUN, and other commands.

You might not know that Dockerfile supports two forms of Bash-like variable expansion:

  • ${variable:-word}: Sets a value to word if the variable is unset
  • ${variable:+word}: Sets a value to word if the variable is set
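For instance, here is a small, hypothetical Dockerfile showing both forms:

FROM alpine
# 'latest' is used when the TAG build argument is not provided
ARG TAG
RUN echo "using tag ${TAG:-latest}"
# the word 'custom' is used only when CHANNEL is set to any value
ARG CHANNEL
RUN echo "channel: ${CHANNEL:+custom}"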

Up to this point, these special forms were not that useful in Dockerfiles because the default value of ARG instructions can be set directly:

FROM alpine
ARG foo="default value"

If you are an expert in various shell applications, you know that Bash and other tools usually have many additional forms of variable expansion to ease the development of your scripts.

In Dockerfile v1.7, we have added:

  • ${variable#pattern} and ${variable##pattern} to remove the shortest or longest prefix from the variable’s value.
  • ${variable%pattern} and ${variable%%pattern} to remove the shortest or longest suffix from the variable’s value.
  • ${variable/pattern/replacement} to replace the first occurrence of a pattern
  • ${variable//pattern/replacement} to replace all occurrences of a pattern

How these rules are used might not be completely obvious at first. So, let’s look at a few examples seen in actual Dockerfiles.

For example, projects often can’t agree on whether versions for downloading your dependencies should have a “v” prefix or not. The following allows you to get the format you need:

# example VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# VERSION is now '1.2.3'

In the next example, multiple variants are used by the same project:

ARG VERSION=v1.7.13
ADD https://github.com/containerd/containerd/releases/download/${VERSION}/containerd-${VERSION#v}-linux-amd64.tar.gz / 

To configure different command behaviors for multi-platform builds, BuildKit provides useful built-in variables like TARGETOS and TARGETARCH. Unfortunately, not all projects use the same values. For example, in containers and the Go ecosystem, we refer to 64-bit ARM architecture as arm64, but sometimes you need aarch64 instead.

ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${TARGETARCH/arm64/aarch64}.zip /

In this case, the URL also uses a custom name for AMD64 architecture. To pass a variable through multiple expansions, use another ARG definition with an expansion from the previous value. You could also write all the definitions on a single line, as ARG allows multiple parameters, although this may hurt readability.

ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${ARCH}.zip /

Note that the example above is written in a way that if a user passes their own --build-arg ARCH=value, then that value is used as-is.

Now, let’s look at how new expansions can be useful in multi-stage builds.

One of the techniques described in “Advanced multi-stage build patterns” shows how build arguments can be used so that different Dockerfile commands run depending on the build-arg value. For example, you can use that pattern if you build a multi-platform image and want to run additional COPY or RUN commands only for specific platforms. If this method is new to you, you can learn more about it from that post.

In summarized form, the idea is to define a global build argument and then define build stages that use the build argument value in the stage name while pointing to the base of your target stage via the build-arg name.

Old example:

ARG BUILD_VERSION=1

FROM alpine AS base
RUN …

FROM base AS branch-version-1
RUN touch version1

FROM base AS branch-version-2
RUN touch version2

FROM branch-version-${BUILD_VERSION} AS after-condition

FROM after-condition
RUN …

When using this pattern for multi-platform builds, one of the limitations is that all the possible values for the build-arg need to be defined by your Dockerfile. This is problematic because we want the Dockerfile to be able to build on any platform rather than being limited to a specific set.

You can see other examples here and here of Dockerfiles where dummy stage aliases must be defined for all architectures, and no other architecture can be built. Instead, the pattern we would like to use is that there is one architecture that has a special behavior, and everything else shares another common behavior.

With new expansions, we can write this to demonstrate running special commands only on RISC-V, which is still somewhat new and may need custom behavior:

#syntax=docker/dockerfile:1.7

ARG ARCH=${TARGETARCH#riscv64}
ARG ARCH=${ARCH:+"common"}
ARG ARCH=${ARCH:-$TARGETARCH}

FROM --platform=$BUILDPLATFORM alpine AS base-common
ARG TARGETARCH
RUN echo "Common build, I am $TARGETARCH" > /out

FROM --platform=$BUILDPLATFORM alpine AS base-riscv64
ARG TARGETARCH
RUN echo "Riscv only special build, I am $TARGETARCH" > /out

FROM base-${ARCH} AS base

Let’s look at these ARCH definitions more closely.

  • The first sets ARCH to TARGETARCH but removes riscv64 from the value.
  • Next, as we described previously, we don’t actually want the other architectures to use their own values but instead want them all to share a common value. So, we set ARCH to common except if it was cleared by the previous riscv64 rule. 
  • Now, if we still have an empty value, we default it back to $TARGETARCH.
  • The last definition is optional, as we would already have a unique value for both cases, but it makes the final stage name base-riscv64 nicer to read.

Additional examples of including multiple conditions with shared conditions, or conditions based on architecture variants can be found in this GitHub Gist page.

Comparing this example to the initial example of conditions between stages, the new pattern isn’t limited to just controlling the platform differences of your builds but can be used with any build-arg. If you have used this pattern before, then you can effectively now define an “else” clause, whereas previously, you were limited to only “if” clauses.

Copy with keeping parent directories

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature.

#syntax=docker/dockerfile:1.7-labs

When you are copying files in your Dockerfile, for example, do this:

COPY app/file /to/dest/dir/

This example means the source file is copied directly to the destination directory. If your source path was a directory, all the files inside that directory would be copied directly to the destination path.

What if you have a file structure like the following:

.
├── app1
│   ├── docs
│   │   └── manual.md
│   └── src
│       └── server.go
└── app2
    └── src
        └── client.go

You want to copy only files in app1/src, but so that the final files at the destination would be /to/dest/dir/app1/src/server.go and not just /to/dest/dir/server.go.

With the new COPY --parents flag, you can write:

COPY --parents /app1/src/ /to/dest/dir/  

This will copy the files inside the src directory and recreate the app1/src directory structure for these files.

Things get more powerful when you start to use wildcard paths. To copy the src directories for both apps into their respective locations, you can write:

COPY --parents */src/ /to/dest/dir/ 

This will create both /to/dest/dir/app1 and /to/dest/dir/app2, but it will not copy the docs directory. Previously, this kind of copy was not possible with a single command. You would have needed multiple copies for individual files (as shown in this example) or used some workaround with the RUN --mount instruction instead.

You can also use double-star wildcard (**) to match files under any directory structure. For example, to copy only the Go source code files anywhere in your build context, you can write:

COPY --parents **/*.go /to/dest/dir/

If you are thinking about why you would need to copy specific files instead of just using COPY ./ to copy all files, remember that your build cache gets invalidated when you include new files in your build. If you copy all files, the cache gets invalidated when any file is added or changed, whereas if you copy only Go files, only changes in these files influence the cache.

The new --parents flag is not only for COPY instructions from your build context; you can also use it in multi-stage builds when copying files between stages using COPY --from.

Note that with COPY --from syntax, all source paths are expected to be absolute, meaning that if the --parents flag is used with such paths, they will be fully replicated as they were in the source stage. That may not always be desirable, and instead, you may want to keep some parents but discard and replace others. In that case, you can use a special /./ relative pivot point in your source path to mark which parents you wish to copy and which should be ignored. This special path component resembles how rsync works with the --relative flag.

#syntax=docker/dockerfile:1.7-labs
FROM ... AS base
RUN ./generate-lot-of-files -o /out/
# /out/usr/bin/foo
# /out/usr/lib/bar.so
# /out/usr/local/bin/baz

FROM scratch
COPY --from=base --parents /out/./**/bin/ /
# /usr/bin/foo
# /usr/local/bin/baz

The example above shows how only the bin directories are copied from the collection of files that the intermediate stage generated, while all the directories keep their paths relative to the out directory. 

Exclusion filters

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature:

#syntax=docker/dockerfile:1.7-labs

Another related case arises when moving files in your Dockerfile with COPY and ADD instructions: you want to copy a group of files but exclude a specific subset. Previously, your only options were to use RUN --mount or to try to define your excluded files inside a .dockerignore file. 

.dockerignore files, however, are not a good solution for this problem: they only exclude files from the client-side build context, they do not apply to builds from remote Git/HTTP URLs, and they are limited to one per Dockerfile. You should use them much like .gitignore, to mark files that are never part of your project, but not as a way to define your application-specific build logic.

With the new --exclude=[pattern] flag, you can now define such exclusion filters for your COPY and ADD commands directly in the Dockerfile. The pattern uses the same format as .dockerignore.

The following example copies all the files in a directory except Markdown files:

COPY --exclude=*.md app /dest/

You can use the flag multiple times to add multiple filters. The next example excludes Markdown files and also a file called README:

COPY --exclude=*.md --exclude=README app /dest/

Double-star wildcards exclude not only Markdown files in the copied directory but also in any subdirectory:

COPY --exclude=**/*.md app /dest/

As in .dockerignore files, you can also define exceptions to the exclusions with ! prefix. The following example excludes all Markdown files in any copied directory, except if the file is called important.md — in that case, it is still copied.

COPY --exclude=**/*.md --exclude=!**/important.md app /dest/

This double negative may be confusing initially, but note that this is a reversal of the previous exclude rule, and “include patterns” are defined by the source parameter of the COPY instruction.

When using --exclude together with previously described --parents copy mode, note that the exclude patterns are relative to the copied parent directories or to the pivot point /./ if one is defined. See the following directory structure for example:

assets
├── app1
│   ├── icons32x32
│   ├── icons64x64
│   ├── notes
│   └── backup
├── app2
│   └── icons32x32
└── testapp
    └── icons32x32

COPY --parents --exclude=testapp assets/./**/icons* /dest/

This command would create the directory structure below. Note that only directories with the icons prefix were copied, the root parent directory assets was skipped as it was before the relative pivot point, and additionally, testapp was not copied as it was defined with an exclusion filter.

dest
├── app1
│   ├── icons32x32
│   └── icons64x64
└── app2
    └── icons32x32

Conclusion

We hope this post gave you ideas for improving your Dockerfiles and that the patterns shown here will help you describe your build more efficiently. Remember that your Dockerfile can start using all these features today by defining the #syntax line on top, even if you haven’t updated to the latest Docker yet.

For a full list of other features in the new BuildKit, Buildx, and Dockerfile releases, check out the changelogs:

Thanks to community members @tstenner, @DYefimov, and @leandrosansilva for helping to implement these features!

If you have issues or suggestions you want to share, let us know in the issue tracker.

Debian’s Dedication to Security: A Robust Foundation for Docker Developers https://www.docker.com/blog/debian-for-docker-developers/ Thu, 04 Apr 2024 14:03:10 +0000 https://www.docker.com/?p=53447 As security threats become more and more prevalent, building software with security top of mind is essential. Security has become an increasing concern for container workloads specifically and, commensurately, for container base-image choice. Many conversations around choosing a secure base image focus on CVE counts, but security involves a lot more than that. 

One organization that has been leading the way in secure software development is the Debian Project. In this post, I will outline how and why Debian operates as a secure basis for development.


For more than 30 years, Debian’s diverse group of volunteers has provided a free, open, stable, and secure GNU/Linux distribution. Debian’s emphasis on engineering excellence and clean design, as well as its wide variety of packages and supported architectures, have made it not only a widely used distribution in its own right but also a meta-distribution. Many other Linux distributions, such as Ubuntu, Linux Mint, and Kali Linux, are built on top of Debian, as are many Docker Official Images (DOI). In fact, more than 1,000 Docker Official Images variants use the debian DOI or the Debian-derived ubuntu DOI as their base image. 

Why Debian?

As a bit of a disclaimer, I have been using Debian GNU/Linux for a long time. I remember installing Debian from floppy disks in the 1990s on a PC that I cobbled together, and later reinstalling so I could test prerelease versions of the netinst network installer. Installing over the network took a while using a 56-kbps modem. At those network speeds, you had to be very particular about which packages you chose in dselect.

Having used a few other distributions before trying Debian, I still remember being amazed by how well-organized and architected the system was. No dangling or broken dependencies. No download failures. No incompatible shared libraries. No package conflicts, but rather a thoughtful handling of packages providing similar functionality. 

Much has changed over the years: no more floppies, dselect has been retired, my network connection speed has increased by a few orders of magnitude, and now I “install” Debian via docker pull debian. What has not changed is the feeling of amazement I have toward Debian and its community.

Open source software and security

Despite the achievements of the Debian project and the many other projects it has spawned, it is not without detractors. Like many other open source projects, Debian has received its share of criticism in the past few years from opportunists lamenting the state of open source security. Writing about the software supply chain while bemoaning high-profile CVEs and pointing to malware that has been uploaded to an open source package ecosystem, such as PyPI or NPM, has become all too common. 

The pernicious assumption in such articles is that open source software is the problem. We know this is not the case. We’ve been through this before. Back when I was installing Debian over a 56-kbps modem, all sorts of fear, uncertainty, and doubt (FUD) was being spread by various proprietary software vendors. We learned then that open source is not a security problem — it is a security solution. 

Being open source does not automatically convey an improved security status compared to closed-source software, but it does provide significant advantages. In his Secure Programming HOWTO, David Wheeler provides a balanced summary of the relationship between open source software and security. A purported advantage conveyed by closed-source software is the nondisclosure of its source code, but we know that security through obscurity is no security at all. 

The transparency of open source software and open ecosystems allows us to better know our security posture. Openness allows for the rapid identification and remediation of vulnerabilities. Openness enables the vast majority of the security and supply chain tooling that developers regularly use. How many closed-source tools regularly publish CVEs? With proprietary software, you often only find out about a vulnerability after it is too late.

Debian’s rapid response strategy

Debian has been criticized for moving too slowly on the security front. But this narrative, like the open vs. closed-source narrative, captures neither the nuance nor reality. Although several distributions wait to publish CVEs until a fixed version is available, Debian opts for complete transparency and urgency when communicating security information to its users.

Furthermore, Debian maintainers are not a mindless fleet of automatons hastily applying patches and releasing new package versions. As a rule, Debian maintainers are experts among experts, deeply steeped in software and delivery engineering, open source culture, and the software they package.

zlib vulnerability example

A recent zlib vulnerability, CVE-2023-45853, provides an insightful example of the Debian project’s diligent, thorough approach to security. Several distributions grabbed a patch for the vulnerability, applied it, rebuilt, packaged, and released a new zlib package. The Debian security community took a closer look.

As mentioned in the CVE summary, the vulnerability was in minizip, which is a utility under the contrib directory of the zlib source code. No minizip source files are compiled into the zlib library, libz. As such, this vulnerability did not actually affect any zlib packages.

If that were where the story had ended, the only harm would be in updating a package unnecessarily. But the story did not end there. As detailed in the Debian bug thread, the offending minizip code was copied (i.e., vendored) and used in a lot of other widely used software. In fact, the vendored minizip code in both Chromium and Node.js was patched about a month before the zlib CVE was even published. 

Unfortunately, other commonly used software packages also had vendored copies of minizip that were still vulnerable. Thanks to the diligence of the Debian project, either the patch was applied to those projects as well, or they were compiled against the patched system minizip (not zlib!) dev package rather than the vendored version. In other distributions, those buggy vendored copies are in some cases still being compiled into software packages, with nary a mention in any CVE.

Thinking beyond CVEs

In the past 30 years, we have seen an astronomical increase in the role open source software plays in the tech industry. Despite the productivity gains that software engineers get by leveraging the massive amount of high-quality open source software available, we are once again hearing the same FUD we heard in the early days of open source. 

The next time you see an article about the dangers lurking in your open source dependencies, don’t be afraid to look past the headlines and question the assumptions. Open ecosystems lead to secure software, and the Debian project provides a model we would all do well to emulate. Debian’s goal is security, which encompasses a lot more than a report showing zero CVEs. Consumers of operating systems and container images would be wise to understand the difference. 

So go ahead and build on top of the debian DOI. FROM debian is never a bad way to start a Dockerfile!
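If you want a concrete starting point, a minimal Dockerfile might look like the following sketch; the binary name and package choices are placeholders rather than recommendations:

# Start from the Debian Docker Official Image; the slim variant keeps the package set small
FROM debian:bookworm-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# 'my-app' is a placeholder for your own binary
COPY my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]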

OpenSSH and XZ/liblzma: A Nation-State Attack Was Thwarted, What Did We Learn? https://www.docker.com/blog/openssh-and-xz-liblzma/ Mon, 01 Apr 2024 19:05:54 +0000 https://www.docker.com/?p=53505

I have been recently watching The Americans, a decade-old TV series about undercover KGB agents living disguised as a normal American family in Reagan’s America in a paranoid period of the Cold War. I was not expecting this weekend to be reading mailing list posts of the same type of operation being performed on open source maintainers by agents with equally shadowy identities (CVE-2024-3094).

As The Grugq explains, “The JK-persona hounds Lasse (the maintainer) over multiple threads for many months. Fortunately for Lasse, his new friend and star developer is there, and even more fortunately, Jia Tan has the time available to help out with maintenance tasks. What luck! This is exactly the style of operation a HUMINT organization will run to get an agent in place. They will position someone and then create a crisis for the target, one which the agent is able to solve.”

The operation played out over two years, getting the agent in place, setting up the infrastructure for the attack, hiding it from various tools, and then rushing to get it into Linux distributions before some recent changes in systemd were shipped that would have stopped this attack from working.

An equally unlikely accident resulted when Andres Freund, a Postgres maintainer, discovered the attack before it had reached the vast majority of systems, from a probably accidental performance slowdown. Andres says, “I didn’t even notice it while logging in with SSH or such. I was doing some micro-benchmarking at the time and was looking to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd. Which showed lots of cpu time in code with perf unable to attribute it to a symbol, with the dso showing as liblzma. Got suspicious. Then I recalled that I had seen an odd valgrind complaint in my automated testing of Postgres, a few weeks earlier, after some package updates were installed. Really required a lot of coincidences.” 

It is hard to overstate how lucky we were here, as there are no tools that will detect this vulnerability. Even ex-post it is not possible to detect externally as we do not have the private key needed to trigger the vulnerability, and the code is very well hidden. While Linus’s law has been stated as “given enough eyeballs all bugs are shallow,” we have seen in the past this is not always true, or there are just not enough eyeballs looking at all the code we consume, even if this time it worked.

In terms of immediate actions, the attack appears to have been targeted at a subset of OpenSSH servers patched to integrate with systemd. Running SSH servers in containers is rare, and the initial priority should be container hosts, although as the issue was caught early it is likely that few people updated. There is a stream of fixes to liblzma, the xz compression library where the exploit was placed, as the commits from the last two years are examined, although at present there is no evidence that there are exploits for any software other than OpenSSH included. In the Docker Scout web interface you can search for “lzma” in package names, and issues will be flagged in the “high profile vulnerabilities” policy.

So many commentators have simple technical solutions, and so many vendors are using this to push their tools. As a technical community, we want there to be technical solutions to problems like this. Vendors want to sell their products after events like this, even though none even detected it. Rewrite it in Rust, shoot autotools, stop using GitHub tarballs and checked-in artifacts; the list goes on. These are not bad things to do, and there is no doubt that understandability and clarity are valuable for security, although we often will trade them off for performance. It is the case that m4 and autotools are pretty hard to read and understand, while tools like ifunc allow dynamic dispatch even in a mostly static ecosystem. Large investments in the ecosystem to fix these issues would be worthwhile, but we know that attackers would simply find new vectors and weird machines. Equally, there are many naive suggestions about the people, as if having an identity for open source developers would solve a problem, when there are very genuine people who wish to stay private while state actors can easily find fake identities, or “just say no” to untrusted people. Beware of people bringing easy solutions; there are so many in this hot-take world.

Where can we go from here? Awareness and observability first. Hyper awareness even, as we see in this case small clues matter. Don’t focus on the exact details of this attack, which will be different next time, but think more generally. Start by understanding your organization’s software consumption, supply chain, and critical points. Ask what you should be funding to make it different. Then build in resilience. Defense in depth, and diversity — not a monoculture. OpenSSH will always be a target because it is so widespread, and the OpenBSD developers are doing great work and the target was upstream of them because of this. But we need a diverse ecosystem with multiple strong solutions, and as an organization you need second suppliers for critical software. The third critical piece of security in this era is recoverability. Planning for the scenario in which the worst case has happened and understanding the outcomes and recovery process is everyone’s homework now, and making sure you are prepared with tabletop exercises around zero days. 

This is an opportunity for all of us to continue working together to strengthen the open source supply chain, and to work on resilience for when this happens next. We encourage dialogue and discussion on this within Docker communities.

Is Your Container Image Really Distroless? https://www.docker.com/blog/is-your-container-image-really-distroless/ Wed, 27 Mar 2024 13:25:43 +0000 https://www.docker.com/?p=52629 Containerization helped drastically improve the security of applications by providing engineers with greater control over the runtime environment of their applications. However, a significant time investment is required to maintain the security posture of those applications, given the daily discovery of new vulnerabilities as well as regular releases of languages and frameworks. 

The concept of “distroless” images offers the promise of greatly reducing the time needed to keep applications secure by eliminating most of the software contained in typical container images. This approach also reduces the amount of time teams spend remediating vulnerabilities, allowing them to focus only on the software they are using. 

In this article, we explain what makes an image distroless, describe tools that make the creation of distroless images practical, and discuss whether distroless images live up to their potential.


What’s a distro?

A Linux distribution is a complete operating system built around the Linux kernel, comprising a package management system, GNU tools and libraries, additional software, and often a graphical user interface.

Common Linux distributions include Debian, Ubuntu, Arch Linux, Fedora, Red Hat Enterprise Linux, CentOS, and Alpine Linux (which is more common in the world of containers). These Linux distributions, like most Linux distros, treat security seriously, with teams working diligently to release frequent patches and updates to known vulnerabilities. A key challenge that all Linux distributions must face involves the usability/security dilemma. 

On its own, the Linux kernel is not very usable, so many utility commands are included in distributions to cover a large array of use cases. Having the right utilities included in the distribution without having to install additional packages greatly improves a distro’s usability. The downside of this increase in usability, however, is an increased attack surface area to keep up to date. 

A Linux distro must strike a balance between these two elements, and different distros have different approaches to doing so. A key aspect to keep in mind is that a distro that emphasizes usability is not “less secure” than one that does not emphasize usability. What it means is that the distro with more utility packages requires more effort from its users to keep it secure.

Multi-stage builds

Multi-stage builds allow developers to separate build-time dependencies from runtime ones. Developers can now start from a full-featured build image with all the necessary components installed, perform the necessary build step, and then copy only the result of those steps to a more minimal or even an empty image, called “scratch”. With this approach, there’s no need to clean up dependencies and, as an added bonus, the build stages are also cacheable, which can considerably reduce build time. 

The following example shows a Go program taking advantage of multi-stage builds. Because the Golang runtime is compiled into the binary, only the binary and root certificates need to be copied to the blank slate image.

FROM golang:1.21.5-alpine as build
WORKDIR /
COPY go.* .
RUN go mod download
COPY . .
RUN go build -o my-app


FROM scratch
COPY --from=build \
  /etc/ssl/certs/ca-certificates.crt \
  /etc/ssl/certs/ca-certificates.crt
COPY --from=build /my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]

BuildKit

BuildKit, the current engine used by docker build, helps developers create minimal images thanks to its extensible, pluggable architecture. It provides the ability to specify alternative frontends (with the default being the familiar Dockerfile) to abstract and hide the complexity of creating distroless images. These frontends can accept more streamlined and declarative inputs for builds and can produce images that contain only the software needed for the application to run. 

The following example shows the input for a frontend for creating Python applications called mopy by Julian Goede.

#syntax=cmdjulian/mopy
apiVersion: v1
python: 3.9.2
build-deps:
  - libopenblas-dev
  - gfortran
  - build-essential
envs:
  MYENV: envVar1
pip:
  - numpy==1.22
  - slycot
  - ./my_local_pip/
  - ./requirements.txt
labels:
  foo: bar
  fizz: ${mopy.sbom}
project: my-python-app/

So, is your image really distroless?

Thanks to new tools for creating container images like multi-stage builds and BuildKit, it is now a lot more practical to create images that only contain the required software and its runtime dependencies. 

However, many images claiming to be distroless still include a shell (usually Bash) and/or BusyBox, which provides many of the commands a Linux distribution does — including wget — that can leave containers vulnerable to Living off the land (LOTL) attacks. This raises the question, “Why would an image trying to be distroless still include key parts of a Linux distribution?” The answer typically involves container initialization. 

Developers often have to make their applications configurable to meet the needs of their users. Most of the time, those configurations are not known at build time so they need to be configured at run time. Often, these configurations are applied using shell initialization scripts, which in turn depend on common Linux utilities such as sed, grep, cp, etc. When this is the case, the shell and utilities are only needed for the first few seconds of the container’s lifetime. Luckily, there is a way to create true distroless images while still allowing initialization using tools available from most container orchestrators: init containers.

Init containers

In Kubernetes, an init container is a container that starts and must complete successfully before the primary container can start. By using a non-distroless container as an init container that shares a volume with the primary container, the runtime environment and application can be configured before the application starts. 

The lifetime of that init container is short (often just a couple seconds), and it typically doesn’t need to be exposed to the internet. Much like multi-stage builds allow developers to separate the build-time dependencies from the runtime dependencies, init containers allow developers to separate initialization dependencies from the execution dependencies. 

The concept of init container may be familiar if you are using relational databases, where an init container is often used to perform schema migration before a new version of an application is started.

Kubernetes example

Here are two examples of using init containers. First, using Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: kubecon-postgress-pod
  labels:
    app.kubernetes.io/name: KubeConPostgress
spec:
  containers:
  - name: postgress
    image: laurentgoderre689/postgres-distroless
    securityContext:
      runAsUser: 70
      runAsGroup: 70
    volumeMounts:
    - name: db
      mountPath: /var/lib/postgresql/data/
  initContainers:
  - name: init-postgress
    image: postgres:alpine3.18
    env:
      - name: POSTGRES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: kubecon-postgress-admin-pwd
            key: password
    command: ['docker-ensure-initdb.sh']
    volumeMounts:
    - name: db
      mountPath: /var/lib/postgresql/data/
  volumes:
  - name: db
    emptyDir: {}

- - - 

> kubectl apply -f pod.yml && kubectl get pods
pod/kubecon-postgress-pod created
NAME                    READY   STATUS     RESTARTS   AGE
kubecon-postgress-pod   0/1     Init:0/1   0          0s
> kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
kubecon-postgress-pod   1/1     Running   0          10s

Docker Compose example

The init container concept can also be emulated in Docker Compose for local development using service dependencies and conditions.

services:
 db:
   image: laurentgoderre689/postgres-distroless
   user: postgres
   volumes:
     - pgdata:/var/lib/postgresql/data/
   depends_on:
     db-init:
       condition: service_completed_successfully

 db-init:
   image: postgres:alpine3.18
   environment:
      POSTGRES_PASSWORD: example
   volumes:
     - pgdata:/var/lib/postgresql/data/
   user: postgres
   command: docker-ensure-initdb.sh

volumes:
 pgdata:

- - - 
> docker-compose up 
[+] Running 4/0
 ✔ Network compose_default      Created                                                                                                                      
 ✔ Volume "compose_pgdata"      Created                                                                                                                     
 ✔ Container compose-db-init-1  Created                                                                                                                      
 ✔ Container compose-db-1       Created                                                                                                                      
Attaching to db-1, db-init-1
db-init-1  | The files belonging to this database system will be owned by user "postgres".
db-init-1  | This user must also own the server process.
db-init-1  | 
db-init-1  | The database cluster will be initialized with locale "en_US.utf8".
db-init-1  | The default database encoding has accordingly been set to "UTF8".
db-init-1  | The default text search configuration will be set to "english".
db-init-1  | [...]
db-init-1 exited with code 0
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  starting PostgreSQL 16.1 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db-1       | 2024-02-23 14:59:33.194 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db-1       | 2024-02-23 14:59:33.196 UTC [9] LOG:  database system was shut down at 2024-02-23 14:59:32 UTC
db-1       | 2024-02-23 14:59:33.198 UTC [1] LOG:  database system is ready to accept connections

As demonstrated by the previous example, an init container can be used alongside a container to remove the need for general-purpose software and allow the creation of true distroless images. 

Conclusion

This article explained how Docker build tools allow for the separation of build-time dependencies from run-time dependencies to create “distroless” images. For example, using init containers allows developers to separate the logic needed to configure a runtime environment from the environment itself and provide a more secure container. This approach also helps teams focus their efforts on the software they use and find a better balance between security and usability.

Docker Security Advisory: Multiple Vulnerabilities in runc, BuildKit, and Moby https://www.docker.com/blog/docker-security-advisory-multiple-vulnerabilities-in-runc-buildkit-and-moby/ Wed, 31 Jan 2024 20:05:23 +0000 https://www.docker.com/?p=51378 February 1 updates:

January 31 updates:

  • Patches for runc, BuildKit, and Moby (Docker Engine) are now available.
  • Updates have been rolled out to Docker Build Cloud builders.

We at Docker prioritize the security and integrity of our software and the trust of our users. Security researchers at Snyk Labs recently identified and reported four security vulnerabilities in the container ecosystem. One of the vulnerabilities, CVE-2024-21626, concerns the runc container runtime, and the other three affect BuildKit (CVE-2024-23651, CVE-2024-23652, and CVE-2024-23653). We want to assure our community that our team, in collaboration with the reporters and open source maintainers, has been diligently working on coordinating and implementing necessary remediations.


We are committed to maintaining the highest security standards. We will publish patched versions of runc, BuildKit, and Moby on January 31 and release an update for Docker Desktop on February 1 to address these vulnerabilities.  Additionally, our latest Moby and BuildKit releases will include fixes for CVE-2024-23650 and CVE-2024-24557, discovered respectively by an independent researcher and through Docker’s internal research initiatives.

Versions impacted
  • runc: <= 1.1.11
  • BuildKit: <= 0.12.4
  • Moby (Docker Engine): <= 25.0.1 and <= 24.0.8
  • Docker Desktop: <= 4.27.0

These vulnerabilities can only be exploited if a user actively engages with malicious content by incorporating it into the build process or running a container from a suspect image (particularly relevant for the CVE-2024-21626 container escape vulnerability). Potential impacts include unauthorized access to the host filesystem, compromising the integrity of the build cache, and, in the case of CVE-2024-21626, a scenario that could lead to full container escape. 

We strongly urge all customers to prioritize security by applying the available updates as soon as they are released. Timely application of these updates is the most effective measure to safeguard your systems against these vulnerabilities and maintain a secure and reliable Docker environment.

What should I do if I’m on an affected version?

If you are using affected versions of runc, BuildKit, Moby, or Docker Desktop, make sure to update to the latest versions as soon as patched versions become available (all to be released no later than February 1 and linked in the following table):

Patched versions
  • runc: >= 1.1.12
  • BuildKit: >= 0.12.5
  • Moby (Docker Engine): >= 25.0.2 and >= 24.0.9*
  • Docker Desktop: >= 4.27.1

* Only CVE-2024-21626 and CVE-2024-24557 were fixed in Moby 24.0.9.


If you are unable to update to an unaffected version promptly after it is released, follow these best practices to mitigate risk: 

  • Only use trusted Docker images (such as Docker Official Images).
  • Don’t build Docker images from untrusted sources or untrusted Dockerfiles.
  • If you are a Docker Business customer using Docker Desktop and unable to update to v4.27.1 immediately after it’s released, make sure to enable Hardened Docker Desktop features such as:
  • For CVE-2024-23650, CVE-2024-23651, CVE-2024-23652, and CVE-2024-23653, avoid using a BuildKit frontend from an untrusted source. A frontend image is usually specified as the #syntax line on your Dockerfile, or with the --frontend flag when using the buildctl build command.
  • To mitigate CVE-2024-24557, make sure to either use BuildKit or disable caching when building images. From the CLI this can be done via the DOCKER_BUILDKIT=1 environment variable (default for Moby >= v23.0 if the buildx plugin is installed) or the --no-cache flag, as shown in the short example after this list. If you are using the HTTP API directly or through a client, the same can be done by setting nocache to true or version to 2 for the /build API endpoint.
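As a quick illustration of the CLI side of that last mitigation (the image name here is a placeholder):

# Build with the BuildKit builder enabled explicitly (it is the default in recent Docker versions)
DOCKER_BUILDKIT=1 docker build -t my-image .

# Or keep the classic builder but bypass its cache entirely
docker build --no-cache -t my-image .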

Technical details and impact

CVE-2024-21626 (High)

In runc v1.1.11 and earlier, due to certain leaked file descriptors, an attacker can gain access to the host filesystem by causing a newly spawned container process (from runc exec) to have a working directory in the host filesystem namespace, or by tricking a user into running a malicious image, allowing a container process to gain access to the host filesystem through runc run. The attacks can also be adapted to overwrite semi-arbitrary host binaries, allowing for complete container escapes. Note that when using higher-level runtimes (such as Docker or Kubernetes), this vulnerability can be exploited by running a malicious container image without additional configuration or by passing specific workdir options when starting a container. The vulnerability can also be exploited from within Dockerfiles in the case of Docker.

  • The issue has been fixed in runc v1.1.12.

CVE-2024-23651 (High)

In BuildKit <= v0.12.4, two malicious build steps running in parallel sharing the same cache mounts with subpaths could cause a race condition, leading to files from the host system being accessible to the build container. This will only occur if a user is trying to build a Dockerfile of a malicious project.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-23652 (High)

In BuildKit <= v0.12.4, a malicious BuildKit frontend or Dockerfile using RUN --mount could trick the feature that removes empty files created for the mountpoints into removing a file outside the container from the host system. This will only occur if a user is using a malicious Dockerfile.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-23653 (High)

In addition to running containers as build steps, BuildKit also provides APIs for running interactive containers based on built images. In BuildKit <= v0.12.4, it is possible to use these APIs to ask BuildKit to run a container with elevated privileges. Normally, running such containers is only allowed if special security.insecure entitlement is enabled both by buildkitd configuration and allowed by the user initializing the build request.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-23650 (Medium)

In BuildKit <= v0.12.4, a malicious BuildKit client or frontend could craft a request that could lead to BuildKit daemon crashing with a panic.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-24557 (Medium)

In Moby <= v25.0.1 and <= v24.0.8, the classic builder cache system is prone to cache poisoning if the image is built FROM scratch. Additionally, changes to some instructions (most importantly HEALTHCHECK and ONBUILD) would not cause a cache miss. An attacker with knowledge of the Dockerfile someone is using could poison their cache by making them pull a specially crafted image that would be considered a valid cache candidate for some build steps.

  • The issue will be fixed in Moby >= v25.0.2 and >= v24.0.9.

How are Docker products affected? 

The following Docker products are affected. No other products are affected by these vulnerabilities.

Docker Desktop

Docker Desktop v4.27.0 and earlier are affected. Docker Desktop v4.27.1 will be released on February 1 and includes runc, BuildKit, and dockerd binaries patches. In addition to updating to this new version, we encourage all Docker users to diligently use Docker images and Dockerfiles and ensure you only use trusted content in your builds.

As always, you should check Docker Desktop system requirements for your operating system (Windows, Linux, Mac) before updating to ensure full compatibility.

Docker Build Cloud

Any new Docker Build Cloud builder instances will be provisioned with the latest Docker Engine and BuildKit versions after fixes are released and will, therefore, be unaffected by these CVEs. Docker will also be rolling out gradual updates to any existing builder instances.

Security at Docker

At Docker, we know that part of being developer-obsessed is providing secure software to developers. We appreciate the responsible disclosure of these vulnerabilities. If you’re aware of potential security vulnerabilities in any Docker product, report them to security@docker.com. For more information on Docker’s security practices, see our website.

Generating SBOMs for Your Image with BuildKit https://www.docker.com/blog/generate-sboms-with-buildkit/ Tue, 24 Jan 2023 15:00:00 +0000 https://www.docker.com/?p=39978 Learn how to use BuildKit to generate SBOMs for your images and packages.

The latest release series of BuildKit, v0.11, introduces support for build-time attestations and SBOMs, allowing publishers to create images with records of how the image was built. This makes it easier for you to answer common questions, like which packages are in the image, where the image was built from, and whether you can reproduce the same results locally.

This new data helps you make informed decisions about the security of the images you consume — without needing to do all the manual work yourself.

In this blog post, we’ll discuss what attestations and SBOMs are, how to build images that contain SBOMs, and how to start analyzing the resulting data!

What are attestations?

An attestation is a declaration that a statement is true. With software, an attestation is a record that specifies a statement about a software artifact. For example, it could include who built it and when, what inputs it was built with, what outputs it produced, etc.

By writing these attestations and distributing them alongside the artifacts themselves, you can surface details that might otherwise be tricky to find. Without attestations, you’d have to reverse-engineer how the image was built by locating the source code and even attempting to reproduce the build yourself.

To provide this valuable information to the end-users of your images, BuildKit v0.11 lets you build these attestations as part of your normal build process. All it takes is adding a few options to your build step.

BuildKit supports attestations in the in-toto format (from the in-toto framework). Currently, the Dockerfile frontend produces two types of attestations that answer two different questions:

  • SBOM (Software Bill of Materials) – An SBOM contains a list of software components inside an image. This will include the names of various packages installed, their version numbers, and any other associated metadata. You can use this to see, at a glance, if an image contains a specific package or determine if an image is vulnerable to specific CVEs.
  • SLSA Provenance – The Provenance of the image describes details of the build process, such as what materials (like images, URLs, and files) were consumed, what build parameters were set, as well as source maps that allow mapping the resulting image back to the Dockerfile that created it. You can use this to analyze how an image was built, determine whether the sources consumed all appear legitimate, and even attempt to rebuild the image yourself.

Users can also define their own custom attestation types via a custom BuildKit frontend. In this post, we’ll focus on SBOMs and how to use them with Dockerfiles.
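
Provenance is outside the scope of this post, but for completeness it follows the same workflow. A minimal sketch, assuming the buildx --provenance flag and the .Provenance.SLSA key documented for imagetools inspect:

$ docker buildx build --provenance=true -t <myorg>/<myimage> --push .
$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ json .Provenance.SLSA }}'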

Getting the latest release

Building attestations into your images requires the latest releases of both Buildx and BuildKit – you can get the latest versions by updating Docker Desktop to the most recent version.

You can check your version number, and ensure it matches the buildx v0.10 release series:

$ docker buildx version
github.com/docker/buildx 0.10.0 ...

To use the latest release of BuildKit, create a docker-container builder using buildx:

$ docker buildx create --use --name=buildkit-container --driver=docker-container

You can check that the new builder is configured correctly, and ensure it matches the buildkit v0.11 release series:

$ docker buildx inspect | grep -i buildkit
Buildkit:  v0.11.1

If you’re using the docker/setup-buildx-action in GitHub Actions, then you’ll get this all automatically without needing to update.

With that out of the way, you can move on to building an image containing SBOMs!

Adding SBOMs to your images

You’re now ready to generate an SBOM for your image!

Let’s start with the following Dockerfile to create an nginx web server:

# syntax=docker/dockerfile:1.5

FROM nginx:latest
COPY ./static /usr/share/nginx/html

You can build and push this image, along with its SBOM, in one step:

$ docker buildx build --sbom=true -t <myorg>/<myimage> --push .

That’s all you need! In your build output, you should spot a message about generating the SBOM:

...
=> [linux/amd64] generating sbom using docker.io/docker/buildkit-syft-scanner:stable-1                           	0.2s
...

BuildKit generates SBOMs using scanner plugins. By default, it uses buildkit-syft-scanner, a scanner built on top of Anchore’s Syft open-source project, to do the heavy lifting. If you like, you can use another scanner by specifying the generator= option. 
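
For example, pointing the build at a different scanner image might look like this (the scanner image name is a placeholder, and --attest type=sbom is the long form of --sbom):

$ docker buildx build --attest type=sbom,generator=<myorg>/<my-scanner> -t <myorg>/<myimage> --push .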

Here’s how you view the generated SBOM using buildx imagetools:

$ docker buildx imagetools inspect <myorg>/<myimage> --format "{{ json .SBOM.SPDX }}"
{
  "spdxVersion": "SPDX-2.3",
  "dataLicense": "CC0-1.0",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "/run/src/core/sbom",
  "documentNamespace": "https://anchore.com/syft/dir/run/src/core/sbom-a589a536-b5fb-49e8-9120-6a12ce988b67",
  "creationInfo": {
    "licenseListVersion": "3.18",
    "creators": [
      "Organization: Anchore, Inc",
      "Tool: syft-v0.65.0",
      "Tool: buildkit-v0.11.0"
    ],
    "created": "2023-01-05T16:13:17.47415867Z"
  },
  ...

SBOMs also work with the local and tar exporters. When you export with these exporters, instead of attaching the attestations directly to the output image, the attestations are exported as separate files into the output filesystem:

$ docker buildx build --sbom=true -o ./image .
$ ls -lh ./image
-rw-------  1 user user 6.5M Jan 17 14:36 sbom.spdx.json
...

Viewing the SBOM in this case is as simple as cat-ing the result:

$ cat ./image/sbom.spdx.json | jq .predicate
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	…

Supplementing SBOMs

Generating SBOMs using a scanner is a good start! But some packages won’t be correctly detected because they’ve been installed in a slightly unconventional way.

If that’s the case, you can still get this information into your SBOMs with a bit of manual interaction.

Let’s suppose you’ve installed foo v1.2.3 into your image by downloading it using curl:

RUN curl https://example.com/releases/foo-v1.2.3-amd64.tar.gz | tar xzf - && \
    mv foo /usr/local/bin/

Software installed this way likely won’t appear in your SBOM unless the SBOM generator you’re using has special support for this binary (for example, Syft has support for detecting certain known binaries).

You can manually generate an SBOM for this piece of software by writing an SPDX snippet to a location of your choice on the image filesystem using a Dockerfile heredoc:

COPY <<"EOT" /usr/local/share/sbom/foo.spdx.json
{
	"spdxVersion": "SPDX-2.3",
	"SPDXID": "SPDXRef-DOCUMENT",
	"name": "foo-v1.2.3",
	...
}
EOT

This SBOM should then be picked up by your SBOM generator and included in the final SBOM for the whole image. This behavior is included out-of-the-box in buildkit-syft-scanner, but may not be part of every generator’s toolkit.

Even more SBOMs!

While the approach above works well for scanning a basic image, it can miss packages that only exist in earlier build stages or in your build context. BuildKit can help you scan these additional components of your build, namely intermediate stages and the build context, using the BUILDKIT_SBOM_SCAN_STAGE and BUILDKIT_SBOM_SCAN_CONTEXT arguments, respectively.

In the case of multi-stage builds, this allows you to track dependencies from previous stages, even though that software might not appear in your final image.

For example, for a demo C/C++ program, you might have the following Dockerfile:

# syntax=docker/dockerfile:1.5

FROM ubuntu:22.04 AS build
ARG BUILDKIT_SBOM_SCAN_STAGE=true
RUN apt-get update && apt-get install -y git build-essential
WORKDIR /src
RUN git clone https://example.com/myorg/myrepo.git .
RUN make build

FROM scratch
COPY --from=build /src/build/ /

If you just scanned the resulting image, it wouldn’t reveal that the build tools, like Git or GCC (included in the build-essential package), were ever used in the build process! By integrating SBOM scanning into your build using the BUILDKIT_SBOM_SCAN_STAGE build argument, you can get much richer information that would otherwise have been completely lost.
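
The build context can be scanned in much the same way. As a sketch, assuming you prefer to toggle it from the command line rather than hard-coding it in the Dockerfile:

$ docker buildx build --sbom=true --build-arg BUILDKIT_SBOM_SCAN_CONTEXT=true -t <myorg>/<myimage> --push .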

You can access these additionally generated SBOM documents in imagetools as well:

$ docker buildx imagetools inspect <myorg>/<myimage> --format "{{ range .SBOM.AdditionalSPDXs }}{{ json . }}{{ end }}"
{
	"spdxVersion": "SPDX-2.3",
	"SPDXID": "SPDXRef-DOCUMENT",
	...
}
{
	"spdxVersion": "SPDX-2.3",
	"SPDXID": "SPDXRef-DOCUMENT",
	...
}
...

For the local and tar exporters, these will appear as separate files in your output directory:

$ docker buildx build --sbom=true -o ./image .
$ ls -lh ./image
-rw------- 1 user user 4.3M Jan 17 14:40 sbom-build.spdx.json
-rw------- 1 user user  877 Jan 17 14:40 sbom.spdx.json
...

Analyzing images

Now that you’re publishing images containing SBOMs, it’s important to find a way to analyze them to take advantage of this additional data.

As mentioned above, you can extract the attached SBOM attestation using the imagetools subcommand:

$ docker buildx imagetools inspect <myorg>/<myimage> --format "{{json .SBOM.SPDX}}"
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	...

If your target image is built for multiple architectures using the --platform flag, then you’ll need a slightly different syntax to extract the SBOM attestation:

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ json (index .SBOM "linux/amd64").SPDX }}'
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	...

Now suppose you want to list all of the packages, and their versions, inside an image. You can modify the value passed to the --format flag to be a go template that lists the packages:

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ range .SBOM.SPDX.packages }}{{ println .name .versionInfo }}{{ end }}' | sort
adduser 3.118
apt 2.2.4
base-files 11.1+deb11u6
base-passwd 3.5.51
bash 5.1-2+deb11u1
bsdutils 1:2.36.1-8+deb11u1
ca-certificates 20210119
coreutils 8.32-4+b1
curl 7.74.0-1.3+deb11u3
...

Alternatively, you might want to get the version information for a piece of software that you know is installed:

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ range .SBOM.SPDX.packages }}{{ if eq .name "nginx" }}{{ println .versionInfo }}{{ end }}{{ end }}'
1.23.3-1~bullseye

You can even take the whole SBOM and use it to scan for CVEs using a tool that can use SBOMs to search for CVEs (like Anchore’s Grype):

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ json .SBOM.SPDX }}' | grype
NAME          	INSTALLED            	FIXED-IN 	TYPE  VULNERABILITY 	SEVERITY   
apt           	2.2.4                             	deb   CVE-2011-3374 	Negligible  
bash          	5.1-2+deb11u1        	(won't fix) deb   CVE-2022-3715 	 
...

These operations should complete super quickly! Because the SBOM was already generated at build, you’re just querying already-existing data from Docker Hub instead of needing to generate it from scratch every time.

Going further

In this post, we’ve only covered the absolute basics of getting started with BuildKit and SBOMs — you can find out more about the topics we’ve covered on docs.docker.com.

And you can find out more about the other features in the BuildKit v0.11 release notes.

Announcing Docker Hub OCI Artifacts Support https://www.docker.com/blog/announcing-docker-hub-oci-artifacts-support/ Mon, 31 Oct 2022 16:00:00 +0000 https://www.docker.com/?p=38556 We’re excited to announce that Docker Hub can now help you distribute any type of application artifact! You can now keep everything in one place without having to leverage multiple registries.

Before today, you could only use Docker Hub to store and distribute container images — or artifacts usable by container runtimes. This became a limitation of our platform, since container image distribution is just the tip of the application delivery iceberg. Nowadays, modern application delivery requires numerous types of artifacts.

Developers often share these with clients that need them since they add immense value to each project. And while the OCI working groups are busy releasing the latest OCI Artifact Specification, we still have to package application artifacts as OCI images in the meantime. 

Docker Hub acts as an image registry and is perfectly suited for distributing application artifacts. That’s why we’ve added support for any software artifact — packaged as an OCI image — to Docker Hub.

What’s the Open Container Initiative (OCI)?

Back in 2015, we helped establish the Open Container Initiative as an open governance structure to standardize container image formats, container runtimes, and image distribution.

The OCI maintains a few core specifications. These govern the following:

  • How to package filesystem bundles
  • How to launch containerized, cross-platform apps
  • How to make packaged content accessible to remote clients

The Runtime Specification determines how OCI images and runtimes interact. Next, the Image Specification outlines how to create OCI images. Finally, the Distribution Specification defines how to make content distribution interoperable.

The OCI’s overall aim is to boost transparency, runtime predictability, software compatibility, and distribution. We’ve since donated our own container format and the OCI-compliant runC runtime to the OCI, and contributed the OCI-compliant distribution project to the CNCF.

Why are we adding OCI support? 

Container images are integral to supporting your containerized application builds. We know that images accumulate between projects, making centralized cloud storage essential to efficiently manage resources. Developers shouldn’t have to rely on local storage or wonder if these resources are readily accessible. However, we also know that developers want to store a variety of artifacts within Docker Hub. 

Storing your artifacts in Docker Hub unlocks “anywhere access” while also enabling improved collaboration through Docker Hub’s standard sharing capabilities. This aligns us more closely with the OCI’s content distribution mission by giving users greater control over key pieces of application delivery.

How do I manage different OCI artifacts?

We recommend using dedicated tools to help manage non-container OCI artifacts, like the Helm CLI for Helm charts or the OCI Registry-as-Storage (ORAS) CLI for arbitrary content types.

Let’s walk through a few use cases to showcase OCI support in Docker Hub.

Working with Helm charts

Helm chart support was your most-requested feature, and we’ve officially added it to Docker Hub! So, how do you take advantage? We’ll create a simple Helm chart and push it to Docker Hub. This process will follow Helm’s official guide for storing Helm charts as OCI images in registries.

First, we’ll create a demo Helm chart:

$ helm create demo

This’ll generate a familiar Helm chart boilerplate of files that you can edit:

demo
├── Chart.yaml
├── charts
├── templates
│   ├── NOTES.txt
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── hpa.yaml
│   ├── ingress.yaml
│   ├── service.yaml
│   ├── serviceaccount.yaml
│   └── tests
│       └── test-connection.yaml
└── values.yaml

3 directories, 10 files

Once we’re done editing, we’ll need to package the Helm chart as an OCI image:

$ helm package demo

Successfully packaged chart and saved it to: /Users/martine/tmp/demo-0.1.0.tgz

Don’t forget to log into Docker Hub before pushing your Helm chart. We recommend creating a Personal Access Token (PAT) for this. You can export your PAT via an environment variable and log in as follows:

$ echo $REG_PAT | helm registry login registry-1.docker.io -u martine --password-stdin

Pushing your Helm chart

You’re now ready to push your first Helm chart to Docker Hub! But first, make sure you have write access to your Helm chart’s destination namespace. In this example, let’s push to the docker namespace:

$ helm push demo-0.1.0.tgz oci://registry-1.docker.io/docker

Pushed: registry-1.docker.io/docker/demo:0.1.0
Digest: sha256:1e960ad1693c234b66ec1f9ddce80986cbf7159d2bb1e9a6d2c2cd6e89925e54

Viewing your Helm chart and using filters

Now, if you log in to Docker Hub and navigate to the demo repository detail, you’ll find your Helm chart in the list of repository tags:

Helm Type Docker Hub

You can navigate to the Helm chart page by clicking on the tag. The page displays useful Helm CLI commands:

Helm CLI Commands

Repository content management is now easier. We’ve improved content discoverability by adding a drop-down button to quickly filter the repository list by content type. Simply click the Content drop-down and select Helm from the list:

Helm Type Selection

Working with volumes

Developers use volumes throughout the Docker ecosystem to share arbitrary application data like database files. You can already back up your volumes using the Volumes Backup & Share extension that we recently launched. You can now also filter repositories to find those containing volumes using the same drop-down menu.

But until Volumes Backup & Share pushes volumes as OCI artifacts instead of images (coming soon!), you can use the ORAS CLI to push volumes.

Note: We recommend ORAS CLI versions 0.15 or later since these bring full OCI registry client functionality.

Let’s walk through a simple use case that mirrors the examples documented by the ORAS CLI. First, we’ll create a simple file we want to package as a volume:

$ echo "bar" > foo.txt

For Docker Hub to recognize this volume, we must attach a config file to the OCI image upon creation and mark it with a specific media type. The file can contain arbitrary content, so let’s create one:

$ echo "{\"name\":\"foo\",\"value\":\"bar\"}" > config.json

With this step completed, you’re now ready to push your volume.

Pushing your volume

Here’s where the magic happens. The media type Docker Hub needs to successfully recognize the OCI image as a volume is application/vnd.docker.volume.v1+tar.gz. You can attach the media type to the config file and push it to Docker Hub with the following command (plus its resulting output):

$ oras push registry-1.docker.io/docker/demo:0.0.1 --config config.json:application/vnd.docker.volume.v1+tar.gz foo.txt:text/plain

Uploading b5bb9d8014a0 foo.txt
Uploaded  b5bb9d8014a0 foo.txt
Pushed registry-1.docker.io/docker/demo:0.0.1
Digest: sha256:f36eddbab8459d0ad1436b7ca8af6bfc512ec74f45d8136b53c16db87562016e

We now have two types of content in the demo repository as shown in the following breakdown:

Volume Content Type List

If you navigate to the content page, you’ll see some basic information that we’ll expand upon in future iterations. This will boost visibility into a volume’s contents.

Volume Details

Handling generic content types

If you don’t use the application/vnd.docker.volume.v1+tar.gz media type when pushing the volume with the ORAS CLI, Docker Hub will mark the artifact as generic to distinguish it from recognized content.

Let’s push the same volume but use application/vnd.random.volume.v1+tar.gz media type instead of the one known to Docker Hub:

$ oras push registry-1.docker.io/docker/demo:0.1.1 --config config.json:application/vnd.random.volume.v1+tar.gz foo.txt:text/plain

Exists	7d865e959b24 foo.txt
Pushed registry-1.docker.io/docker/demo:0.1.1
Digest: sha256:d2fb2b176ee4e326f1f34ecdaede8db742f2c444cb2c9ceff0f5c8b743281c95

You can see the new content is assigned a generic Other type. We can still view the tagged content’s media type by hovering over the type label. In this case, that’s application/vnd.random.volume.v1+tar.gz:

Other Content Type List

If you’d like to filter the repositories that contain both Helm charts and volumes, use the same drop-down menu in the top-right corner:

Volume Type Selection

Working with container images

Finally, you can continue pushing your regular container images to the exact same repository as your other artifacts. Say we re-tag the Redis Docker Official Image and push it to Docker Hub:

$ docker tag redis:3.2-alpine docker/demo:v1.2.2

$ docker push docker/demo:v1.2.2

The push refers to repository [docker.io/docker/demo]
a1892d5d1a6d: Mounted from library/redis
e41876edb6d0: Mounted from library/redis
7119119b7542: Mounted from library/redis
169a281fff0f: Mounted from library/redis
04c8ef03e935: Mounted from library/redis
df64d3292fd6: Mounted from library/redis
v1.2.2: digest: sha256:359cfebb00bef01cda3bc1ca453e6455c770a246a06ad8df499a28118c144eda size: 1570

Viewing your container images

If you now visit the demo repository page on Docker Hub, you’ll see every artifact listed under Tags and scans:

All Artifacts Content List

We’ll also introduce more features soon to help you better organize your application content, so stay tuned for more announcements!

Follow along for more updates

All developers can now access and choose from more robust sets of artifacts while building and distributing applications with Docker Hub. Not only does this remove existing roadblocks, but it’ll hopefully encourage you to create and distribute even more exciting applications.

But, our mission doesn’t end here! We’re continually working to bolster our OCI support. While the OCI Artifact Specification is considered a release candidate, full Docker Hub support for OCI Reference Types and the accompanying Referrers API is on the horizon. Stay tuned for upcoming enhancements, improved repo organization, and more.

Note: The OCI artifact guidance has since been removed from the OCI image-spec; refer to the OCI’s update for more information.

Security Advisory: High Severity OpenSSL Vulnerabilities https://www.docker.com/blog/security-advisory-critical-openssl-vulnerability/ Thu, 27 Oct 2022 22:19:42 +0000 https://www.docker.com/?p=38506 Update: 01 November 2022 12:57 PM PDT

The OpenSSL Project has officially disclosed two high-severity vulnerabilities: CVE-2022-3602 and CVE-2022-3786. These CVEs impact all OpenSSL versions after 3.0. The sole exception is version 3.0.7, which contains fixes for those latest vulnerabilities. Previously, these CVEs were thought to be “critical.”


Our title and original post below (written October 27th, 2022) have been updated:

What are they?

CVE-2022-3602 is an arbitrary 4-byte stack buffer overflow that could trigger crashes or allow remote code execution (RCE). Meanwhile, attackers can exploit CVE-2022-3786 via malicious email addresses to trigger a denial-of-service (DoS) state via buffer overflow.

The pre-announcement expected the vulnerability to be deemed “critical” per the OpenSSL Project’s security guidelines. Since then, the OpenSSL Project has downgraded those vulnerabilities to “high” severity in its updated advisory. Regardless, the Project recommends updating to OpenSSL 3.0.7 as quickly as possible.

Docker estimates about 1,000 image repositories could be impacted across various Docker Official Images and Docker Verified Publisher images. This includes images based on Debian 12, Ubuntu 22.04, and Red Hat Enterprise Linux 9+, which install 3.x versions of OpenSSL. Images using Node.js 18 and 19 are also affected.

We’re updating our users to help them quickly remediate any impacted images.

Am I vulnerable?

With OpenSSL’s vulnerability details now live, it’s time to see if your public and private repositories are impacted. Docker created a placeholder that references both OpenSSL CVEs, which we’ll link to the official CVEs. 

Like with Heartbleed, OpenSSL’s maintainers carefully considered what information they publicized until fixes arrived. You can now better protect yourself. We’ve created a way to quickly and transparently analyze any image’s security flaws.

Visit Docker’s Image Vulnerability Database, navigate to the “Vulnerability search” tab, and search for the placeholder security advisory dubbed “DSA-2022-0001”. You can also use this tool to see other vulnerabilities as they’re discovered, receive updates to refresh outdated base images, and more.

Luckily, you can take targeted steps to determine how vulnerable you are. We suggest using the docker scan CLI command and Snyk’s Docker Hub Vulnerability Scanning tool. This’ll help detect the presence of vulnerable library versions and flag your image as vulnerable.
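
As a quick illustration (the image name is a placeholder), a scan from the CLI looks like this:

$ docker scan <myorg>/<myimage>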

Alternatively, Docker is providing an experimental local tool to detect OpenSSL 3.x in Docker images. You can install this tool from its GitHub repository. Then, you can search your image for an OpenSSL 3.x version with the following command:

$ docker-index cve --image gradle@sha256:1a6b42a0a86c9b62ee584f209a17d55a2c0c1eea14664829b2630f28d57f430d DSA-2022-0001 -r

If the image contains a vulnerable OpenSSL version, your terminal output will resemble the following:

Docker Index CVE Output

And if Docker doesn’t detect a vulnerable version of OpenSSL in your image, you’ll see the following:

DSA-2022-0001 not detected

Update and protect yourself today

While we’re happy to see these latest CVEs have been downgraded, it’s important to take every major vulnerability very seriously. Remember to update to OpenSSL version 3.0.7 to squash these bugs and harden your applications. 
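
One quick sanity check, assuming the openssl binary is present on your image's PATH, is to print the library version directly:

$ docker run --rm <myorg>/<myimage> openssl version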

We also encourage you to sign up for our Early Access Program to access the tools discussed in this blog, plus have the opportunity to provide invaluable product feedback to help us improve!

How to Implement Decentralized Storage Using Docker Extensions https://www.docker.com/blog/how-to-implement-decentralized-storage-using-docker-extensions/ Thu, 27 Oct 2022 14:00:00 +0000 https://www.docker.com/?p=38456 This is a guest post written by Marton Elek, Principal Software Engineer at Storj.

In part one of this two-part series, we discussed the intersection of Web3 and Docker at a conceptual level. In this post, it’s time to get our hands dirty and review practical examples involving decentralized storage.

We’d like to see how we can integrate Web3 projects with Docker. At the beginning we have to choose from two options:

  1. We can use Docker to containerize any Web3 application. We can also start an IPFS daemon or an Ethereum node inside a container. Docker resembles an infrastructure layer since we can run almost anything within containers.
  2. What’s most interesting is integrating Docker itself with Web3 projects. That includes using Web3 to help us when we start containers or run something inside containers. In this post, we’ll focus on this portion.

The two most obvious integration points for a container engine are execution and storage. We choose storage here since more mature decentralized storage options are currently available. There are a few interesting approaches for decentralized versions of cloud container runtimes (like ankr), but they’re more likely replacements for container orchestrators like Kubernetes — not the container engine itself.

Let’s use Docker with decentralized storage. Our example uses Storj, but all of our examples apply to almost any decentralized cloud storage solution.

Storj Components

Storj is a decentralized cloud storage network where node providers are compensated to host the data, while the metadata servers (which manage the locations of the encrypted pieces) are federated: many interoperable central servers can work together with storage providers.

It’s important to mention that decentralized storage almost always requires you to use a custom protocol. A traditional HTTP upload is a connection between one client and one server. Decentralization requires uploading data to multiple servers. 

Our goal is simple: we’d like to use docker push and docker pull commands with decentralized storage instead of a central Docker registry. In our latest DockerCon presentation, we identified multiple approaches:

  • We can change Docker and containerd to natively support different storage options
  • We can provide tools that magically download images from decentralized storage and persist them in the container engine’s storage location (in the right format, of course)
  • We can run a service which translates familiar Docker registry HTTP requests to a protocol specific to the decentralized cloud
    • Users can manage this themselves.
    • This can also be a managed service.

Leveraging native support

I believe the ideal solution would be to extend Docker (and/or the underlying containerd runtime) to support different storage options. But this is definitely a bigger challenge. Technically, it’s possible to modify every service, but massive adoption and a big user base mean that large changes require careful planning.

Currently, it’s not readily possible to extend the Docker daemon to use special push or pull targets. Check out our presentation on extending Docker if you’re interested in technical deep dives and integration challenges. The best solution might be a new container plugin type, which is being considered.

One benefit of this approach would be good usability. Users can leverage the common push and pull commands, but depending on the host configuration, the container layers can be sent to decentralized storage.

Using tool-based push and pull

Another option is to upload or download images with an external tool — one that can use remote decentralized storage directly and save the images to the container engine’s storage directory.

One example of this approach (but with centralized storage) is the AWS ECR container resolver project. It provides a CLI tool which can pull and push images using a custom source. It also saves them as container images of the containerd daemon.

Unfortunately, this approach also has some strong limitations:

  • It couldn’t work with container orchestrators like Kubernetes, since they aren’t prepared to run custom CLI commands to pull or push images.
  • It’s containerd specific. The Docker daemon – with different storage – couldn’t use it directly.
  • The usability is reduced since users need different CLI tools.

Using a user-managed gateway

If we can’t push or pull directly to decentralized storage, we can create a service which resembles a Docker registry and meshes with any client. But under the hood, it uploads the data using the decentralized storage’s native protocol.

This thankfully works well, and the standard Docker registry implementation is already compatible with different storage options. 

At Storj, we already have an implementation that we use internally for test images. However, the nerdctl ipfs subcommand is another good example for this approach (it starts a local registry to access containers from IPFS).
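
As a rough sketch of the self-managed route, the standard registry image can be pointed at an S3-compatible gateway through its regular storage configuration (the endpoint, bucket, and credentials below are placeholders, not a recommended production setup):

# config.yml for the registry container
version: 0.1
storage:
  s3:
    regionendpoint: https://gateway.example.com
    region: us-east-1
    bucket: my-registry-bucket
    accesskey: <access-key>
    secretkey: <secret-key>
http:
  addr: :5000

$ docker run -d -p 5000:5000 -v $(pwd)/config.yml:/etc/docker/registry/config.yml registry:2
$ docker tag myorg/myimage localhost:5000/myorg/myimage
$ docker push localhost:5000/myorg/myimage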

We have problems here as well:

  • Users should run the gateway on each host. This can be painful alongside Kubernetes or other orchestrators.
  • Implementation can be more complex and challenging compared to a native upload or download.

Using a hosted gateway

To make it slightly easier, one can provide a hosted version of the gateway. For example, Storj is fully S3-compatible via a hosted (or self-hosted) S3-compatible HTTP gateway. With this approach, users have three options:

  • Use the native protocol of the decentralized storage with full end-to-end encryption and every feature
  • Use the convenient gateway services and trust the operator of the hosted gateways.
  • Run the gateway themselves

While each option is acceptable, a perfect solution still doesn’t exist.

Using Docker Extensions

One of the biggest concerns with using local gateways was usability. Our local registry can help push images to decentralized storage, but it requires additional technical work (configuring and running containers, etc.)

This is where Docker Extensions can help us. Extensions are a new feature of Docker Desktop. You can install them via the Docker Dashboard, and they can provide additional functionality — including new screens, menu items, and options within Docker Desktop. These are discoverable within the Extensions Marketplace:

Extensions Marketplace

And this is exactly what we need! A good UI can make Web3 integration more accessible for all users.

Docker Extensions are easily discoverable within the Marketplace, and you can also add them manually (usually during development).

At Storj, we started experimenting with better user experiences by developing an extension for Docker Desktop. It’s still under development and not currently in the Marketplace, but feedback so far has convinced us that it can massively improve usability, which was our biggest concern with almost every available integration option.

Extensions themselves are Docker containers, which make the development experience very smooth and easy. Extensions can be as simple as a metadata file in a container and static HTML/JS files. There are special JavaScript APIs that manipulate the Docker daemon state without a backend.
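
For reference, a rough sketch of a minimal metadata.json for a UI-only extension might look like the following (field names follow the Docker Extensions SDK documentation; the values are placeholders):

{
  "icon": "icon.svg",
  "ui": {
    "dashboard-tab": {
      "title": "My Extension",
      "root": "/ui",
      "src": "index.html"
    }
  }
}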

You can also use a specialized backend. The JavaScript part of the extension can communicate with any containerized backend via a mounted socket.

The new docker extension command can help you quickly manage extensions (as an example: there’s a special docker extension dev debug subcommand that shows the Web Developer Toolbar for Docker Desktop itself.)
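
For example (the extension image name below is a placeholder):

# Install an extension image that isn't published in the Marketplace
$ docker extension install myorg/storj-extension:latest

# Open the Web Developer Toolbar for Docker Desktop while iterating on the UI
$ docker extension dev debug myorg/storj-extension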

Storj Docker Registry Extension

Thanks to the provided developer tools, the challenge is not creating the Docker Desktop extension, but balancing the UI and UX.

Summary

As we discussed in our previous post, Web3 should be defined by user requirements, not by technologies (like blockchain or NFT). Web3 projects should address user concerns around privacy, data control, security, and so on. They should also be approachable and easy to use.

Usability is a core principle of containers, and one reason why Docker became so popular. We need more integration and extension points to make it easier for Web3 project users to provide what they need. Docker Extensions also provide a very powerful way to pair good integration with excellent usability.

We welcome you to try our Storj Extension for Docker (still under development). Please leave any comments and feedback via GitHub.
