Docker images – Docker
https://www.docker.com

Automating Docker Image Builds with Pulumi and Docker Build Cloud
https://www.docker.com/blog/pulumi-and-docker-build-cloud/
Tue, 14 May 2024 14:50:18 +0000

This guest post was contributed by Diana Esteves, Solutions Architect, Pulumi.

Pulumi is an Infrastructure as Code (IaC) platform that simplifies resource management across any cloud or SaaS provider, including Docker. Pulumi providers are integrations with useful tools and vendors. Pulumi’s new Docker Build provider is about making your builds even easier, faster, and more reliable. 

In this post, we will dive into how Pulumi’s new Docker Build provider works with Docker Build Cloud to streamline building, deploying, and managing containerized applications. First, we’ll set up a project using Docker Build Cloud and Pulumi. Then, we’ll explore cool use cases that showcase how you can leverage this provider to simplify your build and deployment pipelines.  


Pulumi Docker Build provider features

Top features of the Pulumi Docker Build provider include the following:

  • Docker Build Cloud support: Offload your builds to the cloud and free up your local resources. Faster builds mean fewer headaches.
  • Multi-platform support: Build Docker images that work on different hardware architectures without breaking a sweat.
  • Advanced caching: Say goodbye to redundant builds. In addition to the shared caching available when you use Docker Build Cloud, this provider supports multiple cache backends, like Amazon S3, GitHub Actions, and even local disk, to keep your builds efficient.
  • Flexible export options: Customize where your Docker images go after they’re built — export to registries, filesystems, or wherever your workflow needs.

Getting started with Docker Build Cloud and Pulumi  

Docker Build Cloud is Docker’s newest offering that provides a pair of AMD and Arm builders in the cloud and shared cache for your team, resulting in up to 39x faster image builds. Docker Personal, Pro, Team, and Business plans include a set number of Build Cloud minutes, or you can purchase a Build Cloud Team plan to add minutes. Learn more about Docker Build Cloud plans

The example builds an NGINX Dockerfile using a Docker Build Cloud builder. We will create a Docker Build Cloud builder, create a Pulumi program in TypeScript, and build our image.

Prerequisites

Step 1: Set up your Docker Build Cloud builder

Building images locally means being subject to local compute and storage availability. Pulumi allows users to build images with Docker Build Cloud.

The Pulumi Docker Build provider fully supports Docker Build Cloud, which unlocks new capabilities, as individual team members or a CI/CD pipeline can fully take advantage of improved build speeds, shared build cache, and native multi-platform builds.

If you still need to create a builder, follow the steps below; otherwise, skip to step 1C.

A. Log in to your Docker Build Cloud account.

B. Create a new cloud builder named my-cool-builder. 

Figure 1: Create the new cloud builder and call it my-cool-builder.

C. On your local machine, sign in to your Docker account.

$ docker login

D. Add your existing cloud builder endpoint.

$ docker buildx create --driver cloud ORG/BUILDER_NAME
# Replace ORG with the Docker Hub namespace of your Docker organization. 
# This creates a builder named cloud-ORG-BUILDER_NAME.

# Example:
$ docker buildx create --driver cloud pulumi/my-cool-builder
# cloud-pulumi-my-cool-builder

# check your new builder is configured
$ docker buildx ls

E. Optionally, see that your new builder is available in Docker Desktop.

Figure 2: The Builders view in the Docker Desktop settings lists all available local and Docker Build Cloud builders available to the logged-in account.

For additional guidance on setting up Docker Build Cloud, refer to the Docker docs.
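The naming rule from step D (an ORG/BUILDER_NAME endpoint becomes a local builder called cloud-ORG-BUILDER_NAME) is the value you will later pass to Pulumi as the builder setting. As a minimal sketch, the rule can be written out in a few lines of Python. The helper name cloud_builder_name is hypothetical and is not part of Docker or Pulumi; it only mirrors the rule stated above.

```python
def cloud_builder_name(endpoint: str) -> str:
    """Return the local buildx builder name for an ORG/BUILDER_NAME endpoint.

    Hypothetical helper: it only applies the naming rule described in step D.
    """
    org, sep, builder = endpoint.partition("/")
    # Reject anything that is not exactly two non-empty parts split by one slash.
    if not sep or not org or not builder or "/" in builder:
        raise ValueError(f"expected ORG/BUILDER_NAME, got {endpoint!r}")
    return f"cloud-{org}-{builder}"


print(cloud_builder_name("pulumi/my-cool-builder"))  # cloud-pulumi-my-cool-builder
```

The printed value matches the example output of docker buildx create shown in step D, and it is the name to look for in the docker buildx ls listing.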

Step 2: Set up your Pulumi project

To create your first Pulumi project, start with a Pulumi template. Pulumi has curated hundreds of templates that are directly integrated with the Pulumi CLI via pulumi new. In particular, the Pulumi team has created a Pulumi template for Docker Build Cloud to get you started.

The Pulumi programming model centers around defining infrastructure using popular programming languages. This approach allows you to leverage existing programming tools and define cloud resources using familiar syntaxes such as loops and conditionals.

To copy the Pulumi template locally:

$ pulumi new https://github.com/pulumi/examples/tree/master/dockerbuildcloud-ts --dir hello-dbc
# project name: hello-dbc 
# project description: (default)
# stack name: dev
# Note: Update the builder value to match yours
# builder: cloud-pulumi-my-cool-builder 
$ cd hello-dbc

# update all npm packages (recommended)
$ npm update --save

Optionally, explore your Pulumi program. The hello-dbc folder has everything you need to build a Dockerfile into an image with Pulumi. Your Pulumi program starts with an entry point, typically a function written in your chosen programming language. This function defines the infrastructure resources and configurations for your project. For TypeScript, that file is index.ts, and the contents are shown below:

import * as dockerBuild from "@pulumi/docker-build";
import * as pulumi from "@pulumi/pulumi";

const config = new pulumi.Config();
const builder = config.require("builder");

const image = new dockerBuild.Image("image", {

   // Configures the name of your existing buildx builder to use.
   // See the Pulumi.<stack>.yaml project file for the builder configuration.
   builder: {
       name: builder, // Example, "cloud-pulumi-my-cool-builder",
   },
   context: {
       location: "app",
   },
   // Enable exec to run a custom docker-buildx binary with support
   // for Docker Build Cloud (DBC).
   exec: true,
   push: false,
});

Step 3: Build your Docker image

Run the pulumi up command to see the image being built with the newly configured builder:

$ pulumi up --yes

You can follow the browser link to the Pulumi Cloud dashboard and navigate to the Image resource to confirm it’s properly configured by noting the builder parameter.

Figure 3: Navigate to the Image resource to check the configuration.

Optionally, also check your Docker Build Cloud dashboard for build minutes usage:

Figure 4: The build.docker.com view shows the user has selected the Cloud builders from the left menu and the product dashboard is shown on the right side.

Congratulations! You have built an NGINX Dockerfile with Docker Build Cloud and Pulumi. This was achieved by creating a new Docker Build Cloud builder and passing that to a Pulumi template. The Pulumi CLI is then used to deploy the changes.

Advanced use cases with buildx and BuildKit

To showcase popular buildx and BuildKit features, test one or more of the following Pulumi code samples. These include multi-platform, advanced caching,  and exports. Note that each feature is available as an input (or parameter) in the Pulumi Docker Build Image resource. 

Multi-platform image builds for Docker Build Cloud

Docker images can support multiple platforms, meaning a single image may contain variants for architectures and operating systems. 

The following code snippet is analogous to invoking a build from the Docker CLI with the --platform flag to specify the target platform for the build output.

import * as dockerBuild from "@pulumi/docker-build";

const image = new dockerBuild.Image("image", {
   // Build a multi-platform image manifest for ARM and AMD.
   platforms: [
       dockerBuild.Platform.Linux_amd64,
       dockerBuild.Platform.Linux_arm64,
   ],
   push: false,

});

Deploy the changes made to the Pulumi program:

$ pulumi up --yes

Caching from and to AWS ECR

Maintaining cached layers while building Docker images saves precious time by enabling faster builds. However, utilizing cached layers has been historically challenging in CI/CD pipelines due to recycled environments between builds. The cacheFrom and cacheTo parameters allow programmatic builds to optimize caching behavior. 

Update your Docker image resource to take advantage of caching:

import * as dockerBuild from "@pulumi/docker-build";
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws"; // Required for ECR

// Create an ECR repository for pushing.
const ecrRepository = new aws.ecr.Repository("ecr-repository", {});

// Grab auth credentials for ECR.
const authToken = aws.ecr.getAuthorizationTokenOutput({
   registryId: ecrRepository.registryId,
});

const image = new dockerBuild.Image("image", {
   push: true,
   // Use the pushed image as a cache source.
   cacheFrom: [{
       registry: {
           ref: pulumi.interpolate`${ecrRepository.repositoryUrl}:cache`,
       },
   }],
   cacheTo: [{
       registry: {
           imageManifest: true,
           ociMediaTypes: true,
           ref: pulumi.interpolate`${ecrRepository.repositoryUrl}:cache`,
       },
   }],
   // Provide our ECR credentials.
   registries: [{
       address: ecrRepository.repositoryUrl,
       password: authToken.password,
       username: authToken.userName,
   }],
});

Notice the declaration of additional resources for AWS ECR. 

Export builds as a tar file

Exporting allows us to share or back up the resulting image from a build invocation. 

To export the build as a local .tar file, modify your resource to include the exports Input:

const image = new dockerBuild.Image("image", {
   push: false,
   exports: [{
       docker: {
           tar: true,
       },
   }],
});

Deploy the changes made to the Pulumi program:

$ pulumi up --yes

Review the Pulumi Docker Build provider guide to explore other Docker Build features, such as build arguments, build contexts, remote contexts, and more.

Next steps

Infrastructure as Code (IaC) is key to managing modern cloud-native development, and Docker lets developers create and control images with Dockerfiles and Docker Compose files. But when the situation gets more complex, like deploying across different cloud platforms, Pulumi can offer additional flexibility and advanced infrastructure features. The Docker Build provider supports Docker Build Cloud, streamlining building, deploying, and managing containerized applications, which helps development teams work together more effectively and maintain agility.

Pulumi’s latest Docker Build provider, powered by BuildKit, improves flexibility and efficiency in Docker builds. By applying IaC principles, developers manage infrastructure with code, even in intricate scenarios. This means you can focus on building and deploying your containerized workloads without the hassle of complex infrastructure challenges.

Visit Pulumi’s launch announcement and the provider documentation to get started with the Docker Build provider. 

Register for the June 25 Pulumi and Docker virtual workshop: Automating Docker Image Builds using Pulumi and Docker.

Learn more

How IKEA Retail Standardizes Docker Images for Efficient Machine Learning Model Deployment
https://www.docker.com/blog/how-ikea-retail-standardizes-docker-images-for-efficient-machine-learning-model-deployment/
Wed, 20 Sep 2023 15:13:19 +0000

What do Docker and IKEA Retail have in common? Both companies have changed how products are built, stored, and shipped. In IKEA Retail’s case, they created the market of flat-packed furniture items, which made everything from shipping, warehousing, and delivering their furniture to the end location much easier and more cost effective. This parallels what Docker has done for developers. Docker has changed the way that software is built, shipped, and stored, with Docker images taking up much less “shelf” space.

In this post, contributing authors Karan Honavar and Fernando Dorado Rueda from IKEA Retail walk through their MLOps solution, built with Docker.


Machine learning (ML) deployment, the act of shifting an ML model from the developmental stage to a live production environment, is paramount to translating complex algorithms into real-world solutions. Yet, this intricate process isn’t without its challenges, including:

  1. Complexity and opacity: With ML models often veiled in complexity, deciphering their logic can be taxing. This obscurity impedes trust and complicates the explanation of decisions to stakeholders.
  2. Adaptation to changing data patterns: The shifting landscape of real-world data can deviate from training sets, causing “concept drift.” Addressing this requires vigilant retraining, an arduous task that wastes time and resources.
  3. Real-time data processing: Handling the deluge of data necessary for accurate predictions can burden systems and impede scalability.
  4. Varied deployment methods: Whether deployed locally, in the cloud, or via web services, each method brings unique challenges, adding layers of complexity to an already intricate procedure.
  5. Security and compliance: Ensuring that ML models align with rigorous regulations, particularly around private information, necessitates a focus on lawful implementation.
  6. Ongoing maintenance and monitoring: The journey doesn’t end with deployment. Constant monitoring is vital to sustain the model’s health and address emerging concerns.

These factors represent substantial obstacles, but they are not insurmountable. We can streamline the journey from the laboratory to the real world by standardizing Docker images for efficient ML model deployment. 

This article will delve into the creation, measurement, deployment, and interaction with Dockerized ML models. We will demystify the complexities and demonstrate how Docker can catalyze cutting-edge concepts into tangible benefits.

Standardizing the deployment process via Docker

In the dynamic realm of today’s data-driven enterprises, such as our case at IKEA Retail, the multitude of tools and deployment strategies serves both as a boon and a burden. Innovation thrives, but so too does complexity, giving rise to inconsistency and delays. The antidote? Standardization. It’s more than just a buzzword; it’s a method to pave the way to efficiency, compliance, and seamless integration.

Enter Docker, the unsung hero in this narrative. In the evolving field of ML deployment, Docker offers agility and uniformity. It has reshaped the landscape by offering a consistent environment from development to production. The beauty of Docker lies in its containerization technology, enabling developers to wrap up an application with all the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

At IKEA Retail, diverse teams — including hybrid data scientist teams and R&D units — conceptualize and develop models, each selecting drivers and packaging libraries according to their preferences and requirements. Although virtual environments provide a certain level of support, they can also present compatibility challenges when transitioning to a production environment. 

This is where Docker becomes an essential tool in our daily operations, offering simplification and a marked acceleration in the development and deployment process. Here are key advantages:

  • Portability: With Docker, the friction between different computing environments melts away. A container runs uniformly, regardless of where it’s deployed, bringing robustness to the entire pipeline.
  • Efficiency: Docker’s lightweight nature ensures that resources are optimally utilized, thereby reducing overheads and boosting performance.
  • Scalability: With Docker, scaling your application or ML models horizontally becomes a breeze. It aligns perfectly with the orchestrated symphony that large-scale deployment demands.

Then, there’s Seldon-Core, a solution chosen by IKEA Retail’s forward-thinking MLOps (machine learning operations) team. Why? Because it transforms ML models into production-ready microservices, regardless of the model’s origin (TensorFlow, PyTorch, H2O, etc.) or language (Python, Java, etc.). But that’s not all. Seldon-Core scales precisely, enabling everything from advanced metrics and logging to explainers and A/B testing.

This combination of Docker and Seldon-Core forms the heart of our exploration today. Together, they sketch the blueprint for a revolution in ML deployment. This synergy is no mere technical alliance; it’s a transformative collaboration that redefines deploying, monitoring, and interacting with ML models.

Through the looking glass of IKEA Retail’s experience, we’ll unearth how this robust duo — Docker and Seldon-Core — can turn a convoluted task into a streamlined, agile operation and how you can harness real-time metrics for profound insights.

Dive into this new MLOps era with us. Unlock efficiency, scalability, and a strategic advantage in ML production. Your innovation journey begins here, with Docker and Seldon-Core leading the way. This is more than a solution; it’s a paradigm shift.

In the rest of this article, we will cover deployment steps, including model preparation, encapsulating the model into a Docker image, and testing. Let’s get started.

Prerequisites

The following items must be present to replicate this example:

  • Docker: Ensure Docker is up and running, easily achievable through solutions like Docker Desktop
  • Python: Have a local installation at the ready (version 3.7 or later)
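Both prerequisites can be checked quickly before going further. The following is a minimal sketch that uses only the Python standard library; the function names are hypothetical. Note that finding the docker executable on the PATH only shows that the CLI is installed, not that the Docker daemon is actually running.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version listed in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def docker_cli_found():
    """Return True if a 'docker' executable is on the PATH."""
    return shutil.which("docker") is not None


print("Python version OK:", python_is_supported())
print("docker CLI on PATH:", docker_cli_found())
```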

Model preparation

Model training and simple evaluation

Embarking on the journey to deploying an ML model is much like crafting a masterpiece: The canvas must be prepared, and every brushstroke must be deliberate. However, the focus of this exploration isn’t the art itself but rather the frame that holds it — the standardization of ML models, regardless of their creation or the frameworks used.

The primary objective of this demonstration is not to augment the model’s performance but rather to elucidate the seamless transition from local development to production deployment. It is imperative to note that the methodology we present is universally applicable across different models and frameworks. Therefore, we have chosen a straightforward model as a representative example. This choice is intentional, allowing readers to concentrate on the underlying process flows, which can be readily adapted to more sophisticated models that may require refined hyperparameter tuning and meticulous model selection. 

By focusing on these foundational principles, we aim to provide a versatile and accessible guide that transcends the specificities of individual models or use cases. Let’s delve into this process.

To align with our ethos of transparency and consumer privacy and to facilitate your engagement with this approach, a public dataset is employed for a binary classification task.

In the following code excerpt, you’ll find the essence of our training approach, reflecting how we transform raw data into a model ready for real-world challenges:

import os
import pickle
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the breast cancer dataset
X, y = datasets.load_breast_cancer(return_X_y=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.9, random_state=0)

# Combine X_test and y_test into a single DataFrame
X_test_df = pd.DataFrame(X_test, columns=[f"feature_{i}" for i in range(X_test.shape[1])])
y_test_df = pd.DataFrame(y_test, columns=["target"])

df_test = pd.concat([X_test_df, y_test_df], axis=1)

# Define the folder where models are stored
model_path = "models"

# Create the folder if it doesn't exist
os.makedirs(model_path, exist_ok=True)

# Define the classifiers to train and the file each one is saved to
parameters = [
    {"clf": LogisticRegression(solver="liblinear"), "name": os.path.join(model_path, "binary-lr.joblib")},
    {"clf": Perceptron(eta0=0.1, random_state=0), "name": os.path.join(model_path, "binary-percept.joblib")},
]

# Iterate through each parameter configuration
for param in parameters:
    clf = param["clf"]  # Retrieve the classifier from the parameter dictionary
    clf.fit(X_train, y_train)  # Fit the classifier on the training data
    # Save the trained model to a file using pickle
    model_filename = f"{param['name']}"
    with open(model_filename, 'wb') as model_file:
        pickle.dump(clf, model_file)
    print(f"Model saved in {model_filename}")

# Simple model evaluation
lr_model_file = "models/binary-lr.joblib"
with open(lr_model_file, 'rb') as model_file:
    loaded_model = pickle.load(model_file)

# Make predictions using the loaded model
predictions = loaded_model.predict(X_test)

# Calculate metrics (accuracy, precision, recall, f1-score)
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
f1 = f1_score(y_test, predictions)

# Report the metrics so the evaluation result is visible
print(f"Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")

Model class creation

With the model files primed, the task at hand shifts to the crafting of the model class — an essential architectural element that will later reside within the Docker image. Like a skilled sculptor, we must shape this class, adhering to the exacting standards proposed by Seldon:

import joblib
import logging

class Score:
    """
    Class to hold metrics for binary classification, including true positives (TP), false positives (FP),
    true negatives (TN), and false negatives (FN).
    """
    def __init__(self, TP=0, FP=0, TN=0, FN=0):
        self.TP = TP  # True Positives
        self.FP = FP  # False Positives
        self.TN = TN  # True Negatives
        self.FN = FN  # False Negatives


class DockerModel:
    """
    Class for loading and predicting using a pre-trained model, handling feedback to update metrics,
    and providing those metrics.
    """
    # Request metadata lives on the instance (set in __init__), not on the class.

    def __init__(self, model_name="models/binary-lr.joblib"):
        """
        Initialize DockerModel with metrics and model name.
        :param model_name: Path to the pre-trained model.
        """
        self.result = {}  # Per-instance dictionary to store request metadata
        self.scores = Score(0, 0, 0, 0)
        self.loaded = False
        self.model_name = model_name

    def load(self):
        """
        Load the model from the provided path.
        """
        self.model = joblib.load(self.model_name)
        self.loaded = True  # Mark as loaded so predict() does not reload it
        logging.info(f"Model {self.model_name} Loaded")

    def predict(self, X, features_names=None, meta=None):
        """
        Predict the target using the loaded model.
        :param X: Features for prediction.
        :param features_names: Names of the features, optional.
        :param meta: Additional metadata, optional.
        :return: Predicted target values.
        """
        self.result['shape_input_data'] = str(X.shape)
        logging.info(f"Received request: {X}")
        if not self.loaded:
            self.load()
            self.loaded = True
        predictions = self.model.predict(X)
        return predictions

    def send_feedback(self, features, feature_names, reward, truth, routing=""):
        """
        Provide feedback on predictions and update the metrics.
        :param features: Features used for prediction.
        :param feature_names: Names of the features.
        :param reward: Reward signal, not used in this context.
        :param truth: Ground truth target values.
        :param routing: Routing information, optional.
        :return: Empty list as return value is not used.
        """
        predicted = self.predict(features)
        logging.info(f"Predicted: {predicted[0]}, Truth: {truth[0]}")
        if int(truth[0]) == 1:
            if int(predicted[0]) == int(truth[0]):
                self.scores.TP += 1
            else:
                self.scores.FN += 1
        else:
            if int(predicted[0]) == int(truth[0]):
                self.scores.TN += 1
            else:
                self.scores.FP += 1
        return []  # The return value is not used

    def calculate_metrics(self):
        """
        Calculate the accuracy, precision, recall, and F1-score.
        :return: accuracy, precision, recall, f1_score
        """
        total_samples = self.scores.TP + self.scores.TN + self.scores.FP + self.scores.FN

        # Check if there are any samples to avoid division by zero
        if total_samples == 0:
            logging.warning("No samples available to calculate metrics.")
            return 0, 0, 0, 0  # Return zeros for all metrics if no samples

        accuracy = (self.scores.TP + self.scores.TN) / total_samples

        # Check if there are any positive predictions to calculate precision
        positive_predictions = self.scores.TP + self.scores.FP
        precision = self.scores.TP / positive_predictions if positive_predictions != 0 else 0

        # Check if there are any actual positives to calculate recall
        actual_positives = self.scores.TP + self.scores.FN
        recall = self.scores.TP / actual_positives if actual_positives != 0 else 0

        # Check if precision and recall are non-zero to calculate F1-score
        if precision + recall == 0:
            f1_score = 0
        else:
            f1_score = 2 * (precision * recall) / (precision + recall)

        # Return the calculated metrics
        return accuracy, precision, recall, f1_score


    def metrics(self):
        """
        Generate metrics for monitoring.
        :return: List of dictionaries containing accuracy, precision, recall, and f1_score.
        """
        accuracy, precision, recall, f1_score = self.calculate_metrics()
        return [
            {"type": "GAUGE", "key": "accuracy", "value": accuracy},
            {"type": "GAUGE", "key": "precision", "value": precision},
            {"type": "GAUGE", "key": "recall", "value": recall},
            {"type": "GAUGE", "key": "f1_score", "value": f1_score},
        ]
        
    def tags(self):
        """
        Retrieve metadata when generating predictions
        :return: Dictionary with the intermediate information
        """
        return self.result

Let’s delve into the functions and classes behind DockerModel, which cover four essential aspects:

  1. Loading and predicting:
    • load(): This function is responsible for importing the pretrained model from the provided path. It’s usually called internally before making predictions to ensure the model is available.
    • predict(X, features_names=None, meta=None): This function deploys the loaded model to make predictions. It takes in the input features X, optional features_names, and optional metadata meta, returning the predicted target values.
  2. Feedback handling:
    • send_feedback(features, feature_names, reward, truth, routing=""): This function connects the model to real-world feedback. It accepts the input features and the ground-truth values, re-runs the prediction, and updates the confusion-matrix counters (TP, FP, TN, FN) held in the Score object. The model itself is not retrained here; the accumulated counts feed the metrics used for real-time analysis, which in turn can signal when retraining is needed.
  3. Metrics calculation:
    • calculate_metrics(): This function calculates the essential metrics of accuracy, precision, recall, and F1-score. These metrics provide quantitative insights into the model’s performance, enabling constant monitoring and potential improvement.
    • Score class: This auxiliary class is used within the DockerModel to hold metrics for binary classification, including true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). It helps keep track of these parameters, which are vital for calculating the aforementioned metrics.
  4. Monitoring assistance:
    • metrics(): This function generates the metrics for model monitoring. It returns a list of dictionaries containing the calculated accuracy, precision, recall, and F1 score. These metrics are compliant with Prometheus Metrics, facilitating real-time monitoring and analysis.
    • tags(): This function is designed to retrieve custom metadata when generating predictions, aiding in monitoring and debugging. It returns a dictionary, which can help track and understand the nature of the requests.
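To make the feedback and metrics steps concrete, here is a minimal standalone sketch of the same bookkeeping, worked through on five feedback events. It is an illustration under stated assumptions, not the class used in production: the names Counts, record_feedback, and summarize are hypothetical, label 1 is treated as the positive class, and the (predicted, truth) pairs are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counters, mirroring the Score class above."""
    TP: int = 0
    FP: int = 0
    TN: int = 0
    FN: int = 0


def record_feedback(counts, predicted, truth):
    """Update the counters for one prediction (label 1 is the positive class)."""
    if truth == 1:
        if predicted == truth:
            counts.TP += 1
        else:
            counts.FN += 1
    else:
        if predicted == truth:
            counts.TN += 1
        else:
            counts.FP += 1


def summarize(counts):
    """Compute accuracy, precision, recall, and F1, guarding every division."""
    total = counts.TP + counts.FP + counts.TN + counts.FN
    accuracy = (counts.TP + counts.TN) / total if total else 0.0
    predicted_pos = counts.TP + counts.FP
    precision = counts.TP / predicted_pos if predicted_pos else 0.0
    actual_pos = counts.TP + counts.FN
    recall = counts.TP / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1_score": f1}


# Illustrative (predicted, truth) pairs, not real model output.
feedback = [(1, 1), (1, 1), (0, 1), (0, 0), (1, 0)]
counts = Counts()
for predicted, truth in feedback:
    record_feedback(counts, predicted, truth)

print(counts)             # TP=2, FP=1, TN=1, FN=1
print(summarize(counts))  # accuracy 0.6; precision, recall, and F1 all 2/3
```

With two true positives, one false positive, one true negative, and one false negative, accuracy is 3/5, while precision and recall are both 2/3, so the F1 score is 2/3 as well. These are the values metrics() would expose as gauges after the same five feedback calls.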

Together, these functions and classes form a cohesive and robust structure that supports the entire lifecycle of an ML model. From the moment of inception (loading and predicting) through its growth (feedback handling) and evaluation (metrics calculation), to its continuous vigilance (monitoring assistance), the architecture is designed to standardize and streamline the process of deploying and maintaining ML models in a real-world production environment.

This model class is more than code; it’s the vessel that carries our ML model from a local environment to the vast sea of production. It’s the vehicle for standardization, unlocking efficiency and consistency in deploying models.

At this stage, we’ve prepared the canvas and outlined the masterpiece. Now, it’s time to dive deeper and explore how this model is encapsulated into a Docker image, an adventure that blends technology and strategy to redefine ML deployment. 

Testing model locally

Before venturing into creating a Docker image, testing the model locally is vital. This step acts as a rehearsal before the main event, providing a chance to ensure that the model is performing as expected with the testing data.

The importance of local testing lies in its ability to catch issues early, avoiding potential complications later in the deployment process. If the example code below returns the expected prediction in the expected format, the model is ready for its next phase:

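# Assumes the class above is saved as DockerModel.py and that X_test
# comes from the training script shown earlier.
from DockerModel import DockerModel
demoModel = DockerModel()
demoModel.predict(X_test)  # Accepts the entire testing dataset or a 2D array holding a single row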

The expected output should match the format of the class labels you anticipate from the model. If everything works correctly, you’re assured that the model is well prepared for the next grand step: encapsulation within a Docker image.
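One way to turn "matches the expected format" into an explicit check is a small helper like the one below. This is a sketch under assumptions: the function name check_prediction_format is hypothetical, and the allowed labels 0 and 1 reflect the binary classification task used in this example.

```python
def check_prediction_format(predictions, n_rows, allowed_labels=(0, 1)):
    """Return True if there is one prediction per input row and every
    value is one of the allowed class labels."""
    values = list(predictions)  # works for lists and for array-like outputs
    return len(values) == n_rows and all(v in allowed_labels for v in values)


# Illustrative values standing in for the output of demoModel.predict(...)
print(check_prediction_format([0, 1, 1], n_rows=3))  # True
print(check_prediction_format([0, 2, 1], n_rows=3))  # False: 2 is not a valid label
print(check_prediction_format([0, 1], n_rows=3))     # False: one prediction is missing
```

In a local test, the same call could be made with the real output, for example check_prediction_format(demoModel.predict(X_test), len(X_test)).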

Local testing is more than a technical process; it’s a quality assurance measure that stands as a gatekeeper, ensuring that only a well-prepared model moves forward. It illustrates the meticulous care taken in the deployment process, reflecting a commitment to excellence that transcends code and resonates with the core values of standardization and efficiency.

With the local testing accomplished, we stand on the threshold of a new frontier: creating the Docker image. Let’s continue this exciting journey, knowing each step is a stride toward innovation and mastery in ML deployment.

Encapsulating the model into a Docker image

In our IKEA Retail MLOps view, a model is not simply a collection of code. Rather, it is a sophisticated assembly comprising code, dependencies, and ML artifacts, all encapsulated within a versioned and registered Docker image. This composition is carefully designed, reflecting the meticulous planning of the physical infrastructure.

What is Docker’s role in MLOps?

Docker plays a vital role in MLOps, providing a standardized environment that streamlines the transition from development to production:

  • Streamlining deployment: Docker containers encapsulate everything an ML model needs to run, easing the deployment process.
  • Facilitating collaboration: Using Docker, data scientists and engineers can ensure that models and their dependencies remain consistent across different stages of development.
  • Enhancing model reproducibility: Docker provides a uniform environment that enhances the reproducibility of models, a critical aspect in machine learning.
  • Integrating with orchestration tools: Docker can be used with orchestration platforms like Kubernetes, enabling automated deployment, scaling, and management of containerized applications.

Docker and containerization are more than technology tools; they catalyze innovation and efficiency in MLOps. Ensuring consistency, scalability and agility, Docker unlocks new potential and opens the way for a more agile and robust ML deployment process. Whether you are a developer, a data scientist, or an IT professional, understanding Docker is critical to navigating the complex and multifaceted landscape of modern data-driven applications.

Dockerfile creation

Creating a Dockerfile is like sketching the architectural plan of a building. It outlines the instructions for creating a Docker image to run the application in a coherent, isolated environment. This design ensures that the entire model — including its code, dependencies, and unique ML artifacts — is treated as a cohesive entity, aligning with the overarching vision of IKEA Retail’s MLOps approach.

In our case, we have created a Dockerfile with the express purpose of encapsulating not only the code but all the corresponding artifacts of the model. This deliberate design facilitates a smooth transition to production, effectively bridging the gap between development and deployment.

We used the following Dockerfile for this demonstration, which represents a tangible example of how IKEA Retail’s MLOps approach is achieved through thoughtful engineering and strategic implementation.

# Use an official Python runtime as a parent image.
# Using a slim image for a smaller final size and reduced attack surface.
FROM python:3.9-slim

# Set the maintainer label for metadata.
LABEL maintainer="fernandodorado.rueda@ingka.com"

# Set environment variables for a consistent build behavior.
# Disabling the buffer helps to log messages synchronously.
ENV PYTHONUNBUFFERED=1

# Set a working directory inside the container to store all our project files.
WORKDIR /app

# First, copy the requirements file to leverage Docker's cache for dependencies.
# By doing this first, changes to the code will not invalidate the cached dependencies.
COPY requirements.txt requirements.txt

# Install the required packages listed in the requirements file.
# It's a good practice to include the --no-cache-dir flag to prevent the caching of dependencies
# that aren't necessary for executing the application.
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the code and model files into the image.
COPY DockerModel.py DockerModel.py
COPY models/    models/

# Expose ports that the application will run on.
# Port 5000 for GRPC
# Port 9000 for REST
EXPOSE 5000 9000

# Set environment variables used by the application.
ENV MODEL_NAME DockerModel
ENV SERVICE_TYPE MODEL

# Change the owner of the directory to user 8888 for security purposes.
# It can prevent unauthorised write access by the application itself.
# Make sure to run the application as this non-root user later if applicable.
RUN chown -R 8888 /app

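# CMD is written in shell form so that $MODEL_NAME and $SERVICE_TYPE are expanded.
# The leading exec replaces the shell with the application process, so the
# application receives UNIX signals directly. This is helpful for graceful shutdown.
# Here we're using seldon-core-microservice to serve the model.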
CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE

This Dockerfile contains different parts:

  • FROM python:3.9-slim: This line chooses the official Python 3.9 slim image as the parent image. It is favored for its reduced size and attack surface, enhancing both efficiency and security.
  • LABEL maintainer="fernandodorado.rueda@ingka.com": A metadata label that specifies the maintainer of the image, providing contact information.
  • ENV PYTHONUNBUFFERED=1: Disabling Python’s output buffering ensures that log messages are emitted synchronously, aiding in debugging and log analysis.
  • WORKDIR /app: Sets the working directory inside the container to /app, a centralized location for all project files.
  • COPY requirements.txt requirements.txt: Copies the requirements file into the image. Doing this before copying the rest of the code leverages Docker’s caching mechanism, making future builds faster. This file must contain the “seldon-core” package:
pandas==1.3.5
requests==2.28.1
numpy==1.20
seldon-core==1.14.1
scikit-learn==1.0.2
  • RUN pip install --no-cache-dir -r requirements.txt: Installs required packages as listed in the requirements file. The --no-cache-dir flag stops pip from keeping a download cache inside the image, reducing the image size.
  • COPY DockerModel.py DockerModel.py: Copies the main Python file into the image.
  • COPY models/ models/: Copies the model files into the image.
  • EXPOSE 5000 9000: Documents that the application listens on ports 5000 (gRPC) and 9000 (REST). EXPOSE does not publish the ports by itself; they are mapped to the host at run time with the -p flag.
  • ENV MODEL_NAME DockerModel: Sets the environment variable for the model name.
  • ENV SERVICE_TYPE MODEL: Sets the environment variable for the service type.
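  • RUN chown -R 8888 /app: Changes the owner of the directory to user 8888, so that the application can run as this non-root user, which helps mitigate the risk of unauthorized write access. Note that this Dockerfile does not switch users itself; without a USER instruction or a user set at run time, the container still runs as root.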
  • CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE: Executes the command to start the service using seldon-core-microservice. It also includes the model name and service type as parameters. Using exec ensures the application receives UNIX signals, facilitating graceful shutdown.
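Because the image cannot start without the seldon-core package, it can be worth checking the requirements file before building. The sketch below is one possible way to do that, assuming the simple name==version pin format shown above. The pinned_packages helper is a hypothetical name, and the requirements text is inlined so the example runs on its own.

```python
def pinned_packages(requirements_text: str) -> dict:
    """Parse 'name==version' lines into a {name: version} mapping."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # this sketch ignores unpinned or blank lines
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Contents of requirements.txt as listed in this article.
REQUIREMENTS = """\
pandas==1.3.5
requests==2.28.1
numpy==1.20
seldon-core==1.14.1
scikit-learn==1.0.2
"""

pins = pinned_packages(REQUIREMENTS)
if "seldon-core" not in pins:
    raise SystemExit("requirements.txt must pin seldon-core")
print(f"seldon-core pinned at {pins['seldon-core']}")
```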

Building and pushing Docker image

1. Installing Docker Desktop

If Docker is not already installed, we recommend Docker Desktop for this task. Docker Desktop provides a graphical user interface that simplifies the process of building, running, and managing Docker containers. Docker Desktop also supports Kubernetes, offering an easy way to create a local cluster.

2. Navigating to the Project directory

Open a terminal or command prompt.

Navigate to the folder where the Dockerfile and other necessary files are located.

3. Building the Image

Execute the command: docker build . -t docker-model:1.0.0.

  • docker build . instructs Docker to build the image using the current directory (.).
  • -t docker-model:1.0.0 assigns a name (docker-model) and tag (1.0.0) to the image.

The build process will follow the instructions defined in the Dockerfile, creating a Docker image encapsulating the entire environment needed to run the model.

4. Pushing the image

If needed, the image can be pushed to a container registry like Docker Hub, or a private registry within an organization.

For this demonstration, the image is kept in the local image store on the build machine, simplifying the process and removing the need for authentication with an external registry.

Deploy ML model using Docker: Unleash it into the world

Once the Docker image is built, running it is relatively straightforward. Let’s break down this process:

docker run --rm --name docker-model -p 9000:9000 docker-model:1.0.0

Components of the command:

  1. docker run: This is the base command to run a Docker container.
  2. --rm: This flag ensures that the Docker container is automatically removed once it’s stopped. It helps keep the environment clean, especially when you run containers for testing or short-lived tasks.
  3. --name docker-model: Assigns a name to the running container.
  4. -p 9000:9000: This maps port 9000 on the host machine to port 9000 on the Docker container. The format is -p <host_port>:<container_port>. Because the Dockerfile states that the application exposes port 5000 for gRPC and port 9000 for REST, this command makes the REST endpoint available to external users or applications through port 9000 on the host.
  5. docker-model:1.0.0: This specifies the name and tag of the Docker image to run. docker-model is the name, and 1.0.0 is the version tag we assigned during the build process.
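If you start the container from a Python script rather than a terminal, the same components can be assembled as an argument list. This is a minimal sketch: docker_run_args is a hypothetical helper name, and the subprocess call is left commented out because it requires Docker to be installed and running.

```python
import shlex


def docker_run_args(image: str, name: str, host_port: int, container_port: int) -> list:
    """Build the argv list for the docker run command described above."""
    return [
        "docker", "run",
        "--rm",                                  # remove the container once it stops
        "--name", name,                          # name of the running container
        "-p", f"{host_port}:{container_port}",   # <host_port>:<container_port>
        image,                                   # image name and tag
    ]


args = docker_run_args("docker-model:1.0.0", "docker-model", 9000, 9000)
print(shlex.join(args))

# To actually start the container:
# import subprocess
# subprocess.run(args, check=True)
```

Passing a list instead of a single string avoids shell quoting issues when names or tags come from variables.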

What happens next

On executing the command, Docker will initiate a container instance from the docker-model:1.0.0 image.

The application within the Docker container will start and begin listening for requests on port 9000 (as specified).

With the port mapping, any incoming requests on port 9000 of the host machine will be forwarded to port 9000 of the Docker container.

The application can now be accessed and interacted with as if it were running natively on the host machine.
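The server inside the container may need a moment to start, so a client script can wait until the mapped port accepts connections before sending requests. The following is a sketch under that assumption. wait_for_port is a hypothetical helper, and it only checks that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; retry until the deadline
    return False


if __name__ == "__main__":
    ready = wait_for_port("localhost", 9000, timeout=2.0)
    print("REST endpoint reachable" if ready else "nothing listening on port 9000 yet")
```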

Test deployed model using Docker

With the Docker image in place, it’s time to see the model in action.

Generate predictions

The path from model to prediction requires an understanding of the specific payload types that Seldon accommodates (e.g., ndarray, jsonData, strData).

In our scenario, the model anticipates an array, and thus, the key in our payload is “ndarray.” Here’s how we orchestrate this:

import requests
import json

URL = "http://localhost:9000/api/v1.0/predictions"

def send_prediction_request(data):
    
    # Create the headers for the request
    headers = {'Content-Type': 'application/json'}

    try:
        # Send the POST request
        response = requests.post(URL, headers=headers, json=data)
        
        # Check if the request was successful
        response.raise_for_status() # Will raise HTTPError if the HTTP request returned an unsuccessful status code
        
        # If successful, return the JSON data
        return response.json()
    except requests.ConnectionError:
        raise Exception("Failed to connect to the server. Is it running?")
    except requests.Timeout:
        raise Exception("Request timed out. Please try again later.")
    except requests.RequestException as err:
        # For any other requests exceptions, re-raise it
        raise Exception(f"An error occurred with your request: {err}")

# Define the data payload (We can also use X_test[0:1].tolist() instead of the raw array)
data_payload = {
    "data": {
        "ndarray": [
            [
                1.340e+01, 2.052e+01, 8.864e+01, 5.567e+02, 1.106e-01, 1.469e-01,
                1.445e-01, 8.172e-02, 2.116e-01, 7.325e-02, 3.906e-01, 9.306e-01,
                3.093e+00, 3.367e+01, 5.414e-03, 2.265e-02, 3.452e-02, 1.334e-02,
                1.705e-02, 4.005e-03, 1.641e+01, 2.966e+01, 1.133e+02, 8.444e+02,
                1.574e-01, 3.856e-01, 5.106e-01, 2.051e-01, 3.585e-01, 1.109e-01
            ]
        ]
    }
}


# Get the response and print it
try:
    response = send_prediction_request(data_payload)
    pretty_json_response = json.dumps(response, indent=4)  
    print(pretty_json_response)
except Exception as err:
    print(err)

The prediction of our model will be similar to this dictionary:

{
    "data": {
        "names": [],
        "ndarray": [
            0
        ]
    },
    "meta": {
        "metrics": [
            {
                "key": "accuracy",
                "type": "GAUGE",
                "value": 0
            },
            {
                "key": "precision",
                "type": "GAUGE",
                "value": 0
            },
            {
                "key": "recall",
                "type": "GAUGE",
                "value": 0
            },
            {
                "key": "f1_score",
                "type": "GAUGE",
                "value": 0
            }
        ],
        "tags": {
            "shape_input_data": "(1, 30)"
        }
    }
}

The response from the model will contain several keys:

  • "data": Provides the output generated by our model. In our case, it’s the predicted class.
  • "meta": Contains metadata and model metrics. The "metrics" list reports the classification metrics (accuracy, precision, recall, and f1_score); they remain 0 in this example, before any feedback has been sent.
  • "tags": Nested under "meta", this holds intermediate metadata. This could include anything you want to track, such as the shape of the input data.

The structure outlined above ensures that not only can we evaluate the final predictions, but we also gain insights into intermediate results. These insights can be instrumental in understanding predictions and debugging any potential issues.
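To work with these keys in code, the response can be flattened into a small summary. This is a minimal sketch based only on the response shape shown above. summarize_response is a hypothetical helper, and the example dictionary is an abbreviated copy of that response.

```python
def summarize_response(response: dict) -> dict:
    """Pull the prediction, metrics, and tags out of a response dict."""
    meta = response.get("meta", {})
    return {
        "prediction": response.get("data", {}).get("ndarray", []),
        # Turn the list of {"key", "type", "value"} entries into a flat mapping.
        "metrics": {m["key"]: m["value"] for m in meta.get("metrics", [])},
        "tags": meta.get("tags", {}),
    }


example = {
    "data": {"names": [], "ndarray": [0]},
    "meta": {
        "metrics": [
            {"key": "accuracy", "type": "GAUGE", "value": 0},
            {"key": "f1_score", "type": "GAUGE", "value": 0},
        ],
        "tags": {"shape_input_data": "(1, 30)"},
    },
}

summary = summarize_response(example)
print(summary)
```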

This stage marks a significant milestone in our journey from training a model to deploying and testing it within a Docker container. We’ve seen how to standardize an ML model and how to set it up for real-world predictions. With this foundation, you’re well-equipped to scale, monitor, and further integrate this model into a full-fledged production environment.

Send feedback in real-time and calculate metrics

The provisioned /feedback endpoint allows truth values to be sent back to the model once they are available. As these truth values are received, the model’s metrics are updated and can be scraped by other tools for real-time analysis and monitoring. In the following code snippet, we iterate over the test dataset and send the truth value to the /feedback endpoint, using a POST request:

import requests
import json

URL = "http://localhost:9000/api/v1.0/feedback"

def send_prediction_feedback(data):
    
    # Create the headers for the request
    headers = {'Content-Type': 'application/json'}

    try:
        # Send the POST request
        response = requests.post(URL, headers=headers, json=data)
        
        # Check if the request was successful
        response.raise_for_status() # Will raise HTTPError if the HTTP request returned an unsuccessful status code
        
        # If successful, return the JSON data
        return response.json()
    except requests.ConnectionError:
        raise Exception("Failed to connect to the server. Is it running?")
    except requests.Timeout:
        raise Exception("Request timed out. Please try again later.")
    except requests.RequestException as err:
        # For any other requests exceptions, re-raise it
        raise Exception(f"An error occurred with your request: {err}")



for i in range(len(X_test)):
    payload = {'request': {'data': {'ndarray': [X_test[i].tolist()]}}, 'truth': {'data': {'ndarray': [int(y_test[i])]}}}

    # Get the response and print it
    try:
        response = send_prediction_feedback(payload)
        pretty_json_response = json.dumps(response, indent=4)  # Pretty-print JSON
        print(pretty_json_response)
    except Exception as err:
        print(err)

After processing the feedback, the model calculates and returns key metrics, including accuracy, precision, recall, and F1-score. These metrics are then available for analysis:

{
    "data": {
        "ndarray": []
    },
    "meta": {
        "metrics": [
            {
                "key": "accuracy",
                "type": "GAUGE",
                "value": 0.92607003
            },
            {
                "key": "precision",
                "type": "GAUGE",
                "value": 0.9528302
            },
            {
                "key": "recall",
                "type": "GAUGE",
                "value": 0.9294478
            },
            {
                "key": "f1_score",
                "type": "GAUGE",
                "value": 0.9409938
            }
        ],
        "tags": {
            "shape_input_data": "(1, 30)"
        }
    }
}
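For reference, the four metrics can be computed from accumulated truth and prediction pairs as sketched below. This illustrates the standard formulas, assuming binary labels with 1 as the positive class; the model wrapper’s own implementation may differ, and classification_metrics is a hypothetical name.

```python
def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty or one-sided inputs yield 0.0.
    accuracy = correct / n if n else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1_score": f1}


truth = [1, 0, 1, 1, 0, 1]
preds = [1, 0, 0, 1, 1, 1]
metrics = classification_metrics(truth, preds)
print(metrics)
```

With no feedback received, every metric is 0.0, which matches the zeros in the first prediction response.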

What makes this approach truly powerful is that the model’s evolution is no longer confined to the training phase. Instead, it’s in a continual state of learning, adjustment, and refinement, based on real-world feedback.

This way, we’re not just deploying a static prediction engine but fostering an evolving intelligent system that can better align itself with the changing landscape of data it interprets. It’s a holistic approach to machine learning deployment that encourages continuous improvement and real-time adaptation.

Conclusions

At IKEA Retail, Docker has become an indispensable element in our daily MLOps activities, serving as a catalyst that accelerates the development and deployment of models, especially when transitioning to production. The transformative impact of Docker unfolds through a spectrum of benefits that not only streamlines our workflow but also fortifies it:

  • Standardization: Docker orchestrates a consistent environment during the development and deployment of any ML model, fostering uniformity and coherence across the lifecycle.
  • Compatibility: With support for diverse environments and seamless multi-cloud or on-premise integration, Docker bridges gaps and ensures a harmonious workflow.
  • Isolation: Docker ensures that applications and resources are segregated, offering an isolated environment that prioritizes efficiency and integrity.
  • Security: Beyond resource isolation, Docker improves security by separating applications from each other. This separation gives us precise control over traffic flow and management, laying a strong foundation of trust.

These attributes translate into tangible advantages in our MLOps journey, sculpting a landscape that’s not only innovative but also robust:

  • Agile development and deployment environment: Docker ignites a highly responsive development and deployment environment, enabling seamless creation, updating, and deployment of ML models.
  • Optimized resource utilization: Utilize compute/GPU resources efficiently within a shared model, maximizing performance without compromising flexibility.
  • Scalable deployment: Docker’s architecture allows for the scalable deployment of ML models, adapting effortlessly to growing demands.
  • Smooth release cycles: Integrating seamlessly with our existing CI/CD pipelines, Docker smoothens the model release cycle, ensuring a continuous flow of innovation.
  • Effortless integration with monitoring tools: Docker’s compatibility extends to monitoring stacks like Prometheus + Grafana, creating a cohesive ecosystem fully aligned with our MLOps approach when creating and deploying models in production.

The convergence of these benefits elevates IKEA Retail’s MLOps strategy, transforming it into a symphony of efficiency, security, and innovation. Docker is not merely a tool: Docker is a philosophy that resonates with our pursuit of excellence. Docker is the bridge that connects creativity with reality, and innovation with execution.

In the complex world of ML deployment, we’ve explored a path less trodden but profoundly rewarding. We’ve tapped into the transformative power of standardization, unlocking an agile and responsive way to deploy and engage with ML models in real-time.

But this is not a conclusion; it’s a threshold. New landscapes beckon, brimming with opportunities for growth, exploration, and innovation. The following steps will continue the current approach: 

  • Scaling with Kubernetes: Unleash the colossal potential of Kubernetes, a beacon of flexibility and resilience, guiding you to a horizon of unbounded possibilities.
  • Applying real-time monitoring and alerting systems based on open source technologies, such as Prometheus and Grafana.
  • Connecting a data-drift detector for real-time detection: Deployment and integration of drift detectors to detect changes in data in real-time.

We hope this exploration will empower you to redefine your paths, ignite new ideas, and push the boundaries of what’s possible. The gateway to an extraordinary future is open, and the key is in our hands.

Learn more

How to Use the Node Docker Official Image
https://www.docker.com/blog/how-to-use-the-node-docker-official-image/
Wed, 26 Oct 2022 14:04:08 +0000

Topping Stack Overflow’s 2022 list of most popular web frameworks and technologies, Node.js continues to grow as a critical MERN stack component. And since Node applications are written in JavaScript, the world’s leading programming language, many developers will feel right at home using it. We introduced the Node Docker Official Image (DOI) due to Node.js’ popularity and to solve some common development challenges.

The Node.js Foundation describes Node as “an open-source, cross-platform JavaScript runtime environment.” Developers use it to create performant, scalable server and networking applications. Despite Node’s advantages, building and deploying cross-platform services can be challenging with traditional workflows.

Conversely, the Node Docker Official Image accelerates and simplifies your development processes while allowing additional configuration. You can deploy containerized Node applications in minutes. Throughout this guide, we’ll discuss the Node Official Image, how to use it, and some valuable best practices. 

What is the Node Docker Official Image?

The Node Docker Official Image contains all source code, core dependencies, tools, and libraries your application needs to work correctly. 

This image supports multiple CPU architectures like amd64, arm32v6, arm32v7, arm64v8, ppc64le, and s390x. You can also choose between multiple tags (or image versions) for any project. Choosing a pinned version like node:19.0.0-slim locks you into a stable, streamlined version of Node.js.

Node.js use cases

Node.js lets developers write server-side code in JavaScript. The runtime environment then transforms this JavaScript into hardware-friendly machine code. As a result, the CPU can process these low-level instructions. 

Node is event-driven (through user actions), non-blocking, and known for being lightweight while simultaneously handling numerous operations. As a result, you can use the Node DOI to create the following: 

  • Web server applications
  • Networking applications

Node works well here because it supports HTTP requests and socket connections. An asynchronous I/O library lets Node containers read and write various system files that support applications. 

You could use the Node DOI to build streaming apps, single-page applications, chat apps, to-do list apps, and microservices. Or — if you’re like Community All-Hands’ Kathleen Juell — you could use Node.js to help serve static content. Containerized Node will shine in any scenario dictated by numerous client-server requests. 

Docker Captain Bret Fisher also offered his thoughts on Dockerized Node.js during DockerCon 2022. He discussed best practices for managing Node.js projects while diving into optimization. 

Lastly, we also maintain some Node sample applications within our GitHub Awesome Compose library. You can learn to use Node with different databases or even incorporate an NGINX proxy. 

About Docker Official Images

We’ve curated the Node Docker Official Image as one of many core container images on Docker Hub. The Node.js community maintains this image alongside members of the Docker community. 

Like other Docker Official Images, the Node DOI offers a common starting point for Node and JavaScript developers. We also maintain an evolving list of Node best practices while regularly pushing critical security updates. This distinguishes Docker Official Images from alternatives on Docker Hub. 

How to run Node in Docker

Before getting started, download the latest Docker Desktop release and install it. Docker Desktop includes the Docker CLI, Docker Compose, and additional core development tools. The Docker Dashboard (Docker Desktop’s UI component) will help you manage images and containers. 

You’re then ready to Dockerize Node!

Enter a quick pull command

Pulling the Node DOI is the quickest way to begin. Enter docker pull node in your terminal to grab the default latest Node version from Docker Hub. You can readily use this tag for testing or local development, but a pinned version might be safer for production use.

Your CLI will display a status message once it’s done. You can also double-check this within Docker Desktop! Click the Images tab on the left sidebar and scan through your listed images. Docker Desktop will display your node image:

Docker UI listing local images, including the Node Docker Official Image.

Your node:latest image is a hefty 942.33 MB. If you inspect your Node image’s contents using docker sbom node, you’ll see that it currently includes 623 packages. The Node image contains numerous dependencies and modules that support Node and various applications. 

However, your final Node image can be much slimmer! We’ll tackle optimization while discussing Dockerfiles. After all, the Node DOI has 24 supported tags spread amongst four major Node versions. Each has its own impact on image size.  

Confirm that Node is functional

Want to run your new image as a container? Hover over your listed node image and click the blue “Run” button. In this state, your Node container will produce some minimal log entries and run continuously in case requests come through. 

Exit this container before moving on by clicking the square “stop” button in Docker Desktop or by entering docker stop YourContainerName in the CLI. 

Create your Node image from a Dockerfile

Building from a Dockerfile gives you ultimate control over image composition, configuration, and your overall application. However, Node requires very little to function properly. Here’s a barebones Dockerfile to get you up and running (using a pinned, Debian-based image version): 

FROM node:19-bullseye

Docker will build your image from your chosen Node version. 

It’s safest to use node:19-bullseye because this image supports numerous use cases. This version is also stable and prevents you from pulling in new breaking changes, which sometimes happens with latest tags. 

To build your image from a Dockerfile, run the docker build -t my-nodejs-app . command. You can then run your new image by entering docker run -it --rm --name my-running-app my-nodejs-app.

Optimize your Node image

The complete version of Node often includes extra packages that weigh your application down. This leaves plenty of room for optimization. 

For example, removing unneeded development dependencies reduces image bloat. You can do this by adding a RUN instruction to our previous file: 

FROM node:19-bullseye

RUN npm prune --production

This approach is pretty granular. It also relies on you knowing exactly what you do and don’t need for your project. Alternatively, switching to a slim image build offers the quickest results. You’ll encounter similar caveats but spend less time writing individual Dockerfile instructions. The easiest approach is to replace node:19-bullseye with its node:19-bullseye-slim counterpart. This alone shrinks image size by 75%. 

You can even pull node:19-alpine to save more disk space. However, this tag contains even fewer dependencies and isn’t officially supported by the Node.js Foundation. Keep this in mind while developing. 

Finally, multi-stage builds lead to smaller image sizes. These let you copy only what you need between build stages to combat bloat. 

Using Docker Compose

Say you have a start script, an existing package.json file, and (possibly) want to operate Node alongside other services. Spinning up Node containers with Docker Compose can be pretty handy in these situations.

Here’s a sample docker-compose.yml file: 

services:
  node:
    image: "node:19-bullseye"
    user: "node"
    working_dir: /home/node/app
    environment:
      - NODE_ENV=production
    volumes:
      - ./:/home/node/app
    ports:
      - "8888:8888"
    command: "npm start"

You’ll see some parameters that we didn’t specify earlier in our Dockerfile. For example, the user parameter lets you run your container as an unprivileged user. This follows the principle of least privilege. 

To jumpstart your Node container, simply enter the docker compose up -d command. Like before, you can verify that Node is running within Docker Desktop. The docker container ls --all command also displays all existing containers within the CLI.  

Running a simple Node script

Your project doesn’t always need a Dockerfile. In these cases, you can directly leverage the Node DOI with the following command:

docker run -it --rm --name my-running-script -v "$PWD":/usr/src/app -w /usr/src/app node:19-bullseye node your-daemon-or-script.js

This simplistic approach is ideal for single-file projects.

Docker Node best practices

It’s important to get the most out of Docker and the Node Official Image. We’ve briefly outlined the benefits of running as a non-root node user, but here are some useful tips for developing with Node: 

  • Set NODE_ENV to production at run time, as seen here: -e "NODE_ENV=production". The same -e flag is how you pass secrets and other runtime configurations to your application.
  • Place any installed, global Node dependencies into a non-root user directory.
  • Remember to manually install curl if using an alpine image tag, since it’s not included by default.
  • Wrap your Node process in an init system with the --init flag. Node.js was not designed to run as PID 1, and with --init a lightweight init process takes that role and forwards signals to Node.
  • Set memory limitations for your containers that run on the same host. 
  • Include the package.json start command directly within your Dockerfile, to reduce active container processes and let Node properly receive exit signals. 

This isn’t an exhaustive list. To view more details, check out our best practices documentation.

Get started with Node today

As you’ve seen, spinning up a Node container from the Node Docker Official Image is quick and requires just a few steps depending on your workflow. You’ll no longer need to worry about platform-specific builds or get bogged down with complex development processes. 

We’ve also covered many ways to help your Node builds perform better. Check out our top containerization tips article to learn even more about optimization and security. 

Ready to get started? Swing by Docker Hub and pull our Node image to start experimenting. In no time, you’ll have your server and networking applications up and running. You can also learn more on our GitHub README page.

Resolve Vulnerabilities Sooner With Contextual Data
https://www.docker.com/blog/resolve-vulnerabilities-sooner-with-contextual-data/
Tue, 25 Oct 2022 20:48:06 +0000

OpenSSL 3.0.7 and “Text4Shell” might be the most recent critical vulnerabilities to plague your development team, but they won’t be the last. In 2021, critical vulnerabilities reached a record high. Attackers are even reusing their work, with over 50% of zero-day attacks this year being variants of previously patched vulnerabilities.

With each new security vulnerability, we’re forced to re-examine our current systems and processes. If you’re impacted by OpenSSL or Text4Shell (aka CVE-2022-42889), you’ve probably asked yourself, “Are we using Apache Commons Text (and where)?” or “Is it a vulnerable version?” — and similar questions. And if you’re packaging applications into container images and running those on cloud infrastructure, then a breakdown by image, deployment environment, and impacted Commons-Text version would be extremely useful. 

Developers need contextual data to help cut through the noise and answer these questions, but gathering information takes time and significantly impacts productivity. An entire day is derailed if developers must context switch and spend countless hours researching, triaging, and fixing these issues. So, how do we stop these disruptions and surface crucial data in a more accessible way for developers?

Start with continuously examining images

Bugs, misconfigurations, and vulnerabilities don’t stop once an image is pushed to production, and neither should development. Improving images is a continuous effort that requires a constant flow of information before, during, and after development.

Before images are used, teams spend a significant amount of time vetting and selecting them. That same amount of effort needs to be put into continuously inspecting those same images. Otherwise, you’ll find yourself in a reactive cycle of unnecessary rework, wasted time, and overall developer frustration.

That’s where contextual data comes in. Contextual data ties directly to the situation around it to give developers a broader understanding. As an example, contextual data for vulnerabilities gives you clear and precise insights to understand what the vulnerability is, how urgent it is, and its specific impact on the developer and the application architecture — whether local, staging, or production.

Contextual data reduces noise and helps the developer know the what and the where so they can prioritize making the correct changes in the most efficient way. What does contextual data look like? It can be…

  • A comparison of detected vulnerabilities between an image built from a PR branch and the image version currently running in production
  • A comparison between images that use the same custom base image
  • An alert sent into a Slack channel that’s connected to a GitHub repository when a new critical or high CVE is detected in an image currently running in production
  • An alert or pull request to update to a newer version of your base image to remediate a certain CVE

Contextual data makes it faster for developers to locate and remediate the vulnerabilities in their application.
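The first comparison in the list above could be sketched as a set difference over CVE identifiers. This is only an illustration, not how Docker implements it: the function name `compare_cves` and the sample findings are hypothetical. Only CVE-2022-42889 (Text4Shell) comes from this post; the other IDs are placeholders.

```python
def compare_cves(pr_image_cves, prod_image_cves):
    """Split findings into introduced, fixed, and unchanged CVE lists."""
    pr, prod = set(pr_image_cves), set(prod_image_cves)
    return {
        "introduced": sorted(pr - prod),  # only in the PR image
        "fixed": sorted(prod - pr),       # only in the production image
        "unchanged": sorted(pr & prod),   # present in both images
    }


# Placeholder scan results (hypothetical, for illustration only)
prod_findings = ["CVE-2022-42889", "CVE-0000-0001"]
pr_findings = ["CVE-0000-0001", "CVE-0000-0002"]

report = compare_cves(pr_findings, prod_findings)
for status, cves in report.items():
    print(f"{status}: {', '.join(cves) or 'none'}")
```

A report like this tells a reviewer at a glance whether merging the PR would remove Text4Shell from production and whether it brings in anything new.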

Use Docker to surface contextual data

Contextual data is about providing more information that’s relevant to developers in their daily tasks. How does it work?

Docker can index and analyze public and private images within your registries to provide insights about the quality of your images. For example, you can get open source package updates, receive alerts about new vulnerabilities as security researchers discover them, send updates to refresh outdated base images, and be informed about accidentally embedded secrets like access tokens. 

The screenshot below shows what appears to be a very common list of vulnerabilities on a select Docker image. But there’s a lot more data on this page that correlates to the image:

  • The page breaks the vulnerabilities up by layers and base images making it easy to assess where to apply a fix for a detected vulnerability.
  • Image refs in the right column highlight that this version of the image is currently running in production.
  • We also see that this image represents the current head commit in the corresponding Git repository and we can see which Dockerfile it was built from.
  • The current and potential other base images are listed for comparison.
Image CVE Report 1
An image report with a list of common CVEs — including Text4Shell

Using Slack, notifications are sent to the channels your team already uses. Below is an alert sent into a Slack channel that’s configured to show activity for a selected set of Git repositories. Besides activity like commits, CI builds, and deployments, you can see the Text4Shell alert providing very concise and actionable information to developers collaborating in this channel:

Slack Text4Shell Update 2
Slack update on the critical Text4Shell vulnerability

You can also get suggestions to remediate certain categories of vulnerabilities and raise pull requests to update vulnerable packages like those in the following screenshot:

Text4Shell Remediation PR 1
Remediating the Text4Shell CVE via a PR and comparing to main branch

Find out more about this type of information for public images like Docker Official Images or Docker Verified Publisher images using our Image Vulnerability Database.

Vulnerability remediation is just the beginning

Contextual data is essential for faster resolution of vulnerabilities, but it’s more than that. With the right data at the right time, developers are able to work faster and spend their time innovating instead of drowning in security tickets.

Imagine you could assess your production images today to find out where you’re potentially going to be vulnerable. Your teams could have days or weeks to prepare to remediate the next critical vulnerability, like the new critical OpenSSL CVE due to be announced on Tuesday, November 1, 2022.

Docker DSO Debian Search 1
Searching for Debian OpenSSL on dso.docker.com

Interested in getting these types of insights and learning more about providing contextual data for happier, more productive devs? Sign up for our Early Access Program to harness these tools and provide invaluable feedback to help us improve our product!

How to Use the Alpine Docker Official Image https://www.docker.com/blog/how-to-use-the-alpine-docker-official-image/ Thu, 08 Sep 2022 14:00:00 +0000 https://www.docker.com/?p=37364 With its container-friendly design, the Alpine Docker Official Image (DOI) helps developers build and deploy lightweight, cross-platform applications. It’s based on Alpine Linux which debuted in 2005, making it one of today’s newest major Linux distros. 

While some developers express security concerns when using relatively newer images, Alpine has earned a solid reputation. Developers favor Alpine for the following reasons:  

In fact, the Alpine DOI is one of our most popular container images on Docker Hub. To help you get started, we’ll discuss this image in greater detail and how to use the Alpine Docker Official Image with your next project. Plus, we’ll explore using Alpine to grab the slimmest image possible. Let’s dive in!

What is the Alpine Docker Official Image?

how to use the alpine docker official image 900x600 1

The Alpine DOI is a building block for Alpine Linux Docker containers. It’s an executable software package that tells Docker and your application how to behave. The image includes source code, libraries, tools, and other core dependencies that your application needs. These components help Alpine Linux function while enabling developer-centric features. 

The Alpine Docker Official Image differs from other Linux-based images in a few ways. First, Alpine is based on the musl libc implementation of the C standard library — and uses BusyBox instead of GNU coreutils. While GNU packages many Linux-friendly programs together, BusyBox bundles a smaller number of core functions within one executable. 

While our Ubuntu and Debian images leverage glibc and coreutils, musl and BusyBox are comparatively lightweight and resource-friendly, containing fewer extensions and less bloat.

As a result, Alpine appeals to developers who don’t need uncompromising compatibility or functionality from their image. Our Alpine DOI is also user-friendly and straightforward since there are fewer moving parts.

Alpine Linux performs well on resource-limited devices, which is fitting for developing simple applications or spinning up servers. Your containers will consume less RAM and less storage space. 

The Alpine Docker Official Image also offers the following features:

Multi-arch support lets you run Alpine on desktops, mobile devices, rack-mounted servers, Raspberry Pis, and even newer M-series Macs. Overall, Alpine pairs well with a wide variety of embedded systems. 

These are only some of the advantages to using the Alpine DOI. Next, we’ll cover how to harness the image for your application. 

When to use Alpine

You may be interested in using Alpine, but find yourself asking, “When should I use it?” Containerized Alpine shines in some key areas: 

  • Creating servers
  • Router-based networking
  • Development/testing environments

While there are some other uses for Alpine, most projects will fall under these three categories. Overall, our Alpine container image excels in situations where space savings and security are critical. 

How to run Alpine in Docker

Before getting started, download Docker Desktop and then install it. Docker Desktop is built upon Docker Engine and bundles together the Docker CLI, Docker Compose, and other core components. Launching Docker Desktop also lets you use Docker CLI commands (which we’ll get into later). Finally, the included Docker Dashboard will help you visually manage your images and containers. 

After completing these steps, you’re ready to Dockerize Alpine!

Note: For Linux users, Docker will still work perfectly fine if you have it installed externally on a server, or through your distro’s package manager. However, Docker Desktop for Linux does save time and effort by bundling all necessary components together — while aiding productivity through its user-friendly GUI. 

Use a quick pull command

You’ll have to first pull the Alpine Docker Official Image before using it for your project. The fastest method involves running docker pull alpine from your terminal. This grabs the alpine:latest image (the most current available version) from Docker Hub and downloads it locally on your machine: 

Your terminal output should show when your pull is complete and which alpine version you’ve downloaded. You can also confirm this within Docker Desktop. Navigate to the Images tab from the left sidebar, and a list of downloaded images will populate on the right. You’ll see your alpine image, tag, and its minuscule (yes, you saw that right) 5.29 MB size:

Docker Desktop UI with list of downloaded images including Alpine.
Other Linux distro images like Ubuntu, Debian, and Fedora are many, many times larger than Alpine.

That’s a quick introduction to using the Alpine Official Image alongside Docker Desktop. But it’s important to remember that every Alpine DOI version originates from a Dockerfile. This plain-text file contains instructions that tell Docker how to build an image layer by layer. Check out the Alpine Linux GitHub repository for more Dockerfile examples. 

Next up, we’ll cover the significance of these Dockerfiles to Alpine Linux, some CLI-based workflows, and other key information.

Build your Dockerfile

Because Alpine is a standard base for container images, we recommend building on top of it within a Dockerfile. Specify your preferred alpine image tag and add instructions to create this file. Our example takes alpine:3.14 and runs an executable mysql client with it: 

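# Start from a small, version-pinned Alpine base image
FROM alpine:3.14
# Install the MySQL client without keeping the apk index cache in the image
RUN apk add --no-cache mysql-client
# Run the mysql client as the container's main process
ENTRYPOINT ["mysql"]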

In this case, we’re starting from a slim base image and adding our mysql-client using Alpine’s standard package manager. Overall, this lets us run commands against our MySQL database from within our application. 

This is just one of the many ways to get your Alpine DOI up and running. In particular, Alpine is well-suited to server builds. To see this in action, check out Kathleen Juell’s presentation on serving static content with Docker Compose, Next.js, and NGINX. Navigate to timestamp 7:07 within the embedded video. 

The Alpine Official Image has a close relationship with other technologies (something that other images lack). Many of our Docker Official Images support -alpine tags. For instance, our earlier example of serving static content leverages the node:16-alpine image as a builder.

This relationship makes Alpine and multi-stage builds an ideal pairing. Since the primary goal of a multi-stage build is to reduce your final image size, we recommend starting with one of the slimmest Docker Official Images.

Grabbing the slimmest possible image

Pulling an -alpine version of a given image typically yields the slimmest result. You can do this using our earlier docker pull [image] command. Or you can create a Dockerfile and specify this image version — while leaving room for customization with added instructions. 

In either case, here are some results using a few of our most popular images. You can see how image sizes change with these tags:

Image tag | Image size | image:[version number]-alpine size
python:3.9.13 | 867.66 MB | 46.71 MB
node:18.8.0 | 939.71 MB | 164.38 MB
nginx:1.23.1 | 134.51 MB | 22.13 MB

We’ve used the :latest tag since this is the default image tag Docker grabs from Docker Hub. As shown above with Python, pulling the -alpine image version reduces its footprint by nearly 95%! 
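To make the savings concrete, here is a small sketch that computes the percentage reduction for each row of the table above. The sizes are copied from the table; the helper name `reduction_percent` is just an illustrative choice.

```python
def reduction_percent(full_mb, alpine_mb):
    """Return how much smaller the -alpine image is, as a percentage."""
    return (1 - alpine_mb / full_mb) * 100


# (regular image size, -alpine image size) in MB, taken from the table above
sizes_mb = {
    "python:3.9.13": (867.66, 46.71),
    "node:18.8.0": (939.71, 164.38),
    "nginx:1.23.1": (134.51, 22.13),
}

for tag, (full, alpine) in sizes_mb.items():
    print(f"{tag}: {reduction_percent(full, alpine):.1f}% smaller with -alpine")
```

Python shows the largest drop of the three, while Node and NGINX each shrink by a little over 80%.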

From here, the build process (when working from a Dockerfile) becomes much faster. Applications based on slimmer images spin up quicker. You’ll also notice that docker pull and various docker run commands execute faster with -alpine images. 

However, remember that you’ll likely have to use this tag with a specified version number for your parent image. Running docker pull python-alpine or docker pull python:latest-alpine won’t work. Docker will alert you that the image isn’t found, the repo doesn’t exist, the command is invalid, or login information is required. This applies to any image. 

Get up and running with Alpine today

The Alpine Docker Official Image shines thanks to its simplicity and small size. It’s a fantastic base image — perhaps the most popular amongst Docker users — and offers plenty of room for customization. Alpine is arguably the most user-friendly, containerized Linux distro. We’ve tackled how to use the Alpine Official Image, and showed you how to get the most from it. 

Want to use Alpine for your next application or server? Pull the Alpine Official Image today to jumpstart your build process. You can also learn more about supported tags on Docker Hub. 

Additional resources

How to Colorize Black & White Pictures With OpenVINO on Ubuntu Containers https://www.docker.com/blog/how-to-colorize-black-white-pictures-ubuntu-containers/ Fri, 02 Sep 2022 14:00:00 +0000 https://www.docker.com/?p=36935 If you’re looking to bring a stack of old family photos back to life, check out Ubuntu’s demo on how to use OpenVINO on Ubuntu containers to colorize monochrome pictures. This magical use of containers, neural networks, and Kubernetes is packed with helpful resources and a fun way to dive into deep learning!

A version of Part 1 and Part 2 of this article was first published on Ubuntu’s blog.

Ubuntu and intel repost 900x600 1

OpenVINO on Ubuntu containers: making developers’ lives easier

If you’re curious about AI/ML and what you can do with OpenVINO on Ubuntu containers, this blog is an excellent read for you too.

Docker image security isn’t only about provenance and supply chains; it’s also about the user experience. More specifically, the developer experience.

Removing toil and friction from your app development, containerization, and deployment processes keeps developers from turning to untrusted sources or bad practices in the name of getting things done. As AI/ML development often requires complex dependencies, it’s the perfect proof point for secure and stable container images.

Why Ubuntu Docker images?

As the most popular container image in its category, the Ubuntu base image provides a seamless, easy-to-set-up experience. From public cloud hosts to IoT devices, the Ubuntu experience is consistent and loved by developers.

One of the main reasons for adopting Ubuntu-based container images is the software ecosystem. More than 30,000 packages are available in one “install” command, with the option to subscribe to enterprise support from Canonical. It just makes things easier.

In this blog, you’ll see that using Ubuntu Docker images greatly simplifies component containerization. We even used a prebuilt & preconfigured container image for the NGINX web server from the LTS images portfolio maintained by Canonical for up to 10 years.

Beyond providing a secure, stable, and consistent experience across container images, Ubuntu is a safe choice from bare metal servers to containers. Additionally, it comes with hardware optimization on clouds and on-premises, including Intel hardware.

Why OpenVINO?

When you’re ready to deploy deep learning inference in production, binary size and memory footprint are key considerations – especially when deploying at the edge. OpenVINO provides a lightweight Inference Engine with a binary size of just over 40MB for CPU-based inference. It also provides a Model Server for serving models at scale and managing deployments.

OpenVINO includes open-source developer tools to improve model inference performance. The first step is to convert a deep learning model (trained with TensorFlow, PyTorch,…) to an Intermediate Representation (IR) using the Model Optimizer. The Model Optimizer can also cut the model’s memory usage in half by converting it from FP32 to FP16 precision. You can unlock additional performance by using low-precision tools from OpenVINO. The Post-training Optimization Tool (POT) and Neural Network Compression Framework (NNCF) provide quantization, binarization, filter pruning, and sparsity algorithms. As a result, Intel devices’ throughput increases on CPUs, integrated GPUs, VPUs, and other accelerators.
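The halving follows directly from the storage width of each weight: 4 bytes in FP32 versus 2 bytes in FP16. The sketch below shows that arithmetic using only the Python standard library. The weight count is a hypothetical number chosen for illustration, and the estimate covers raw weights only, not activations or file-format overhead.

```python
import struct

# Storage width of one weight in each precision
BYTES_PER_WEIGHT = {
    "FP32": struct.calcsize("<f"),  # single-precision float
    "FP16": struct.calcsize("<e"),  # half-precision float
}


def weights_size_mb(num_weights, precision):
    """Estimate the raw size of the model weights in megabytes."""
    return num_weights * BYTES_PER_WEIGHT[precision] / 1_000_000


num_weights = 30_000_000  # hypothetical model size, for illustration only
for precision in ("FP32", "FP16"):
    print(f"{precision}: {weights_size_mb(num_weights, precision):.0f} MB")
```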

Open Model Zoo provides pre-trained models that work for real-world use cases to get you started quickly. Additionally, Python and C++ sample codes demonstrate how to interact with the model. More than 280 pre-trained models are available to download, from speech recognition to natural language processing and computer vision.

For this blog series, we will use the pre-trained colorization models from Open Model Zoo and serve them with Model Server.

colorize example albert einstein sticks tongue out

OpenVINO and Ubuntu container images

The Model Server – by default – ships with the latest Ubuntu LTS, providing a consistent development environment and an easy-to-layer base image. The OpenVINO tools are also available as prebuilt development and runtime container images.

To learn more about Canonical LTS Docker Images and OpenVINO™, read:

Neural networks to colorize a black & white image

Now, back to the matter at hand: how will we colorize grandma and grandpa’s old pictures? Thanks to Open Model Zoo, we won’t have to train a neural network ourselves and will only focus on the deployment. (You can still read about it.)

architecture diagram colorizer demo app microk8s
Architecture diagram of the colorizer demo app running on MicroK8s

Our architecture consists of three microservices: a backend, a frontend, and the OpenVINO Model Server (OVMS) to serve the neural network predictions. The Model Server component hosts two different demonstration neural networks to compare their results (V1 and V2). These components all use the Ubuntu base image for a consistent software ecosystem and containerized environment.

A few reads if you’re not familiar with this type of microservices architecture:

gRPC vs REST APIs

The OpenVINO Model Server provides inference as a service via HTTP/REST and gRPC endpoints for serving models in OpenVINO IR or ONNX format. It also offers centralized model management to serve multiple different models or different versions of the same model and model pipelines.

The server offers two sets of APIs to interface with it: REST and gRPC. Both APIs are compatible with TensorFlow Serving and expose endpoints for prediction, checking model metadata, and monitoring model status. For use cases where low latency and high throughput are needed, you’ll probably want to interact with the model server via the gRPC API. Indeed, it introduces a significantly smaller overhead than REST. (Read more about gRPC.)
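Because the REST API follows the TensorFlow Serving conventions, a prediction call is a POST to a `/v1/models/<name>:predict` path with a JSON body. The sketch below only builds the URL and body and does not send anything. The host, port, model name, and input values are assumptions for illustration and are not taken from the demo.

```python
import json


def build_predict_request(host, port, model_name, instances):
    """Build the URL and JSON body for a TensorFlow Serving-style predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})  # row-oriented request format
    return url, body


# Hypothetical endpoint and input, for illustration only
url, body = build_predict_request("localhost", 8000, "colorization", [[0.0, 0.5, 1.0]])
print(url)
print(body)
```

The gRPC API carries the same information in binary protobuf messages rather than JSON text, which is one reason it tends to have less overhead for large tensors such as images.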

OpenVINO Model Server is distributed as a Docker image with minimal dependencies. For this demo, we will use the Model Server container image deployed to a MicroK8s cluster. This combination of lightweight technologies is suitable for small deployments. It suits edge computing devices, performing inferences where the data is being produced – for increased privacy, low latency, and low network usage.

Ubuntu minimal container images

Since 2019, the Ubuntu base images have been minimal, with no “slim” flavors. While there’s room for improvement (stay tuned), the Ubuntu Docker image is a less than 30MB download, making it one of the tiniest Linux distributions available on containers.

In terms of Docker image security, size is one thing, and reducing the attack surface is a fair investment. However, as is often the case, size isn’t everything. In fact, maintenance is the most critical aspect. The Ubuntu base image, with its rich and active software ecosystem community, is usually a safer bet than smaller distributions.

A common trap is to start smaller and install loads of dependencies from many different sources. The end result will have poor performance, use non-optimized dependencies, and not be secure. You probably don’t want to end up effectively maintaining your own Linux distribution … So, let us do it for you.

colorize example cat walking through grass
“What are you looking at?” (original picture source)

Demo architecture

“As a user, I can drag and drop black and white pictures to the browser so that it displays their ready-to-download colorized version.” – said the PM (me).

For that – replied the one-time software engineer (still me) – we only need:

  • A fancy yet lightweight frontend component.
  • OpenVINO™ Model Server to serve the neural network colorization predictions.
  • A very light backend component.

Whilst we could target the Model Server directly with the frontend (it exposes a REST API), we need to apply transformations to the submitted image. The colorization models, in fact, each expect a specific input.

Finally, we’ll deploy these three services with Kubernetes because … well … because it’s groovy. And if you think otherwise (everyone is allowed to have a flaw or two), you’ll find a fully functional docker-compose.yaml in the source code repository.

architecture diagram for demo app
Architecture diagram for the demo app (originally colored tomatoes)

In the upcoming sections, we will first look at each component and then show how to deploy them with Kubernetes using MicroK8s. Don’t worry; the full source code is freely available, and I’ll link you to the relevant parts.

Neural network – OpenVINO Model Server

The colorization neural network is published under the BSD 2-clause License, accessible from the Open Model Zoo. It’s pre-trained, so we don’t need to understand it in order to use it. However, let’s look closer to understand what input it expects. I also strongly encourage you to read the original work from Richard Zhang, Phillip Isola, and Alexei A. Efros. They made the approach super accessible and understandable on this website and in the original paper.

neural network architecture
Neural network architecture (from arXiv:1603.08511 [cs.CV])

As you can see on the network architecture diagram, the neural network uses an unusual color space: LAB. There are many 3-dimensional spaces to code colors: RGB, HSL, HSV, etc. The LAB format is relevant here as it fully isolates the color information from the lightness information. Therefore, a grayscale image can be coded with only the L (for Lightness) axis. We will send only the L axis to the neural network’s input. It will generate predictions for the colors coded on the two remaining axes: A and B.
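To see why a grayscale picture needs only the L axis, the sketch below converts sRGB pixels to LAB with the standard CIELAB formulas, assuming a D65 white point. It uses only the Python standard library and is not the demo’s code; a real backend would more likely rely on an image library. For any gray pixel (R = G = B), the A and B values come out at essentially zero.

```python
def srgb_to_linear(channel):
    """Undo the sRGB gamma curve for one 0-255 channel value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lab_f(t):
    """Piecewise cube-root function from the CIELAB definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB pixel to (L, A, B), assuming a D65 white point."""
    rl, gl, bl = (srgb_to_linear(v) for v in (r, g, b))
    # Linear sRGB to CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    # Normalize by the D65 reference white, then apply the LAB formulas
    fx, fy, fz = lab_f(x / 0.95047), lab_f(y / 1.0), lab_f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


for name, pixel in [("mid gray", (128, 128, 128)), ("pure red", (255, 0, 0))]:
    lightness, a_axis, b_axis = rgb_to_lab(*pixel)
    print(f"{name}: L={lightness:.1f} A={a_axis:.1f} B={b_axis:.1f}")
```

The gray pixel carries all of its information in L, while the red pixel has large A and B values. Those two axes are what the network has to predict.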

From the architecture diagram, we can also see that the model expects a 256×256-pixel input. For these reasons, we cannot just send our RGB-coded grayscale picture in its original size to the network. We first need to transform it.
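The resizing half of that transformation can be sketched with nearest-neighbor sampling over a 2D grid of lightness values. This is a simplified stand-in and an assumption about the approach: the function name and sample grid are hypothetical, and the demo backend may use an image library with smoother interpolation.

```python
def resize_nearest(pixels, out_height=256, out_width=256):
    """Resize a 2D grid of lightness values with nearest-neighbor sampling."""
    in_height, in_width = len(pixels), len(pixels[0])
    return [
        [pixels[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


# A tiny hypothetical 2x3 lightness grid standing in for a real photo
lightness = [[10.0, 50.0, 90.0],
             [20.0, 60.0, 100.0]]

model_input = resize_nearest(lightness)
print(len(model_input), "x", len(model_input[0]))
```

After the model returns its A and B predictions at 256×256, the reverse step would scale them back to the original size and recombine them with the original L channel.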

We compare the results of two different model versions for the demo. Let them be called ‘V1’ (Siggraph) and ‘V2’. The models are served with the same instance of the OpenVINO™ Model Server as two different models. (We could also have done it with two different versions of the same model – read more in the documentation.)

Finally, to build the Docker image, we use the first stage from the Ubuntu-based development kit to download and convert the model. We then rebase on the more lightweight Model Server image.

# Dockerfile
FROM openvino/ubuntu20_dev:latest AS omz
# download and convert the model
…
FROM openvino/model_server:latest
# copy the model files and configure the Model Server
…

Backend – Ubuntu-based Flask app (Python)

For the backend microservice that interfaces between the user-facing frontend and the Model Server hosting the neural network, we chose to use Python. There are many valuable libraries to manipulate data, including images, specifically for machine learning applications. To provide web serving capabilities, Flask is an easy choice.

The backend takes an HTTP POST request with the to-be-colorized picture. It synchronously returns the colorized result using the neural network predictions. In between – as we’ve just seen – it needs to convert the input to match the model architecture and to prepare the output to show a displayable result.

Here’s what the transformation pipeline looks like on the input:

transformation pipeline on input

And the output looks something like this:

transformation pipeline on output

To containerize our Python Flask application, we use the first stage with all the development dependencies to prepare our execution environment. We copy it onto a fresh Ubuntu base image to run it, configuring the model server’s gRPC connection.

Frontend – Ubuntu-based NGINX container and Svelte app

Finally, I put together a fancy UI for you to try the solution out. It’s an effortless single-page application with a file input field. It can display side-by-side the results from the two different colorization models.

I used Svelte to build the demo as a dynamic frontend. Below each colorization result, there’s even a saturation slider (using a CSS transformation) so that you can emphasize the predicted colors and better compare the before and after.

To ship this frontend application, we again use a Docker image. We first build the application using the Node base image. We then rebase it on top of the preconfigured NGINX LTS image maintained by Canonical. A reverse proxy on the frontend side serves as a passthrough to the backend on the /API endpoint to simplify the deployment configuration. We do that directly in an NGINX.conf configuration file copied to the NGINX templates directory. The container image is preconfigured to use these template files with environment variables.

Deployment with Kubernetes

I hope you had the time to scan some black and white pictures because things are about to get serious(ly colorized).

From the next section on, we’ll assume you already have a running Kubernetes installation. If not, I encourage you to run the following steps or go through this MicroK8s tutorial.

# https://microk8s.io/docs
sudo snap install microk8s --classic
 
# Add current user ($USER) to the microk8s group
sudo usermod -a -G microk8s $USER && sudo chown -f -R $USER ~/.kube
newgrp microk8s
 
# Enable the DNS, Storage, and Registry addons required later
microk8s enable dns storage registry
 
# Wait for the cluster to be in a Ready state
microk8s status --wait-ready
 
# Create an alias to enable the `kubectl` command
sudo snap alias microk8s.kubectl kubectl
ubuntu command line kubernetes cluster

Yes, you deployed a Kubernetes cluster in about two command lines.

Build the components’ Docker images

Every component comes with a Dockerfile to build itself in a standard environment and ship its deployment dependencies (read What are containers for more information). They all create an Ubuntu-based Docker image for a consistent developer experience.

Before deploying our colorizer app with Kubernetes, we need to build and push the components’ images. They need to be hosted in a registry accessible from our Kubernetes cluster. We will use the built-in local registry with MicroK8s. Depending on your network bandwidth, building and pushing the images will take a few minutes or more.

sudo snap install docker
cd ~ && git clone https://github.com/valentincanonical/colouriser-demo.git
cd colouriser-demo
 
# Backend
docker build backend -t localhost:32000/backend:latest
docker push localhost:32000/backend:latest
 
# Model Server
docker build modelserver -t localhost:32000/modelserver:latest
docker push localhost:32000/modelserver:latest
 
# Frontend
docker build frontend -t localhost:32000/frontend:latest
docker push localhost:32000/frontend:latest

Apply the Kubernetes configuration files

All the components are now ready for deployment. The Kubernetes configuration files are available as deployments and services YAML descriptors in the ./k8s folder of the demo repository. We can apply them all at once, in one command:

kubectl apply -f ./k8s

Give it a few minutes. You can watch the app being deployed with watch kubectl get pods. Of all the services, the frontend one has a specific NodePort configuration to make it publicly accessible by targeting the Node IP address.

ubuntu command line kubernetes configuration files

Once ready, you can access the demo app at http://localhost:30000/ (or replace localhost with a cluster node IP address if you’re using a remote cluster). Pick an image from your computer, and get it colorized!

All in all, the project was pretty easy considering the task we accomplished. Thanks to Ubuntu containers, building each component’s image with multi-stage builds was a consistent and straightforward experience. And thanks to OpenVINO™ and the Open Model Zoo, serving a pre-trained model with excellent inference performance was a simple task accessible to all developers.

That’s a wrap!

You didn’t even have to share your pics over the Internet to get it done. Thanks for reading this article; I hope you enjoyed it. Feel free to reach out on socials. I’ll leave you with the last colorization example.

colorized example christmas cat
Christmassy colorization example (original picture source)

To learn more about Ubuntu, the magic of Docker images, or even how to make your own Dockerfiles, see below for related resources:

Extending Docker’s Integration with containerd https://www.docker.com/blog/extending-docker-integration-with-containerd/ Thu, 01 Sep 2022 16:44:09 +0000 https://www.docker.com/?p=37231 We’re extending Docker’s integration with containerd to include image management! To share this work early and get feedback, this integration is available as an opt-in experimental feature with the latest Docker Desktop 4.12.0 release.

The Docker Desktop Experimental Features settings with the option for using containerd for pulling and storing images enabled.

What is containerd?

In the simplest terms, containerd is a broadly-adopted open container runtime. It manages the complete container lifecycle of its host system! This includes pulling and pushing images as well as handling the starting and stopping of containers. Not to mention, containerd is a low-level brick in the container experience. Rather than being used directly by developers, it’s designed to be embedded into systems like Docker and Kubernetes.

Docker’s involvement in the containerd project can be traced all the way back to 2016. You could say it’s a bit of a passion project for us! While we had many reasons for starting the project, our goal was to move the container supervision out of the core Docker Engine and into a separate daemon. This way, it could be reused in projects like Kubernetes. It was donated to the Cloud Native Computing Foundation (CNCF) in 2017, and it has been a graduated (stable) project since 2019.

What does containerd replace in the Docker Engine?

As we mentioned earlier, Docker has used containerd as part of Docker Engine for managing the container lifecycle (creating, starting, and stopping) for a while now! This new work is a step towards a deeper integration of containerd into the Docker Engine. It lets you use containerd to store images and then push and pull them. Containerd also uses snapshotters instead of graph drivers for mounting the root file system of a container. Due to containerd’s pluggable architecture, it can support multiple snapshotters as well. 

Want to learn more? Michael Crosby wrote a great explanation about snapshotters on the Moby Blog.

Why migrate to containerd for image management?

Containerd is the leading open container runtime and, better yet, it’s already a part of Docker Engine! By switching to containerd for image management, we’re better aligning ourselves with the broader industry tooling. 

This migration modifies two main things:

  • We’re replacing Docker’s graph drivers with containerd’s snapshotters.
  • We’ll be using containerd to push, pull, and store images.

What does this mean for Docker users?

We know developers love how Docker commands work today and that many tools rely on the existing Docker API. With this in mind, we’re fully invested in making sure that the integration is as transparent as possible and doesn’t break existing workflows. To do this, we’re first rolling it out as an experimental, opt-in feature so that we can get early feedback. When enabled in the latest Docker Desktop, this experimental feature lets you use the following Docker commands with containerd under the hood: run, commit, build, push, load, and save.

This integration has the following benefits:

  1. Containerd’s snapshotter implementation helps you quickly plug in new features. Some examples include using stargz to lazy-pull images on startup or nydus and dragonfly for peer-to-peer image distribution.
  2. The containerd content store can natively store multi-platform images and other OCI-compatible objects. This enables features like the ability to build and manipulate multi-platform images using Docker Engine (and possibly other content in the future!).

If you plan to build a multi-platform image, the graphic below shows what to expect when you run the build command with the containerd store enabled. 

A recording of a docker build terminal output with the containerd store enabled.

Without the experimental feature enabled, you will get an error message stating that this feature is not supported on the docker driver, as shown in the graphic below. 

A recording of an unsuccessful docker build output with the containerd store disabled.
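
As a rough sketch of the kind of command behind the two recordings above, the function below wraps a multi-platform build. The function name build_multi_platform, the example tag, and the choice of linux/amd64 and linux/arm64 as targets are assumptions for illustration and are not taken from the recordings.

```shell
# Build one image for several CPU architectures in a single invocation.
# Assumes the containerd image store is enabled in Docker Desktop; without it,
# the default docker driver rejects a multi-platform build.
# Usage: build_multi_platform TAG [CONTEXT_DIR]
build_multi_platform() {
  # linux/amd64 and linux/arm64 are example targets (assumption).
  docker buildx build --platform linux/amd64,linux/arm64 -t "$1" "${2:-.}"
}
```

For example, build_multi_platform example/app:latest builds the current directory for both platforms.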

If you decide not to enable the experimental feature, no big deal! Things will work like before. If you have additional questions, you can access details in our release notes.

Roadmap for the containerd integration

We want to be as transparent as possible with the Docker community when it comes to this containerd integration (no surprises here!). For this reason, we’ve laid out a roadmap. The integration will happen in two key steps:

  1. We’ll ship an initial version in Docker Desktop which enables common workflows but doesn’t touch existing images to prove that this approach works.
  2. Next, we’ll write the code to migrate user images to use containerd and activate the feature for all our users.

We work to make expanding integrations like this as seamless as possible so you, our end user, can reap the benefits! This way, you can create new, exciting things while leveraging existing features in the ecosystem such as namespaces, containerd plug-ins, and more.

We’ve released this experimental feature first in Docker Desktop so that we can get feedback quickly from the community. But, you can also expect this feature in a future Docker Engine release.  


The details on the ongoing integration work can be accessed here

Conclusion

In summary, Docker users can now look forward to full containerd integration. This brings many exciting features from native multi-platform support to encrypted images and lazy pulls. So make sure to download the latest version of Docker Desktop and enable the containerd experimental feature to take it for a spin! 

We love sharing things early and getting feedback from the Docker community — it helps us build products that work better for you. Please join us on our community Slack channel or drop us a line using our feedback form.

How to Use the Redis Docker Official Image
https://www.docker.com/blog/how-to-use-the-redis-docker-official-image/
Wed, 24 Aug 2022 14:00:00 +0000

Maintained in partnership with Redis, the Redis Docker Official Image (DOI) lets developers quickly and easily containerize a Redis instance. It streamlines the cross-platform deployment process, even letting you use Redis with edge devices if they support your workflows. 

Developers have pulled the Redis DOI over one billion times from Docker Hub. As the world’s most popular key-value store, Redis helps apps concurrently access critical bits of data while remaining resource friendly. It’s highly performant, in-memory, networked, and durable. It also stands apart from relational databases like MySQL and PostgreSQL that use tabular data structures. From day one, Redis has also been open source. 

Finally, Redis cluster nodes are horizontally scalable — making it a natural fit for containerization and multi-container operation. Read on as we explore how to use the Redis Docker Official Image to containerize and accelerate your Redis database deployment.

What is the Redis Docker Official Image?

The Redis DOI is a building block for Redis Docker containers. It’s an executable software package that tells Docker and your application how to behave. It bundles together source code, dependencies, libraries, tools, and other core components that support your application. In this case, these components determine how your app and Redis database interact.

Our Redis Docker Official Image supports multiple CPU architectures. An assortment of over 50 supported tags lets you choose the best Redis image for your project. These images are also multi-layered and run using a default configuration (if you’re simply using docker pull). Complexity and base images also vary between tags. 

That said, you can also configure your Redis Official Image’s Dockerfile as needed. We’ll touch on this while outlining how to use the Redis DOI. Let’s get started.

How to run Redis in Docker

Before proceeding, we recommend installing Docker Desktop. Desktop is built upon Docker Engine and packages together the Docker CLI, Docker Compose, and more. Running Docker Desktop lets you use Docker commands. It also helps you manage images and containers using the Docker Dashboard UI. 

Use a quick pull command

Next, you’ll need to pull the Redis DOI to use it with your project. The quickest method involves visiting the image page on Docker Hub, copying the docker pull command, and running it in your terminal:

 docker pull redis 

Your output confirms that Docker has successfully pulled the :latest Redis image. You can also verify this by hopping into Docker Desktop and opening the Images interface from the left sidebar. Your redis image automatically appears in the list:

Docker Desktop list of local images on disk, including the Redis official Docker image.

We can also see that our new Redis image is 111.14 MB in size. This is pretty lightweight compared to many images. However, using an alpine variant like redis:alpine3.16 further slims your image.

Now that you’re acquainted with Docker Desktop, let’s jump into our CLI workflow to get Redis up and running. 

Start your Redis instance

Redis acts as a server, and related server processes power its functionality. We need to start a Redis instance, or software server process, before linking it with our application. Luckily, you can create a running instance with just one command: 

 docker run --name some-redis -d redis 

We recommend naming your container. This helps you reference it later on. It also makes it easier to run additional commands that involve it. Your container will run until you stop it. 

By adding the -d flag to this command, you tell Docker to run your Redis container in “detached” mode. Redis, therefore, runs in the background. Your container will also automatically exit when its root process exits. You’ll see that we’re not explicitly telling the service to “start” within this command. We don’t need to: the image’s default command launches redis-server, so our Redis service will start and continue running, remaining usable to our application.
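
Because a detached container prints nothing but its ID, it can help to confirm that Redis is actually answering. The sketch below is one way to do that, assuming the container name some-redis from the command above; the helper name redis_ping is made up for this example.

```shell
# Report whether a Redis container answers a PING.
# Usage: redis_ping CONTAINER_NAME
redis_ping() {
  # docker exec runs redis-cli inside the already-running container;
  # a healthy server replies with PONG.
  reply="$(docker exec "$1" redis-cli ping)"
  if [ "$reply" = "PONG" ]; then
    echo "$1 is up"
  else
    echo "$1 did not respond" >&2
    return 1
  fi
}
```

Calling redis_ping some-redis after the docker run command is a quick sanity check before wiring Redis into your application.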

Set up Redis persistent storage

Persistent storage is crucial when you want your application to save data between runs. You can have Redis write its data to a destination like an SSD. Persistence is also useful for keeping log files across restarts. 

You can persist your data using the Redis Database (RDB) method. This lets you designate snapshot intervals and record your dataset at certain points in time. However, the container from our initial docker run command is still running under the name some-redis, and Docker won’t create a second container with the same name. You should stop and remove this container before moving on, since it’s not critical for this example. 

Once that’s done, this command triggers a persistent storage snapshot every 60 seconds, provided at least one write operation has occurred: 

 docker run --name some-redis -d redis redis-server --save 60 1 --loglevel warning 

The RDB approach is valuable as it enables “set-and-forget” persistence. It also generates more logs, which is why the command above sets --loglevel warning. Logging can be useful for troubleshooting, yet it also requires you to monitor accumulation over time. 

However, you can also forgo persistence entirely or choose another option. To learn more, check out Redis’ documentation.

Redis stores your persisted data in the VOLUME /data location. These connected volumes are shareable between containers. This shareability becomes useful when Redis lives within one container and your application occupies another. 
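
To make the /data volume easy to find and share, you can give it a name when you start the container. The following is a minimal sketch of that idea, reusing the snapshot settings from the command above; the helper name run_redis_persistent and the volume name in the usage example are hypothetical.

```shell
# Start Redis with RDB snapshots and keep /data on a named volume so the
# snapshot file outlives the container.
# Usage: run_redis_persistent CONTAINER_NAME VOLUME_NAME
run_redis_persistent() {
  docker run --name "$1" -d \
    -v "$2":/data \
    redis redis-server --save 60 1 --loglevel warning
}
```

For example, run_redis_persistent some-redis redis-data mounts a volume called redis-data at /data. Another container can then mount the same volume by name.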

Connect with the Redis CLI

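The Redis CLI lets you run commands directly against your running Redis container. However, this isn’t automatically possible via Docker. Enter the following commands to create a network, attach your running some-redis container to it, and start the Redis CLI in a temporary container on that same network: 

 docker network create some-network 
 docker network connect some-network some-redis 
 docker run -it --network some-network --rm redis redis-cli -h some-redis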

Your Redis service understands Redis CLI commands. Numerous commands are supported, as are different CLI modes. Read through the Redis CLI documentation to learn more. 

Once you have CLI functionality up and running, you’re free to leverage Redis more directly!

Configurations and modules

Finally, we’ve arrived at customization. While you can run a Redis-powered app using defaults, you can tweak your Dockerfile to grab your pre-existing redis.conf file. This better supports production applications. While Redis can successfully start without these files, they’re central to configuring your services. 

You can see what a redis.conf file looks like on GitHub. Otherwise, here’s a sample Dockerfile:

FROM redis
COPY redis.conf /usr/local/etc/redis/redis.conf
CMD [ "redis-server", "/usr/local/etc/redis/redis.conf" ]
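
A Dockerfile like this one still has to be built and started. As a hedged sketch, the helper below does both from the directory that holds the Dockerfile and redis.conf; the function name build_and_run_redis and the tag in the usage example are assumptions, not part of the official image documentation.

```shell
# Build the Dockerfile in the current directory into a local image and
# start a detached container from it.
# Usage: build_and_run_redis IMAGE_TAG CONTAINER_NAME
build_and_run_redis() {
  docker build -t "$1" . &&
    docker run --name "$2" -d "$1"
}
```

For example, build_and_run_redis my-redis myredis produces an image tagged my-redis and a container named myredis that loads your redis.conf at startup.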

You can also use docker run to achieve this. However, you should first do two things for this method to work correctly. First, create the /myredis/config directory on your host machine. This is where your configuration files will live. 

Second, open Docker Desktop and click the Settings gear in the upper right. Choose Resources > File Sharing to view your list of directories. You’ll see a grayed-out directory entry at the bottom, which is an input field for a named directory. Type in /myredis/config there and hit the “+” button to locally verify your file path:

Docker Desktop resource file sharing settings with the `/myredis/config` added.

You’re now ready to run your command! 

 docker run -v /myredis/config:/usr/local/etc/redis --name myredis redis redis-server /usr/local/etc/redis/redis.conf 

The Dockerfile gives you more granular control over your image’s construction. Alternatively, the CLI option lets you run your Redis container without a Dockerfile. This may be more approachable if your needs are more basic. Just ensure that your mapped directory is writable and exists locally. 

Also, consider the following: 

  • If you edit your Redis configuration on the fly (for example, with CONFIG SET), you’ll have to use CONFIG REWRITE to write those changes back to your redis.conf file so that they apply on the next run.
  • You can also apply configuration changes manually.

Remember how we connected the Redis CLI earlier? You can now pass arguments directly through the Redis CLI (ideal for testing) and edit configs while your database server is running. 
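
As an illustration of editing configs on a running server, the sketch below changes one setting and then persists it. It assumes the container was started with a redis.conf file (CONFIG REWRITE has nothing to rewrite otherwise) and that the mapped directory is writable; the helper name set_and_persist is hypothetical.

```shell
# Change a setting on a running Redis server, then write it back to redis.conf.
# CONFIG SET applies the change to the running server only; CONFIG REWRITE
# updates the redis.conf the server was started with.
# Usage: set_and_persist CONTAINER_NAME PARAMETER VALUE
set_and_persist() {
  docker exec "$1" redis-cli CONFIG SET "$2" "$3" &&
    docker exec "$1" redis-cli CONFIG REWRITE
}
```

For example, set_and_persist myredis maxmemory 100mb caps memory immediately and keeps that cap in place after a restart.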

Notes on using Redis modules

Redis modules let you extend your Redis service, build new services, and adapt your database without taking a performance hit. Redis also processes them in memory. These standard modules support querying, search, JSON processing, filtering, and more. Docker Hub’s redislabs/redismod image bundles seven of these official modules together: 

  1. RedisBloom
  2. RedisTimeSeries
  3. RedisJSON
  4. RedisAI
  5. RedisGraph
  6. RedisGears
  7. RediSearch

If you’d like to spin up this container and experiment, simply enter docker run -d -p 6379:6379 redislabs/redismod in your terminal. You can open Docker Desktop to view this container like we did earlier on. 

You can view Redis’ curated modules or visit the Redis Modules Hub to explore further.

Get up and running with Redis today

We’ve explored how to successfully Dockerize Redis. Going further, it’s easy to grab external configurations and change how Redis operates on the fly. This makes it much easier to control how Redis interacts with your application. Head on over to Docker Hub and pull your first Redis Docker Official Image to start experimenting. 

The Redis Stack also helps extend Redis within Docker. It adds modern, developer-friendly data models and processing engines. The Stack also grants easy access to full-text search, document store, graphs, time series, and probabilistic data structures. Redis has published related container images through the Docker Verified Publisher (DVP) program. Check them out!

Announcing Docker SBOM: A step towards more visibility into Docker images
https://www.docker.com/blog/announcing-docker-sbom-a-step-towards-more-visibility-into-docker-images/
Thu, 07 Apr 2022 15:00:26 +0000

Today, Docker takes its first step in making what is inside your container images more visible so that you can better secure your software supply chain. Included in Docker Desktop 4.7.0 is a new, experimental docker sbom CLI command that displays the SBOM (Software Bill Of Materials) of any Docker image. It will also be included in our Linux packages in an upcoming release. The functionality was developed as an open source collaboration with Anchore using their Syft project.

As I wrote in my blog post last week, at Docker our priorities are performance, trust, and great experiences. This work is focused on improving trust in the supply chain by making it easier to see what is in images and providing SBOMs to consumers of software, and on improving the developer experience by making container images more transparent, so you can easily see what is inside of them. This command is just a first step that Docker is taking to make container images more self-descriptive. We believe that the best time to determine and record what is in a container image is when you are putting the image together with docker build. To enable this, we are working on making it easy for partners and the community to add SBOM functionality to docker build using BuildKit’s extensibility.

As this information is generated at build time, we believe that it should be included as part of the image artifact. This means that if you move images between registries (or even into air gapped environments), you should still be able to read the SBOM and other image build metadata off of the image.

We’re looking to collaborate with partners and those in the community on our SBOM work in BuildKit. Take a look at our PoC and leave feedback here.

What is an SBOM?

A Software Bill Of Materials (SBOM) is analogous to a packing list for a shipment; it lists all the components that make up the software or were used to build it. For container images, this includes the operating system packages that are installed (e.g., ca-certificates) along with language-specific packages that the software depends on (e.g., log4j). The SBOM could include only some of this information or even more details, like the versions of components and where they came from.

SBOMs are sometimes required by governments or other software consumers who are trying to improve their supply chain security. This is because knowing what is inside your software gives you confidence that it is safe to use and can be useful in understanding impact when a vulnerability is made public.

Using the container image SBOM to check for a vulnerability

Let’s take a quick look at what the docker sbom command can do to help when a vulnerability like log4shell is made public. When a vulnerability like this appears, it’s crucial that you can quickly determine if your software is impacted. We’ll use the neo4j:4.4.5 Docker Official Image. Just running docker sbom neo4j:4.4.5 outputs a tabulated form of the SBOM:

$ docker sbom neo4j:4.4.5
Syft v0.42.2
 ✔ Loaded image
 ✔ Parsed image
 ✔ Cataloged packages      [385 packages]

NAME                      VERSION                        TYPE
...
bsdutils                  1:2.36.1-8+deb11u1             deb
ca-certificates           20210119                       deb
...
log4j-api                 2.17.1                         java-archive
log4j-core                2.17.1                         java-archive
...

Note that the output includes not only the Debian packages that have been installed inside the image but also the Java libraries used by the application. Getting this information reliably and with minimal effort allows you to promptly respond and reduce the chance that you will be breached. In the above example, we can see that Neo4j uses version 2.17.1 of the log4j-core library which means that it is not affected by log4shell.

Without docker sbom or another SBOM scanning tool, you would need to check your application’s source code to see which version of log4j-core you are using. When you have several applications or services deployed and multiple versions of them, this can be difficult.

In addition to outputting the SBOM in a table, the docker sbom command has options for outputting SBOM in the standard SPDX and CycloneDX formats along with the GitHub and native Syft formats.
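
With hundreds of packages in the table, scanning by eye is error-prone. The filter below is a small sketch of how you might narrow the tabulated output to one package family; the function name sbom_find is made up for this example, and it assumes the NAME and VERSION columns appear in the order shown above.

```shell
# Print "name version" for every SBOM table row whose package name starts
# with the given prefix. Reads the table printed by docker sbom on stdin.
# Usage: docker sbom IMAGE | sbom_find PREFIX
sbom_find() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1, $2 }'
}
```

For example, docker sbom neo4j:4.4.5 | sbom_find log4j keeps only the log4j-api and log4j-core rows. For machine-readable output, the command’s --format option (for instance, --format spdx-json) is likely a better starting point than parsing the table.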

We are sharing the docker sbom functionality early, as an experimental command, with the intention of getting feedback from the community on the direction that we’re going. We’d like to know about your use cases and any other feedback that you have. You can leave it on the command’s repo.

What’s next?

We’d love to collaborate with partners and the community on bringing SBOMs to all container images through BuildKit so please hack on our example and leave feedback on our RFC. Please also give the experimental docker sbom command a try and leave us any feedback that you have. You can also read more about the docker sbom collaboration with Anchore on their blog.

DockerCon2022

Join us for DockerCon2022 on Tuesday, May 10. DockerCon is a free, one-day virtual event that is a unique experience for developers and development teams who are building the next generation of modern applications. If you want to learn about how to go from code to cloud fast and how to solve your development challenges, DockerCon 2022 offers engaging live content to help you build, share, and run your applications. Register today at https://www.docker.com/dockercon/

Advanced Image Management in Docker Hub
https://www.docker.com/blog/advanced-image-management-in-docker-hub/
Tue, 23 Mar 2021 16:19:32 +0000

We are excited to announce the latest feature for Docker Pro and Team users: our new Advanced Image Management Dashboard, available on Docker Hub. The new dashboard gives you a new level of access to all of the content you have stored in Docker Hub, with more fine-grained control over removing old content and exploring old versions of pushed images. 

advanced image management

Historically in Docker Hub we have had visibility into the latest version of a tag that a user has pushed, but what has been very hard to see or even understand is what happened to all of those old things that you pushed. When you push an image to Docker Hub you are pushing a manifest, a list of all of the layers of your image, and the layers themselves.

When you are updating an existing tag, only the new layers will be pushed, along with the new manifest which references these layers. This new manifest will be given the tag you specify when you push, such as bengotch/simplewhale:latest. But this does not mean that all of those old manifests which point at the previous layers that made up your image are removed from Hub. These are still here; there is just no way to easily see them or to manage that content. You can in fact still use and reference these using the digest of the manifest if you know it. You can think of this like your commit history (the old digests) to a particular branch (your tag) of your repo (your image repo!). 

This means you can have hundreds of old versions of images which your systems can still be pulling by hash rather than by tag, and you may be unaware which old versions are still in use. On top of this, the only way until now to remove these old versions was to delete the entire repo and start again!
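
To make pulling by hash concrete, here is a minimal sketch of looking up a local image’s digest reference and pulling an exact version by digest. Both helper names are hypothetical, and the digest in the usage note is a placeholder you would replace with a real value.

```shell
# Print the digest reference (repo@sha256:...) recorded for a local image.
# Usage: image_digest_ref IMAGE
image_digest_ref() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Pull an exact image version by manifest digest instead of by tag.
# Usage: pull_by_digest REPO DIGEST   (DIGEST has the form sha256:<hex>)
pull_by_digest() {
  docker pull "$1@$2"
}
```

For example, image_digest_ref bengotch/simplewhale:latest shows which manifest the tag currently points at, and pull_by_digest bengotch/simplewhale sha256:<hex> fetches that same manifest even after the tag has moved on.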

With the release of the image management dashboard we have provided a new GUI with all of this information available to you including whether those currently ‘untagged old manifests’ are still ‘active’ (have been pulled in the last month) or whether they are inactive. This combined with the new bulk delete for these objects and current tags provides you a more powerful tool for batch managing your content in Docker Hub. 

To get started you will find a new banner on your repos page if you have inactive images:

This will tell you how many images you have, tagged or old, which have not been pushed to or pulled in the last month. By clicking View, you can go through to the new Advanced Image Management Dashboard to check out all your content. From here, you can see what the tags of certain manifests used to be and use the multi-selector option to bulk delete these. 

For a full product tour check out our overview video of the feature below.

We hope that you are excited for this first step toward greater insight into your content on Docker Hub. If you want to get started exploring your content, all users can see how many inactive images they have. Pro and Team users can also see which tags these used to be associated with and what their hashes are, and can start removing them today. To find out more about becoming a Pro or Team user, check out this page.
