Ajeet Singh Raina – Docker

Creating AI-Enhanced Document Management with the GenAI Stack

Angel Borroy — Tue, 07 May 2024 14:39:51 +0000

Organizations must deal with countless reports, contracts, research papers, and other documents, but managing, deciphering, and extracting pertinent information from these documents can be challenging and time-consuming. In such scenarios, an AI-powered document management system can offer a transformative solution.

Developing Generative AI (GenAI) technologies with Docker offers endless possibilities not only for summarizing lengthy documents but also for categorizing them and generating detailed descriptions and even providing prompt insights you may have missed. This multi-faceted approach, powered by AI, changes the way organizations interact with textual data, saving both time and effort.

In this article, we’ll look at how to integrate Alfresco, a robust document management system, with the GenAI Stack to open up possibilities such as enhancing document analysis, automating content classification, transforming search capabilities, and more.

High-level architecture of Alfresco document management

Alfresco is an open source content management platform designed to help organizations manage, share, and collaborate on digital content and documents. It provides a range of features for document management, workflow automation, collaboration, and records management.

You can find the Alfresco Community platform on Docker Hub. The Docker image for the UI, named alfresco-content-app, has more than 10 million pulls, while other core platform services have more than 1 million pulls.

Alfresco Community platform (Figure 1) provides various open source technologies to create a Content Service Platform, including:

Alfresco content repository is the core of Alfresco and is responsible for storing and managing content. This component exposes a REST API to perform operations in the repository.
Database: PostgreSQL, among others, serves as the database management system, storing the metadata associated with a document.
Apache Solr: Enhancing search capabilities, Solr enables efficient content and metadata searches within Alfresco.
Apache ActiveMQ: As an open source message broker, ActiveMQ enables asynchronous communication between various Alfresco services. Its Messaging API handles asynchronous messages in the repository.
UI reference applications: Share and Alfresco Content App provide intuitive interfaces for user interaction and accessibility.

For detailed instructions on deploying Alfresco Community with Docker Compose, refer to the official Alfresco documentation.

Figure 1: Basic diagram for Alfresco Community deployment with Docker.

Why integrate Alfresco with the GenAI Stack?

Integrating Alfresco with the GenAI Stack unlocks a powerful suite of GenAI services, significantly enhancing document management capabilities. Enhancing Alfresco document management with the GenAI stack services has different benefits:

Use different deployments according to resources available: Docker allows you to easily switch between different Large Language Models (LLMs) of different sizes. Additionally, if you have access to GPUs, you can deploy a container with a GPU-accelerated LLM for faster inference. Conversely, if GPU resources are limited or unavailable, you can deploy a container with a CPU-based LLM.
Portability: Docker containers encapsulate the GenAI service, its dependencies, and runtime environment, ensuring consistent behavior across different environments. This portability allows you to develop and test the AI model locally and then deploy it seamlessly to various platforms.
Production-ready: The stack provides support for GPU-accelerated computing, making it well suited for deploying GenAI models in production environments. Docker’s declarative approach to deployment allows you to define the desired state of the system and let Docker handle the deployment details, ensuring consistency and reliability.
Integration with applications: Docker facilitates integration between GenAI services and other applications deployed as containers. You can deploy multiple containers within the same Docker environment and orchestrate communication between them using Docker networking. This integration enables you to build complex systems composed of microservices, where GenAI capabilities can be easily integrated into larger applications or workflows.

How does it work?

Alfresco provides two main APIs for integration purposes: the Alfresco REST API and the Alfresco Messaging API (Figure 2).

The Alfresco REST API provides a set of endpoints that allow developers to interact with Alfresco content management functionalities over HTTP. It enables operations such as creating, reading, updating, and deleting documents, folders, users, groups, and permissions within Alfresco.
The Alfresco Messaging API provides a messaging infrastructure for asynchronous communication built on top of Apache ActiveMQ and follows the publish-subscribe messaging pattern. Integration with the Messaging API allows developers to build event-driven applications and workflows that respond dynamically to changes and updates within the Alfresco Repository.

The Alfresco Repository can be updated with the enrichment data provided by GenAI Service using both APIs:

The Alfresco REST API may retrieve metadata and content from existing repository nodes to be sent to GenAI Service, and update back the node.
The Alfresco Messaging API may be used to consume new and updated nodes in the repository and obtain the result from the GenAI Service.

Figure 2: Alfresco provides two main APIs for integration purposes: the Alfresco REST API and the Alfresco Messaging API.

Technically, Docker deployment includes both the Alfresco and GenAI Stack platforms running over the same Docker network (Figure 3).

The GenAI Stack works as a REST API service with endpoints available in genai:8506, whereas Alfresco uses a REST API client (named alfresco-ai-applier) and a Messages API client (named alfresco-ai-listener) to integrate with AI services. Both clients can also be run as containers.

Figure 3: Deployment architecture for Alfresco integration with GenAI Stack services.

The GenAI Stack service provides the following endpoints:

summary: Returns a summary of a document together with several tags. It allows some customization, like the language of the response, the number of words in the summary and the number of tags.
classify: Returns a term from a list that best matches the document. It requires a list of terms as input in addition to the document to be classified.
prompt: Replies to a custom user prompt using retrieval-augmented generation (RAG) for the document to limit the scope of the response.
describe: Returns a text description for an input picture.

The implementation of GenAI Stack services loads the document text into chunks in Neo4j VectorDB to improve QA chains with embeddings and prevent hallucinations in the response. Pictures are processed using an LLM with a visual encoder (LlaVA) to generate descriptions (Figure 4). Note that Docker GenAI Stack allows for the use of multiple LLMs for different goals.

Figure 4: The GenAI Stack services are implemented using RAG and an LLM with visual encoder (LlaVA) for describing pictures.

Getting started

To get started, check the following:

Ensure that you have installed the latest version of Docker Desktop with 20 GB of RAM allocated.
Ensure that you have installed Java 17 and Maven 3.9.
Ensure that you have Ollama running locally.

Obtaining the amount of RAM available for Docker Desktop can be done using following command:

docker info --format '{{json .MemTotal}}'

If the result is under 20 GiB, follow the instructions in Docker official documentation for your operating system to boost the memory limit for Docker Desktop.

Clone the repository

Use the following command to close the repository:

git clone https://github.com/aborroy/alfresco-genai.git

The project includes the following components:

genai-stack folder is using https://github.com/docker/genai-stack project to build a REST endpoint that provides AI services for a given document.
alfresco folder includes a Docker Compose template to deploy Alfresco Community 23.1.
alfresco-ai folder includes a set of projects related to Alfresco integration.
- alfresco-ai-model defines a custom Alfresco content model to store summaries, terms and prompts to be deployed in Alfresco Repository and Share App.
- alfresco-ai-applier uses the Alfresco REST API to apply summaries or terms for a populated Alfresco Repository.
- alfresco-ai-listener listens to messages and generates summaries for created or updated nodes in Alfresco Repository.
compose.yaml file describes a deployment for Alfresco and GenAI Stack services using include directive.

Starting Docker GenAI service

The Docker GenAI Service for Alfresco, located in the genai-stack folder, is based on the Docker GenAI Stack project, and provides the summarization service as a REST endpoint to be consumed from Alfresco integration.

cd genai-stack

Before running the service, modify the .env file to adjust available preferences:

# Choose any of the on premise models supported by ollama
LLM=mistral
LLM_VISION=llava
# Any language name supported by chosen LLM
SUMMARY_LANGUAGE=English
# Number of words for the summary
SUMMARY_SIZE=120
# Number of tags to be identified with the summary
TAGS_NUMBER=3

Start the Docker Stack using the standard command:

docker compose up --build --force-recreate

After the service is up and ready, the summary REST endpoint becomes accessible. You can test its functionality using a curl command.

Use a local PDF file (file.pdf in the following sample) to obtain a summary and a number of tags.

curl --location 'http://localhost:8506/summary' \
--form 'file=@"./file.pdf"'
{ 
  "summary": " The text discusses...", 
  "tags": " Golang, Merkle, Difficulty", 
  "model": "mistral"
}

Use a local PDF file (file.pdf in the following sample) and a list of terms (such as Japanese or Spanish) to obtain a classification of the document.

curl --location \
'http://localhost:8506/classify?termList=%22Japanese%2CSpanish%22' \
--form 'file=@"./file.pdf"'
{
    "term": " Japanese",
    "model": "mistral"
}

Use a local PDF file (file.pdf in the following sample) and a prompt (such as “What is the name of the son?”) to obtain a response regarding the document.

curl --location \
'http://localhost:8506/prompt?prompt=%22What%20is%20the%20name%20of%20the%20son%3F%22' \
--form 'file=@"./file.pdf"'
{
    "answer": " The name of the son is Musuko.",
    "model": "mistral"
}

Use a local picture file (picture.jpg in the following sample) to obtain a text description of the image.

curl --location 'http://localhost:8506/describe' \
--form 'image=@"./picture.jpg"'
{
    "description": " The image features a man standing... ",
    "model": "llava"
}

Note that, in this case, LlaVA LLM is used instead of Mistral.

Make sure to stop Docker Compose before continuing to the next step.

Starting Alfresco

The Alfresco Platform, located in the alfresco folder, provides a sample deployment of the Alfresco Repository including a customized content model to store results obtained from the integration with the GenAI Service.

Because we want to run both Alfresco and GenAI together, we’ll use the compose.yaml file located in the project’s main folder.

include:
  - genai-stack/compose.yaml
  - alfresco/compose.yaml
#  - alfresco/compose-ai.yaml

In this step, we’re deploying only GenAI Stack and Alfresco, so make sure to leave the compose.ai.yaml line commented out.

Start the stack using the standard command:

docker compose up --build --force-recreate

After the service is up and ready, the Alfresco Repository becomes accessible. You can test the platform using default credentials (admin/admin) in the following URLs:

Repository services: http://localhost:8080/alfresco
UI: http://localhost:8080/share

Enhancing existing documents within Alfresco

The AI Applier application, located in the alfresco-ai/alfresco-ai-applier folder, contains a Spring Boot application that retrieves documents stored in an Alfresco folder, obtains the response from the GenAI Service and updates the original document in Alfresco.

Before running the application for the first time, you’ll need to build the source code using Maven.

cd alfresco-ai/alfresco-ai-applier
mvn clean package

As we have GenAI Service and Alfresco Platform up and running from the previous steps, we can upload documents to the Alfresco Shared Files/summary folder and run the program to update the documents with the summary.

java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:summary \
--applier.action=SUMMARY
...
Processing 2 documents of a total of 2
END: All documents have been processed. The app may need to be executed again for nodes without existing PDF rendition.

Once the process has been completed, every Alfresco document in the Shared Files/summary folder will include the information obtained by the GenAI Stack service: summary, tags, and LLM used (Figure 5).

Figure 5: The document has been updated in Alfresco Repository with summary, tags and model (LLM).

You can now upload documents to the Alfresco Shared Files/classify folder to prepare the repository for the next step.

Classifying action can be applied to documents in the Alfresco Shared Files/classify folder using the following command. GenAI Service will pick the term from the list (English, Spanish, Japanese) that best matches each document in the folder.

java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:classify \
--applier.action=CLASSIFY \
--applier.action.classify.term.list=English,Spanish,Japanese
...
Processing 2 documents of a total of 2
END: All documents have been processed. The app may need to be executed again for nodes without existing PDF rendition.

Upon completion, every Alfresco document in the Shared Files folder will include the information obtained by the GenAI Stack service: a term from the list of terms and the LLM used (Figure 6).

Figure 6: The document has been updated in Alfresco Repository with term and model (LLM).

You can upload pictures to the Alfresco Shared Files/picture folder to prepare the repository for the next step.

To obtain a text description from pictures, create a new folder named picture under the Shared Files folder. Upload any image file to this folder and run the following command:

java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:picture \
--applier.action=DESCRIBE
...
Processing 1 documents of a total of 1
END: All documents have been processed. The app may need to be executed again for nodes without existing PDF rendition.

Following this process, every Alfresco image in the picture folder will include the information obtained by the GenAI Stack service: a text description and the LLM used (Figure 7).

Figure 7: The document has been updated in Alfresco Repository with text description and model (LLM).

Enhancing new documents uploaded to Alfresco

The AI Listener application, located in the alfresco-ai/alfresco-ai-listener folder, contains a Spring Boot application that listens to Alfresco messages, obtains the response from the GenAI Service and updates the original document in Alfresco.

Before running the application for the first time, you’ll need to build the source code using Maven and to build the Docker image.

cd alfresco-ai/alfresco-ai-listener
mvn clean package
docker build . -t alfresco-ai-listener

As we are using the AI Listener application as a container, stop the Alfresco deployment and uncomment the alfresco-ai-listener in the compose.yaml file.

include:
  - genai-stack/compose.yaml
  - alfresco/compose.yaml
  - alfresco/compose-ai.yaml

Start the stack using the standard command:

docker compose up --build --force-recreate

After the service is again up and ready, the Alfresco Repository becomes accessible. You can verify that the platform is working by using default credentials (admin/admin) in the following URLs:

Repository services: http://localhost:8080/alfresco
UI: http://localhost:8080/share

Summarization

Next, upload a new document and apply the “Summarizable with AI” aspect to the document. After a while, the document will include the information obtained by the GenAI Stack service: summary, tags, and LLM used.

Description

If you want to use AI enhancement, you might want to set up a folder that automatically applies the necessary aspect, instead of doing it manually.

Create a new folder named pictures in Alfresco Repository and create a rule with the following settings in it:

Name: description
When: Items are created or enter this folder
If all criteria are met: All Items
Perform Action: Add “Descriptable with AI” aspect

Upload a new picture to this folder. After a while, without manual setting of the aspect, the document will include the information obtained by the GenAI Stack service: description and LLM used.

Classification

Create a new folder named classifiable in Alfresco Repository. Apply the “Classifiable with AI” aspect to this folder and add a list of terms separated by comma in the “Terms” property (such as English, Japanese, Spanish).

Create a new rule for classifiable folder with the following settings:

Name: classifiable
When: Items are created or enter this folder
If all criteria are met: All Items
Perform Action: Add “Classified with AI” aspect

Upload a new document to this folder. After a while, the document will include the information obtained by the GenAI Stack service: term and LLM used.

A degree of automation can be achieved when using classification with AI. To do this, a simple Alfresco Repository script named classify.js needs to be created in the folder “Repository/Data Dictionary/Scripts” with following content.

document.move(
  document.parent.childByNamePath(    
    document.properties["genai:term"]));

Create a new rule for classifiable folder to apply this script with following settings:

Name: move
When: Items are updated
If all criteria are met: All Items
Perform Action: Execute classify.js script

Create a child folder of the classifiable folder with the name of every term defined in the “Terms” property.

When you set up this configuration, any documents uploaded to the folder will automatically be moved to a subfolder based on the identified term. This means that the documents are classified automatically.

Prompting

Finally, to use the prompting GenAI feature, apply the “Promptable with AI” aspect to an existing document. Type your question in the “Question” property.

After a while, the document will include the information obtained by the GenAI Stack service: answer and LLM used.

A new era of document management

By embracing this framework, you can not only unlock a new level of efficiency, productivity, and user experience but also lay the foundation for limitless innovation. With Alfresco and GenAI Stack, the possibilities are endless — from enhancing document analysis and automating content classification to revolutionizing search capabilities and beyond.

If you’re unsure about any part of this process, check out the following video, which demonstrates all the steps live:

Learn more

Read Introducing a New GenAI Stack: Streamlined AI/ML Integration Made Easy.
Subscribe to the Docker Newsletter.
Get the latest release of Docker Desktop.
Vote on what’s next! Check out our public roadmap.
Have questions? The Docker community is here to help.
New to Docker? Get started.

Better Debugging: How the Signal0ne Docker Extension Uses AI to Simplify Container Troubleshooting

Szymon Stawski — Wed, 24 Apr 2024 15:58:35 +0000

This post was written in collaboration with Szymon Stawski, project maintainer at Signal0ne.

Consider this scenario: You fire up your Docker containers, hit an API endpoint, and … bam! It fails. Now what? The usual drill involves diving into container logs, scrolling through them to understand the error messages, and spending time looking for clues that will help you understand what’s wrong. But what if you could get a summary of what’s happening in your containers and potential issues with the proposed solutions already provided?

In this article, we’ll dive into a solution that solves this issue using AI. AI can already help developers write code, so why not help developers understand their system, too?

Signal0ne is a Docker Desktop extension that scans Docker containers’ state and logs in search of problems, analyzes the discovered issues, and outputs insights to help developers debug. We first learned about Signal0ne as the winning submission in the 2023 Docker AI/ML Hackathon, and we’re excited to show you how to use it to debug more efficiently.

Introducing Signal0ne Docker extension: Streamlined debugging for Docker

The magic of the Signal0ne Docker extension is its ability to shorten feedback loops for working with and developing containerized applications. Forget endless log diving — the extension offers a clear and concise summary of what’s happening inside your containers after logs and states are analyzed by an AI agent, pinpointing potential issues and even suggesting solutions.

Developing applications these days involves more than a block of code executed in a vacuum. It is a complex system of dependencies, and different user flows that need debugging from time to time. AI can help filter out all the system noise and focuses on providing data about certain issues in the system so that developers can debug faster and better.

Docker Desktop is one of the most popular tools used for local development with a huge community, and Docker features like Docker Debug enhance the community’s ability to quickly debug and resolve issues with their containerized apps.

Signal0ne Docker extension’s suggested solutions and summaries can help you while debugging your container or editing your code so that you can focus on bringing value as a software engineer. The term “developer experience” is often used, but this extension focuses on one crucial aspect: shortening development time. This translates directly to increased productivity, letting you build containerized applications faster and more efficiently.

How does the Docker Desktop extension work?

Between AI co-pilots, highly integrated in IDEs that help write code, and browser AI chats that help understand software development concepts in a Q&A way, there is one piece missing: logs and runtime system data.

The Signal0ne Docker Desktop extension consists of three components: two hosted on the user’s local system (UI and agent) and one in the Signal0ne cloud backend service. The agent scans the user’s local environment in the search of containers with invalid states, runtime issues, or some warnings or errors in the logs, after issue discovery, it collects additional data from container definition for enhanced analysis.

After the Signal0ne agent discovery, data is sent to the backend service, where a combination of pre-trained LLM and solution search retrieval service performs the analysis. The analysis of the issues can be seen from the Signal0ne extension UI, including:

Short log summary — Outlines what is happening within a particular container, including logs on which analysis was based can be accessed from the sources dropdown if you wish.
Solutions summary — One paragraph about possible solutions to the analyzed issue with some explanation of why this particular solution was proposed.
Solution sources — Links to websites where the issue has already been solved by the community (for example, a GitHub issue or StackOverflow post).

Watch this short video demonstration for more details:

Technical details

The issue analysis pipeline is powered by the LangChain tooling framework, which utilizes open source Mistral 7b LLM with LLM-optimized web search that browses sources of existing solutions on the open web (Figure 1).

Figure 1: Signal0ne architecture overview.

The following section shows sample issue analysis pipeline output:

Logs from Docker container:

“$ irb\n>> \"12341:asdf\"[/\\d+/]\n# => \"12341\"  \n>> \"12341:asdf\"[/\\d*/]\n# => \"12341\"  \n>> \"12341:asdf\"[0..5]\n# => \"12341:\"  \n>> \"12341:asdf\"[0...5]\n# => \"12341\"  \n>> \"12341:asdf\"[0, ':']\nTypeError: can't convert String into Integer\n  from (irb):5:in”

Summary:

The user is trying to extract a substring from the string 12341:asdf using different Ruby methods, such as [/\\\\d+/], [/\\\\d\\\\*/], [0..5], [0...5], and [0, ':']. However, when attempting to use the indexing method [0, ':'], an error occurs because the method expects an integer argument but receives a string instead.

Predicted solutions:

“To extract a substring from the given string 12341:asdf without encountering an error when using indexing methods, consider the following options:

1. Using regular expressions:

To extract a substring using regular expressions, you can use the scan method with a regular expression pattern. For instance, to extract the digits before the colon, you can use the following code snippet:

				str = "12341:asdf"                           
				substring = str.scan(/[\\\\d]+/)                          
				puts substring.first
				# Output: ["12341"]                         
				puts substring.first[0]                           
				# Output: 12341

2. Slicing:

Another alternative to indexing is slicing, which allows you to extract a specified range of characters from a string. To extract the digits before the colon using slicing, you can use the following code snippet:

str = "12341:asdf"                       
	   substring = str[0...str.index(":")]                         
               puts substring               
	   # Output: "12341"                         
	   puts substring.to_i                           
	   # Output: 12341

These two methods should help you extract the desired substring from the given string without encountering an error.

Getting started

Getting started with the Signal0ne Docker Desktop extension is a straightforward process that allows developers to leverage the benefits of unified development.

Here are the steps for installing Signal0ne Docker extension:

1. Install Docker Desktop.

2. Choose Add Extensions in the left sidebar. The Browse tab will appear by default (Figure 2).

Figure 2: Signal0ne extension installation from the marketplace.

3. In the Filters drop-down, select the Utility tools category.

4. Find Signal0ne and then select Install (Figure 3).

Figure 3: Extension installation process.

5. Log in after the extension is installed (Figure 4).

Figure 4: Signal0ne extension login screen.

6. Start developing your apps, and, if you face some issues while debugging, have a look at the Signal0ne extension UI. The issue analysis will be there to help you with debugging.

Make sure the Signal0ne agent is enabled by toggling on (Figure 5):

Figure 5: Agent settings tab.

Figure 6 shows the summary and sources:

Figure 6: Overview of the inspected issue.

Proposed solutions and sources are shown in Figures 7 and 8. Solutions sources will redirect you to a webpage with predicted solution:

Figure 7: Overview of proposed solutions to the encountered issue.

Figure 8: Overview of the list of helpful links.

If you want to contribute to the project, you can leave feedback via the Like or Dislike button in the issue analysis output (Figure 9).

Figure 9: You can leave feedback about analysis output for further improvements.

To explore Signal0ne Docker Desktop extension without utilizing your containers, consider experimenting with dummy containers using this docker compose to observe how logs are being analyzed and how helpful the output is with the insights:

services:
  broken_bulb: # c# application that cannot start properly
    image: 'Signal0neai/broken_bulb:dev'
  faulty_roger: # 
    image: 'Signal0neai/faulty_roger:dev'
  smoked_server: # nginx server hosting the website with the miss-configuration
    image: 'Signal0neai/smoked_server:dev'
    ports:
      - '8082:8082'
  invalid_api_call: # python webserver with bug 
   image: 'Signal0neai/invalid_api_call:dev'
   ports:
    - '5000:5000'

broken_bulb: This service uses the image Signal0neai/broken_bulb:dev. It’s a C# application that throws System.NullReferenceException during the startup. Thanks to that application, you can observe how Signal0ne discovers the failed container, extracts the error logs, and analyzes it.
faulty_roger: This service uses the image Signal0neai/faulty_roger:dev. It is a Python API server that is trying to connect to an unreachable database on localhost.
smoked_server: This service utilizes the image Signal0neai/smoked_server:dev. The smoked_server service is an Nginx instance that is throwing 403 forbidden while the user is trying to access the root path (http://127.0.0.1:8082/). Signal0ne can help you debug that.
invalid_api_call: API service with a bug in one of the endpoints, to generate an error call http://127.0.0.1:5000/create-table after running the container. Follow the analysis of Signal0ne and try to debug the issue.

Conclusion

Debugging containerized applications can be time-consuming and tedious, often involving endless scrolling through logs and searching for clues to understand the issue. However, with the introduction of the Signal0ne Docker extension, developers can now streamline this process and boost their productivity significantly.

By leveraging the power of AI and language models, the extension provides clear and concise summaries of what’s happening inside your containers, pinpoints potential issues, and even suggests solutions. With its user-friendly interface and seamless integration with Docker Desktop, the Signal0ne Docker extension is set to transform how developers debug and develop containerized applications.

Whether you’re a seasoned Docker user or just starting your journey with containerized development, this extension offers a valuable tool that can save you countless hours of debugging and help you focus on what matters most — building high-quality applications efficiently. Try the extension in Docker Desktop today, and check out the documentation on GitHub.

Learn more

Subscribe to the Docker Newsletter.
Get the latest release of Docker Desktop.
Vote on what’s next! Check out our public roadmap.
Have questions? The Docker community is here to help.
New to Docker? Get started.

Building a Video Analysis and Transcription Chatbot with the GenAI Stack

David Cardozo — Thu, 28 Mar 2024 14:32:22 +0000

Videos are full of valuable information, but tools are often needed to help find it. From educational institutions seeking to analyze lectures and tutorials to businesses aiming to understand customer sentiment in video reviews, transcribing and understanding video content is crucial for informed decision-making and innovation. Recently, advancements in AI/ML technologies have made this task more accessible than ever.

Developing GenAI technologies with Docker opens up endless possibilities for unlocking insights from video content. By leveraging transcription, embeddings, and large language models (LLMs), organizations can gain deeper understanding and make informed decisions using diverse and raw data such as videos.

In this article, we’ll dive into a video transcription and chat project that leverages the GenAI Stack, along with seamless integration provided by Docker, to streamline video content processing and understanding.

High-level architecture

The application’s architecture is designed to facilitate efficient processing and analysis of video content, leveraging cutting-edge AI technologies and containerization for scalability and flexibility. Figure 1 shows an overview of the architecture, which uses Pinecone to store and retrieve the embeddings of video transcriptions.

Figure 1: Schematic diagram outlining a two-component system for processing and interacting with video data.

The application’s high-level service architecture includes the following:

yt-whisper: A local service, run by Docker Compose, that interacts with the remote OpenAI and Pinecone services. Whisper is an automatic speech recognition (ASR) system developed by OpenAI, representing a significant milestone in AI-driven speech processing. Trained on an extensive dataset of 680,000 hours of multilingual and multitask supervised data sourced from the web, Whisper demonstrates remarkable robustness and accuracy in English speech recognition.
Dockerbot: A local service, run by Docker Compose, that interacts with the remote OpenAI and Pinecone services. The service takes the question of a user, computes a corresponding embedding, and then finds the most relevant transcriptions in the video knowledge database. The transcriptions are then presented to an LLM, which takes the transcriptions and the question and tries to provide an answer based on this information.
OpenAI: The OpenAI API provides an LLM service, which is known for its cutting-edge AI and machine learning technologies. In this application, OpenAI’s technology is used to generate transcriptions from audio (using the Whisper model) and to create embeddings for text data, as well as to generate responses to user queries (using GPT and chat completions).
Pinecone: A vector database service optimized for similarity search, used for building and deploying large-scale vector search applications. In this application, Pinecone is employed to store and retrieve the embeddings of video transcriptions, enabling efficient and relevant search functionality within the application based on user queries.

Getting started

To get started, complete the following steps:

Create an OpenAI API Key.
Ensure that you have a Pinecone API Key.
Ensure that you have installed the latest version of Docker Desktop.

The application is a chatbot that can answer questions from a video. Additionally, it provides timestamps from the video that can help you find the sources used to answer your question.

Clone the repository

The next step is to clone the repository:

git clone https://github.com/dockersamples/docker-genai.git

The project contains the following directories and files:

├── docker-genai/
│ ├── docker-bot/
│ ├── yt-whisper/
│ ├── .env.example
│ ├── .gitignore
│ ├── LICENSE
│ ├── README.md
│ └── docker-compose.yaml

Specify your API keys

In the /docker-genai directory, create a text file called .env, and specify your API keys inside. The following snippet shows the contents of the .env.example file that you can refer to as an example.

#-------------------------------------------------------------
# OpenAI
#-------------------------------------------------------------
OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key

#-------------------------------------------------------------
# Pinecone
#--------------------------------------------------------------
PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key

Build and run the application

In a terminal, change directory to your docker-genai directory and run the following command:

docker compose up --build

Next, Docker Compose builds and runs the application based on the services defined in the docker-compose.yaml file. When the application is running, you’ll see the logs of two services in the terminal.

In the logs, you’ll see the services are exposed on ports 8503 and 8504. The two services are complementary to each other.

The yt-whisper service is running on port 8503. This service feeds the Pinecone database with videos that you want to archive in your knowledge database. The next section explores the yt-whisper service.

Using yt-whisper

The yt-whisper service is a YouTube video processing service that uses the OpenAI Whisper model to generate transcriptions of videos and stores them in a Pinecone database. The following steps outline how to use the service.

Open a browser and access the yt-whisper service at http://localhost:8503. Once the application appears, specify a YouTube video URL in the URL field and select Submit. The example shown in Figure 2 uses a video from David Cardozo.

Figure 2: A web interface showcasing processed video content with a feature to download transcriptions.

Submitting a video

The yt-whisper service downloads the audio of the video, then uses Whisper to transcribe it into a WebVTT (*.vtt) format (which you can download). Next, it uses the “text-embedding-3-small” model to create embeddings and finally uploads those embeddings into the Pinecone database.

After the video is processed, a video list appears in the web app that informs you which videos have been indexed in Pinecone. It also provides a button to download the transcript.

Accessing Dockerbot chat service

You can now access the Dockerbot chat service on port 8504 and ask questions about the videos as shown in Figure 3.

Figure 3: Example of a user asking Dockerbot about NVIDIA containers and the application giving a response with links to specific timestamps in the video.

Conclusion

In this article, we explored the exciting potential of GenAI technologies combined with Docker for unlocking valuable insights from video content. It shows how the integration of cutting-edge AI models like Whisper, coupled with efficient database solutions like Pinecone, empowers organizations to transform raw video data into actionable knowledge.

Whether you’re an experienced developer or just starting to explore the world of AI, the provided resources and code make it simple to embark on your own video-understanding projects.

Learn more

Accelerated AI/ML with Docker
Build and run natural language processing (NLP) applications with Docker
Video transcription and chat using GenAI Stack
PDF analysis and chat using GenAI Stack
Subscribe to the Docker Newsletter.
Have questions? The Docker community is here to help.

Build Multimodal GenAI Apps with OctoAI and Docker

Thierry Moreau — Thu, 08 Feb 2024 15:07:55 +0000

This post was contributed by Thierry Moreau, co-founder and head of DevRel at OctoAI.

Generative AI models have shown immense potential over the past year with breakthrough models like GPT3.5, DALL-E, and more. In particular, open source foundational models have gained traction among developers and enterprise users who appreciate how customizable, cost-effective, and transparent these models are compared to closed-source alternatives.

In this article, we’ll explore how you can compose an open source foundational model into a streamlined image transformation pipeline that lets you manipulate images with nothing but text to achieve surprisingly good results.

With this approach, you can create fun versions of corporate logos, bring your kids’ drawings to life, enrich your product photography, or even remodel your living room (Figure 1).

Figure 1: Examples of image transformation including, from left to right: Generating creative corporate logo, bringing children’s drawings to life, enriching commercial photography, remodeling your living room

Pretty cool, right? Behind the scenes, a lot needs to happen, and we’ll walk step by step through how to reproduce these results yourself. We call the multimodal GenAI pipeline OctoShop as a nod to the popular image editing software.

Feeling inspired to string together some foundational GenAI models? Let’s dive into the technology that makes this possible.

Architecture overview

Let’s look more closely at the open source foundational GenAI models that compose the multimodal pipeline we’re about to build.

Going forward, we’ll use the term “model cocktail” instead of “multimodal GenAI model pipeline,” as it flows a bit better (and sounds tastier, too). A model cocktail is a mix of GenAI models that can process and generate data across multiple modalities: text and images are examples of data modalities across which GenAI models consume and produce data, but the concept can also extend to audio and video (Figure 2).

To build on the analogy of crafting a cocktail (or mocktail, if you prefer), you’ll need to mix ingredients, which, when assembled, are greater than the sum of their individual parts.

Figure 2: The multimodal GenAI workflow — by taking an image and text, this pipeline transforms the input image according to the text prompt.

Let’s use a Negroni, for example — my favorite cocktail. It’s easy to prepare; you need equal parts of gin, vermouth, and Campari. Similarly, our OctoShop model cocktail will use three ingredients: an equal mix of image-generation (SDXL), text-generation (Mistral-7B), and a custom image-to-text generation (CLIP Interrogator) model.

The process is as follows:

CLIP Interrogator takes in an image and generates a textual description (e.g., “a whale with a container on its back”).
An LLM model, Mistral-7B, will generate a richer textual description based on a user prompt (e.g., “set the image into space”). The LLM will consequently transform the description into a richer one that meets the user prompt (e.g., “in the vast expanse of space, a majestic whale carries a container on its back”).
Finally, an SDXL model will be used to generate a final AI-generated image based on the textual description transformed by the LLM model. We also take advantage of SDXL styles and a ControlNet to better control the output of the image in terms of style and framing/perspective.

Prerequisites

Let’s go over the prerequisites for crafting our cocktail.

Here’s what you’ll need:

Sign up for an OctoAI account to use OctoAI’s image generation (SDXL), text generation (Mistral-7B), and compute solutions (CLIP Interrogator) — OctoAI serves as the bar from which to get all of the ingredients you’ll need to craft your model cocktail. If you’re already using a different compute service, feel free to bring that instead.
Run a Jupyter notebook to craft the right mix of GenAI models. This is your place for experimenting and mixing, so this will be your cocktail shaker. To make it easy to run and distribute the notebook, we’ll use Google Colab.
Finally, we’ll deploy our model cocktail as a Streamlit app. Think of building your app and embellishing the frontend as the presentation of your cocktail (e.g., glass, ice, and choice of garnish) to enhance your senses.

Getting started with OctoAI

Head to octoai.cloud and create an account if you haven’t done so already. You’ll receive $10 in credits upon signing up for the first time, which should be sufficient for you to experiment with your own workflow here.

Follow the instructions on the Getting Started page to obtain an OctoAI API token — this will help you get authenticated whenever you use the OctoAI APIs.

Notebook walkthrough

We’ve built a Jupyter notebook in Colab to help you learn how to use the different models that will constitute your model cocktail. Here are the steps to follow:

1. Launch the notebook

Get started by launching the following Colab notebook.

There’s no need to change the runtime type or rely on a GPU or TPU accelerator — all we need is a CPU here, given that all of the AI heavy-lifting is done on OctoAI endpoints.

2. OctoAI SDK setup

Let’s get started by installing the OctoAI SDK. You’ll use the SDK to invoke the different open source foundational models we’re using, like SDXL and Mistral-7B. You can install through pip:

# Install the OctoAI SDK
!pip install octoai-sdk

In some cases, you may get a message about pip packages being previously imported in the runtime, causing an error. If that’s the case, selecting the Restart Session button at the bottom should take care of the package versioning issues. After this, you should be able to re-run the cell that pip-installs the OctoAI SDK without any issues.

3. Generate images with SDXL

You’ll first learn to generate an image with SDXL using the Image Generation solution API. To learn more about what each parameter does in the code below, check out OctoAI’s ImageGenerator client.

In particular, the ImageGenerator API takes several arguments to generate an image:

Engine: Lets you choose between versions of Stable Diffusion models, such as SDXL, SD1.5, and SSD.
Prompt: Describes the image you want to generate.
Negative prompt: Describes the traits you want to avoid in the final image.
Width, height: The resolution of the output image.
Num images: The number of images to generate at once.
Sampler: Determines the sampling method used to denoise your image. If you’re not familiar with this process, this article provides a comprehensive overview.
Number of steps: Number of denoising steps — the more steps, the higher the quality, but generally going past 30 will lead to diminishing returns.
Cfg scale: How closely to adhere to the image description — generally stays around 7-12.
Use refiner: Whether to apply the SDXL refiner model, which improves the output quality of the image.
Seed: A parameter that lets you control the reproducibility of image generation (set to a positive value to always get the same image given stable input parameters).

Note that tweaking the image generation parameters — like number of steps, number of images, sampler used, etc. — affects the amount of GPU compute needed to generate an image. Increasing GPU cycles will affect the pricing of generating the image.

Here’s an example using simple parameters:

# To use OctoAI, we'll need to set up OctoAI to use it
from octoai.clients.image_gen import Engine, ImageGenerator


# Now let's use the OctoAI Image Generation API to generate
# an image of a whale with a container on its back to recreate
# the moby logo
image_gen = ImageGenerator(token=OCTOAI_API_TOKEN)
image_gen_response = image_gen.generate(
 engine=Engine.SDXL,
 prompt="a whale with a container on its back",
 negative_prompt="blurry photo, distortion, low-res, poor quality",
 width=1024,
 height=1024,
 num_images=1,
 sampler="DPM_PLUS_PLUS_2M_KARRAS",
 steps=20,
 cfg_scale=7.5,
 use_refiner=True,
 seed=1
)
images = image_gen_response.images


# Display generated image from OctoAI
for i, image in enumerate(images):
 pil_image = image.to_pil()
 display(pil_image)

Feel free to experiment with the parameters to see what happens to the resulting image. In this case, I’ve put in a simple prompt meant to describe the Docker logo: “a whale with a container on its back.” I also added standard negative prompts to help generate the style of image I’m looking for. Figure 3 shows the output:

Figure 3: An SDXL-generated image of a whale with a container on its back.

4. Control your image output with ControlNet

One thing you may want to do with SDXL is control the composition of your AI-generated image. For example, you can specify a specific human pose or control the composition and perspective of a given photograph, etc.

For our experiment using Moby (the Docker mascot), we’d like to get an AI-generated image that can be easily superimposed onto the original logo — same shape of whale and container, orientation of the subject, size, and so forth.

This is where ControlNet can come in handy: they let you constrain the generation of images by feeding a control image as input. In our example we’ll feed the image of the Moby logo as our control input.

By tweaking the following parameters used by the ImageGenerator API, we are constraining the SDXL image generation with a control image of Moby. That control image will be converted into a depth map using a depth estimation model, then fed into the ControlNet, which will constrain SDXL image generation.

# Set the engine to controlnet SDXL
 engine="controlnet-sdxl",
 # Select depth controlnet which uses a depth map to apply
 # constraints to SDXL
 controlnet="depth_sdxl",
 # Set the conditioning scale anywhere between 0 and 1, try different
 # values to see what they do!
 controlnet_conditioning_scale=0.3,
 # Pass in the base64 encoded string of the moby logo image
 controlnet_image=image_to_base64(moby_image),

Now the result looks like it matches the Moby outline a lot more closely (Figure 4). This is the power of ControlNet. You can adjust the strength by varying the controlnet_conditioning_scale parameter. This way, you can make the output image more or less faithfully match the control image of Moby.

Figure 4: Left: The Moby logo is used as a control image to a ControlNet. Right: the SDXL-generated image resembles the control image more closely than in the previous example.

5. Control your image output with SDXL style presets

Let’s add a layer of customization with SDXL styles. We’ll use the 3D Model style preset (Figure 5). Behind the scenes, these style presets are adding additional keywords to the positive and negative prompts that the SDXL model ingests.

Figure 5: You can try various styles on the OctoAI Image Generation solution UI — there are more than 100 to choose from, each delivering a unique feel and aesthetic.

Figure 6 shows how setting this one parameter in the ImageGenerator API transforms our AI-generated image of Moby. Go ahead and try out more styles; we’ve generated a gallery for you to get inspiration from.

Figure 6: SDXL-generated image of Moby with the “3D Model” style preset applied.

6. Manipulate images with Mistral-7B LLM

So far we’ve relied on SDXL, which does text-to-image generation. We’ve added ControlNet in the mix to apply a control image as a compositional constraint.

Next, we’re going to layer an LLM into the mix to transform our original image prompt into a creative and rich textual description based on a “transformation prompt.”

Basically, we’re going to use an LLM to make our prompt better automatically. This will allow us to perform image manipulation using text in our OctoShop model cocktail pipeline:

Take a logo of Moby: Set it into an ultra-realistic photo in space.
Take a child’s drawing: Bring it to life in a fantasy world.
Take a photo of a cocktail: Set it on a beach in Italy.
Take a photo of a living room: Transform it into a staged living room in a designer house.

To achieve this text-to-text transformation, we will use the LLM user prompt as follows. This sets the original textual description of Moby into a new setting: the vast expanse of space.

'''
Human: set the image description into space: “a whale with a container on its back”
AI: '''

We’ve configured the LLM system prompt so that LLM responses are concise and at most one sentence long. We could make them longer, but be aware that the prompt consumed by SDXL has a 77-token context limit.

You can read more on the text generation Python SDK and its Chat Completions API used to generate text:

Model: Lets you choose out of selection of foundational open source models like Mixtral, Mistral, Llama2, Code Llama (the selection will grow with more open source models being released).
Messages: Contains a list of messages (system and user) to use as context for the completion.
Max tokens: Enforces a hard limit on output tokens (this could cut a completion response in the middle of a sentence).
Temperature: Lets you control the creativity of your answer: with a higher temperature, less likely tokens can be selected.

The choice of model, input, and output tokens will influence pricing on OctoAI. In this example, we’re using the Mistral-7B LLM, which is a great open source LLM model that really packs a punch given its small parameter size.

Let’s look at the code used to invoke our Mistral-7B LLM:

# Let's go ahead and start with the original prompt that we used in our
# image generation examples.
image_desc = "a whale with a container on its back"


# Let's then prepare our LLM prompt to manipulate our image
llm_prompt = '''
Human: set the image description into space: {}
AI: '''.format(image_desc)


# Now let's use an LLM to transform this craft clay rendition
# of Moby into a fun scify universe
from octoai.client import Client


client = Client(OCTOAI_API_TOKEN)
completion = client.chat.completions.create(
 messages=[
   {
     "role": "system",
     "content": "You are a helpful assistant. Keep your responses short and limited to one sentence."
   },
   {
     "role": "user",
     "content": llm_prompt
   }
 ],
 model="mistral-7b-instruct-fp16",
 max_tokens=128,
 temperature=0.01
)


# Print the message we get back from the LLM
llm_image_desc = completion.choices[0].message.content
print(llm_image_desc)

Here’s the output:

Our LLM has created a short yet imaginative description of Moby traveling through space. Figure 7 shows the result when we feed this LLM-generated textual description into SDXL.

Figure 7: SDXL-generated image of Moby where we used an LLM to set the scene in space and enrich the text prompt.

This image is great. We can feel the immensity of space. With the power of LLMs and the flexibility of SDXL, we can take image creation and manipulation to new heights. And the great thing is, all we need to manipulate those images is text; the GenAI models do the rest of the work.

7. Automate the workflow with AI-based image labeling

So far in our image transformation pipeline, we’ve had to manually label the input image to our OctoShop model cocktail. Instead of just passing in the image of Moby, we had to provide a textual description of that image.

Thankfully, we can rely on a GenAI model to perform text labeling tasks: CLIP Interrogator. Think of this task as the reverse of what SDXL does: It takes in an image and produces text as the output.

To get started, we’ll need a CLIP Interrogator model running behind an endpoint somewhere. There are two ways to get a CLIP Interrogator model endpoint on OctoAI. If you’re just getting started, we recommend the simple approach, and if you feel inspired to customize your model endpoint, you can use the more advanced approach. For instance, you may be interested in trying out the more recent version of CLIP Interrogator.

You can now invoke the CLIP Interrogator model in a few lines of code. We’ll use the fast interrogator mode here to get a label generated as quickly as possible.

# Let's go ahead and invoke the CLIP interrogator model


# Note that under a cold start scenario, you may need to wait a minute or two
# to get the result of this inference... Be patient!
output = client.infer(
   endpoint_url=CLIP_ENDPOINT_URL+'/predict',
   inputs={
       "image": image_to_base64(moby_image),
       "mode": "fast"
   }
)


# All labels
clip_labels = output["completion"]["labels"]
print(clip_labels)


# Let's get just the top label
top_label = clip_labels.split(',')[0]
print(top_label)

The top label described our Moby logo as:

That’s pretty on point. Now that we’ve tested all ingredients individually, let’s assemble our model cocktail and test it on interesting use cases.

8. Assembling the model cocktail

Now that we have tested our three models (CLIP interrogator, Mistral-7B, SDXL), we can package them into one convenient function, which takes the following inputs:

An input image that will be used to control the output image and also be automatically labeled by our CLIP interrogator model.
A transformation string that describes the transformation we want to apply to the input image (e.g., “set the image description in space”).
A style string which lets us better control the artistic output of the image independently of the transformation we apply to it (e.g., painterly style vs. cinematic).

The function below is a rehash of all of the code we’ve introduced above, packed into one function.

def genai_transform(image: Image, transformation: str, style: str) -> Image:
 # Step 1: CLIP captioning
 output = client.infer(
   endpoint_url=CLIP_ENDPOINT_URL+'/predict',
   inputs={
     "image": image_to_base64(image),
     "mode": "fast"
   }
 )
 clip_labels = output["completion"]["labels"]
 top_label = clip_labels.split(',')[0]


 # Step 2: LLM transformation
 llm_prompt = '''
 Human: {}: {}
 AI: '''.format(transformation, top_label)
 completion = client.chat.completions.create(
   messages=[
     {
       "role": "system",
       "content": "You are a helpful assistant. Keep your responses short and limited to one sentence."
     },
     {
       "role": "user",
       "content": llm_prompt
     }
   ],
   model="mistral-7b-instruct-fp16",
   max_tokens=128,
   presence_penalty=0,
   temperature=0.1,
   top_p=0.9,
 )
 llm_image_desc = completion.choices[0].message.content


 # Step 3: SDXL+controlnet transformation
 image_gen_response = image_gen.generate(
   engine="controlnet-sdxl",
   controlnet="depth_sdxl",
   controlnet_conditioning_scale=0.4,
   controlnet_image=image_to_base64(image),
   prompt=llm_image_desc,
   negative_prompt="blurry photo, distortion, low-res, poor quality",
   width=1024,
   height=1024,
   num_images=1,
   sampler="DPM_PLUS_PLUS_2M_KARRAS",
   steps=20,
   cfg_scale=7.5,
   use_refiner=True,
   seed=1,
   style_preset=style
 )
 images = image_gen_response.images


 # Display generated image from OctoAI
 pil_image = images[0].to_pil()
 return top_label, llm_image_desc, pil_image

Now you can try this out on several images, prompts, and styles.

Package your model cocktail into a web app

Now that you’ve mixed your unique GenAI cocktail, it’s time to pour it into a glass and garnish it, figuratively speaking. We built a simple Streamlit frontend that lets you deploy your unique OctoShop GenAI model cocktail and share the results with your friends and colleagues (Figure 8). You can check it on GitHub.

Follow the README instructions to deploy your app locally or get it hosted on Streamlit’s web hosting services.

Figure 8: The Streamlit app transforms images into realistic renderings in space — all thanks to the magic of GenAI.

We look forward to seeing what great image-processing apps you come up with. Go ahead and share your creations on OctoAI’s Discord server in the #built_with_octo channel!

If you want to learn how you can put OctoShop behind a Discord Bot or build your own model containers with Docker, we also have instructions on how to do that from an AI/ML workshop organized by OctoAI at DockerCon 2023.

About OctoAI

OctoAI provides infrastructure to run GenAI at scale, efficiently, and robustly. The model endpoints that OctoAI delivers to serve models like Mixtral, Stable Diffusion XL, etc. all rely on Docker to containerize models and make them easier to serve at scale.

If you go to octoai.cloud, you’ll find three complementary solutions that developers can build on to bring their GenAI-powered apps and pipelines into production.

Image Generation solution exposes endpoints and APIs to perform text to image, image to image tasks built around open source foundational models such as Stable Diffusion XL or SSD.
Text Generation solution exposes endpoints and APIs to perform text generation tasks built around open source foundational models, such as Mixtral/Mistral, Llama2, or CodeLlama.
Compute solution lets you deploy and manage any dockerized model container on capable OctoAI cloud endpoints to power your demanding GenAI needs. This compute service complements the image generation and text generation solutions by exposing infinite programmability and customizability for AI tasks that are not currently readily available on either the image generation or text generation solutions.

Disclaimer

OctoShop is built on the foundation of CLIP Interrogator and SDXL, and Mistral-7B and is therefore likely to carry forward the potential dangers inherent in these base models. It’s capable of generating unintended, unsuitable, offensive, and/or incorrect outputs. We therefore strongly recommend exercising caution and conducting comprehensive assessments before deploying this model into any practical applications.

This GenAI model workflow doesn’t work on people as it won’t preserve their likeness; the pipeline works best on scenes, objects, or animals. Solutions are available to address this problem, such as face mapping techniques (also known as face swapping), which we can containerize with Docker and deploy on OctoAI Compute solution, but that’s something to cover in another blog post.

Conclusion

This article covered the fundamentals of building a GenAI model cocktail by relying on a combination of text generation, image generation, and compute solutions powered by the portability and scalability enabled by Docker containerization.

If you’re interested in learning more about building these kinds of GenAI model cocktails, check out the OctoAI demo page or join OctoAI on Discord to see what people have been building.

Acknowledgements

The authors acknowledge Justin Gage for his thorough review, as well as Luis Vega, Sameer Farooqui, and Pedro Toruella for their contributions to the DockerCon AI/ML Workshop 2023, which inspired this article. The authors also thank Cia Bodin and her daughter Ada for the drawing used in this blog post.

Learn more

Watch the DockerCon 2023 Docker for ML, AI, and Data Science workshop.
Get the latest release of Docker Desktop.
Vote on what’s next! Check out our public roadmap.
Have questions? The Docker community is here to help.
New to Docker? Get started.

LLM Everywhere: Docker for Local and Hugging Face Hosting

Harsh Manvar — Thu, 09 Nov 2023 16:13:20 +0000

This post is written in collaboration with Docker Captain Harsh Manvar.

Hugging Face has become a powerhouse in the field of machine learning (ML). Their large collection of pretrained models and user-friendly interfaces have entirely changed how we approach AI/ML deployment and spaces. If you’re interested in looking deeper into the integration of Docker and Hugging Face models, a comprehensive guide can be found in the article “Build Machine Learning Apps with Hugging Face’s Docker Spaces.”

The Large Language Model (LLM) — a marvel of language generation — is an astounding invention. In this article, we’ll look at how to use the Hugging Face hosted Llama model in a Docker context, opening up new opportunities for natural language processing (NLP) enthusiasts and researchers.

Introduction to Hugging Face and LLMs

Hugging Face (HF) provides a comprehensive platform for training, fine-tuning, and deploying ML models. And, LLMs provide a state-of-the-art model capable of performing tasks like text generation, completion, and classification.

Leveraging Docker for ML

The robust Docker containerization technology makes it easier to package, distribute, and operate programs. It guarantees that ML models operate consistently across various contexts by enclosing them within Docker containers. Reproducibility is ensured, and the age-old “it works on my machine” issue is resolved.

Type of formats

For the majority of models on Hugging Face, two options are available.

GPTQ (usually 4-bit or 8-bit, GPU only)
GGML (usually 4-, 5-, 8-bit, CPU/GPU hybrid)

Examples of quantization techniques used in AI model quantization include the GGML and GPTQ models. This can mean quantization either during or after training. By reducing model weights to a lower precision, the GGML and GPTQ models — two well-known quantized models — minimize model size and computational needs.

HF models load on the GPU, which performs inference significantly more quickly than the CPU. Generally, the model is huge, and you also need a lot of VRAM. In this article, we will utilize the GGML model, which operates well on CPU and is probably faster if you don’t have a good GPU.

We will also be using transformers and ctransformers in this demonstration, so let’s first understand those:

transformers: Modern pretrained models can be downloaded and trained with ease thanks to transformers’ APIs and tools. By using pretrained models, you can cut down on the time and resources needed to train a model from scratch, as well as your computational expenses and carbon footprint.
ctransformers: Python bindings for the transformer models developed in C/C++ with the GGML library.

Request Llama model access

We will utilize the Meta Llama model, signup, and request for access.

Create Hugging Face token

To create an Access token that will be used in the future, go to your Hugging Face profile settings and select Access Token from the left-hand sidebar (Figure 1). Save the value of the created Access Token.

Figure 1. Generating Access Token.

Setting up Docker environment

Before exploring the realm of the LLM, we must first configure our Docker environment. Install Docker first, following the instructions on the official Docker website based on your operating system. After installation, execute the following command to confirm your setup:

docker --version

Quick demo

The following command runs a container with the Hugging Face harsh-manvar-llama-2-7b-chat-test:latest image and exposes port 7860 from the container to the host machine. It will also set the environment variable HUGGING_FACE_HUB_TOKEN to the value you provided.

docker run -it -p 7860:7860 --platform=linux/amd64 \
	-e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE"
\
	registry.hf.space/harsh-manvar-llama-2-7b-chat-test:latest python app.py

The -it flag tells Docker to run the container in interactive mode and to attach a terminal to it. This will allow you to interact with the container and its processes.
The -p flag tells Docker to expose port 7860 from the container to the host machine. This means that you will be able to access the container’s web server from the host machine on port 7860.
The --platform=linux/amd64 flag tells Docker to run the container on a Linux machine with an AMD64 architecture.
The -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" flag tells Docker to set the environment variable HUGGING_FACE_HUB_TOKEN to the value you provided. This is required for accessing Hugging Face models from the container.

The app.py script is the Python script that you want to run in the container. This will start the container and open a terminal to it. You can then interact with the container and its processes in the terminal. To exit the container, press Ctrl+C.

Accessing the landing page

To access the container’s web server, open a web browser and navigate to http://localhost:7860. You should see the landing page for your Hugging Face model (Figure 2).

Open your browser and go to http://localhost:7860:

Figure 2. Accessing local Docker LLM.

Getting started

Cloning the project

To get started, you can clone or download the Hugging Face existing space/repository.

git clone https://huggingface.co/spaces/harsh-manvar/llama-2-7b-chat-test

File: requirements.txt

A requirements.txt file is a text file that lists the Python packages and modules that a project needs to run. It is used to manage the project’s dependencies and to ensure that all developers working on the project are using the same versions of the required packages.

The following Python packages are required to run the Hugging Face llama-2-13b-chat model. Note that this model is large, and it may take some time to download and install. You may also need to increase the memory allocated to your Python process to run the model.

gradio==3.37.0
protobuf==3.20.3
scipy==1.11.1
torch==2.0.1
sentencepiece==0.1.99
transformers==4.31.0
ctransformers==0.2.27

File: Dockerfile

FROM python:3.9
RUN useradd -m -u 1000 user
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
USER user
COPY --link --chown=1000 ./ /code

The following section provides a breakdown of the Dockerfile. The first line tells Docker to use the official Python 3.9 image as the base image for our image:

FROM python:3.9

The following line creates a new user named user with the user ID 1000. The -m flag tells Docker to create a home directory for the user.

RUN useradd -m -u 1000 user

Next, this line sets the working directory for the container to /code.

WORKDIR /code

It’s time to copy the requirements file from the current directory to /code in the container. Also, this line upgrades the pip package manager in the container.

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

This line sets the default user for the container to user.

USER user

The following line copies the contents of the current directory to /code in the container. The --link flag tells Docker to create hard links instead of copying the files, which can improve performance and reduce the size of the image. The --chown=1000 flag tells Docker to change the ownership of the copied files to the user user.

COPY --link --chown=1000 ./ /code

Once you have built the Docker image, you can run it using the docker run command. This will start a new container running the Python 3.9 image with the non-root user user. You can then interact with the container using the terminal.

File: app.py

The Python code shows how to use Gradio to create a demo for a text-generation model trained using transformers. The code allows users to input a text prompt and generate a continuation of the text.

Gradio is a Python library that allows you to create and share interactive machine learning demos with ease. It provides a simple and intuitive interface for creating and deploying demos, and it supports a wide range of machine learning frameworks and libraries, including transformers.

This Python script is a Gradio demo for a text chatbot. It uses a pretrained text generation model to generate responses to user input. We’ll break down the file and look at each of the sections.

The following line imports the Iterator type from the typing module. This type is used to represent a sequence of values that can be iterated over. The next line imports the gradio library as well.

from typing import Iterator
import gradio as gr

The following line imports the logging module from the transformers library, which is a popular machine learning library for natural language processing.

from transformers.utils import logging
from model import get_input_token_length, run

Next, this line imports the get_input_token_length() and run() functions from the model module. These functions are used to calculate the input token length of a text and generate text using a pretrained text generation model, respectively. The next two lines configure the logging module to print information-level messages and to use the transformers logger.

from model import get_input_token_length, run

logging.set_verbosity_info()
logger = logging.get_logger("transformers")

The following lines define some constants that are used throughout the code. Also, the lines define the text that is displayed in the Gradio demo.

DEFAULT_SYSTEM_PROMPT = """"""
MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DESCRIPTION = """"""

LICENSE = """"""

This line logs an information-level message indicating that the code is starting. This function clears the textbox and saves the input message to the saved_input state variable.

logger.info("Starting")
def clear_and_save_textbox(message: str) -> tuple[str, str]:
    return '', message

The following function displays the input message in the chatbot and adds the message to the chat history.

def display_input(message: str,
                   history: list[tuple[str, str]]) -> list[tuple[str, str]]:
  history.append((message, ''))
  logger.info("display_input=%s",message)             
  return history

This function deletes the previous response from the chat history and returns the updated chat history and the previous response.

def delete_prev_fn(
    history: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], str]:
  try:
    message, _ = history.pop()
  except IndexError:
    message = ''
  return history, message or ''

The following function generates text using the pre-trained text generation model and the given parameters. It returns an iterator that yields a list of tuples, where each tuple contains the input message and the generated response.

def generate(
    message: str,
    history_with_input: list[tuple[str, str]],
    system_prompt: str,
    max_new_tokens: int,
    temperature: float,
    top_p: float,
    top_k: int,
) -> Iterator[list[tuple[str, str]]]:
  #logger.info("message=%s",message)
  if max_new_tokens > MAX_MAX_NEW_TOKENS:
    raise ValueError

  history = history_with_input[:-1]
  generator = run(message, history, system_prompt, max_new_tokens, temperature, top_p, top_k)
  try:
    first_response = next(generator)
    yield history + [(message, first_response)]
  except StopIteration:
    yield history + [(message, '')]
  for response in generator:
    yield history + [(message, response)]

The following function generates a response to the given message and returns the empty string and the generated response.

def process_example(message: str) -> tuple[str, list[tuple[str, str]]]:
  generator = generate(message, [], DEFAULT_SYSTEM_PROMPT, 1024, 1, 0.95, 50)
  for x in generator:
    pass
  return '', x

Here’s the complete Python code:

from typing import Iterator
import gradio as gr

from transformers.utils import logging
from model import get_input_token_length, run

logging.set_verbosity_info()
logger = logging.get_logger("transformers")

DEFAULT_SYSTEM_PROMPT = """"""
MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DESCRIPTION = """"""

LICENSE = """"""

logger.info("Starting")
def clear_and_save_textbox(message: str) -> tuple[str, str]:
    return '', message


def display_input(message: str,
                  history: list[tuple[str, str]]) -> list[tuple[str, str]]:
    history.append((message, ''))
    logger.info("display_input=%s",message)             
    return history


def delete_prev_fn(
        history: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], str]:
    try:
        message, _ = history.pop()
    except IndexError:
        message = ''
    return history, message or ''


def generate(
    message: str,
    history_with_input: list[tuple[str, str]],
    system_prompt: str,
    max_new_tokens: int,
    temperature: float,
    top_p: float,
    top_k: int,
) -> Iterator[list[tuple[str, str]]]:
    #logger.info("message=%s",message)
    if max_new_tokens > MAX_MAX_NEW_TOKENS:
        raise ValueError

    history = history_with_input[:-1]
    generator = run(message, history, system_prompt, max_new_tokens, temperature, top_p, top_k)
    try:
        first_response = next(generator)
        yield history + [(message, first_response)]
    except StopIteration:
        yield history + [(message, '')]
    for response in generator:
        yield history + [(message, response)]


def process_example(message: str) -> tuple[str, list[tuple[str, str]]]:
    generator = generate(message, [], DEFAULT_SYSTEM_PROMPT, 1024, 1, 0.95, 50)
    for x in generator:
        pass
    return '', x


def check_input_token_length(message: str, chat_history: list[tuple[str, str]], system_prompt: str) -> None:
    #logger.info("check_input_token_length=%s",message)
    input_token_length = get_input_token_length(message, chat_history, system_prompt)
    #logger.info("input_token_length",input_token_length)
    #logger.info("MAX_INPUT_TOKEN_LENGTH",MAX_INPUT_TOKEN_LENGTH)
    if input_token_length > MAX_INPUT_TOKEN_LENGTH:
        logger.info("Inside IF condition")
        raise gr.Error(f'The accumulated input is too long ({input_token_length} > {MAX_INPUT_TOKEN_LENGTH}). Clear your chat history and try again.')
    #logger.info("End of check_input_token_length function")


with gr.Blocks(css='style.css') as demo:
    gr.Markdown(DESCRIPTION)
    gr.DuplicateButton(value='Duplicate Space for private use',
                       elem_id='duplicate-button')

    with gr.Group():
        chatbot = gr.Chatbot(label='Chatbot')
        with gr.Row():
            textbox = gr.Textbox(
                container=False,
                show_label=False,
                placeholder='Type a message...',
                scale=10,
            )
            submit_button = gr.Button('Submit',
                                      variant='primary',
                                      scale=1,
                                      min_width=0)
    with gr.Row():
        retry_button = gr.Button('Retry', variant='secondary')
        undo_button = gr.Button('Undo', variant='secondary')
        clear_button = gr.Button('Clear', variant='secondary')

    saved_input = gr.State()

    with gr.Accordion(label='Advanced options', open=False):
        system_prompt = gr.Textbox(label='System prompt',
                                   value=DEFAULT_SYSTEM_PROMPT,
                                   lines=6)
        max_new_tokens = gr.Slider(
            label='Max new tokens',
            minimum=1,
            maximum=MAX_MAX_NEW_TOKENS,
            step=1,
            value=DEFAULT_MAX_NEW_TOKENS,
        )
        temperature = gr.Slider(
            label='Temperature',
            minimum=0.1,
            maximum=4.0,
            step=0.1,
            value=1.0,
        )
        top_p = gr.Slider(
            label='Top-p (nucleus sampling)',
            minimum=0.05,
            maximum=1.0,
            step=0.05,
            value=0.95,
        )
        top_k = gr.Slider(
            label='Top-k',
            minimum=1,
            maximum=1000,
            step=1,
            value=50,
        )

    gr.Markdown(LICENSE)

    textbox.submit(
        fn=clear_and_save_textbox,
        inputs=textbox,
        outputs=[textbox, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=display_input,
        inputs=[saved_input, chatbot],
        outputs=chatbot,
        api_name=False,
        queue=False,
    ).then(
        fn=check_input_token_length,
        inputs=[saved_input, chatbot, system_prompt],
        api_name=False,
        queue=False,
    ).success(
        fn=generate,
        inputs=[
            saved_input,
            chatbot,
            system_prompt,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
        ],
        outputs=chatbot,
        api_name=False,
    )

    button_event_preprocess = submit_button.click(
        fn=clear_and_save_textbox,
        inputs=textbox,
        outputs=[textbox, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=display_input,
        inputs=[saved_input, chatbot],
        outputs=chatbot,
        api_name=False,
        queue=False,
    ).then(
        fn=check_input_token_length,
        inputs=[saved_input, chatbot, system_prompt],
        api_name=False,
        queue=False,
    ).success(
        fn=generate,
        inputs=[
            saved_input,
            chatbot,
            system_prompt,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
        ],
        outputs=chatbot,
        api_name=False,
    )

    retry_button.click(
        fn=delete_prev_fn,
        inputs=chatbot,
        outputs=[chatbot, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=display_input,
        inputs=[saved_input, chatbot],
        outputs=chatbot,
        api_name=False,
        queue=False,
    ).then(
        fn=generate,
        inputs=[
            saved_input,
            chatbot,
            system_prompt,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
        ],
        outputs=chatbot,
        api_name=False,
    )

    undo_button.click(
        fn=delete_prev_fn,
        inputs=chatbot,
        outputs=[chatbot, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=lambda x: x,
        inputs=[saved_input],
        outputs=textbox,
        api_name=False,
        queue=False,
    )

    clear_button.click(
        fn=lambda: ([], ''),
        outputs=[chatbot, saved_input],
        queue=False,
        api_name=False,
    )

demo.queue(max_size=20).launch(share=False, server_name="0.0.0.0")

The check_input_token_length and generate functions comprise the main part of the code. The generate function is responsible for generating a response given a message, a history of previous messages, and various generation parameters, including:

max_new_tokens: This is an integer that indicates the most tokens that the response-generating model is permitted to produce.
temperature: This float value regulates how random the output that is produced is. The result is more random at higher values (like 1.0) and more predictable at lower levels (like 0.2).
top_p: The nucleus sampling is determined by this float value, which ranges from 0 to 1. It establishes a cutoff point for the tokens’ cumulative probability.
top_k: The number of next tokens to be considered is represented by this integer. A greater number results in a more concentrated output.

The UI component and running the API server are handled by app.py. Basically, app.py is where you initialize the application and other configuration.

File: Model.py

The Python script is a chat bot that uses an LLM to generate responses to user input. The script uses the following steps to generate a response:

It creates a prompt for the LLM by combining the user input, the chat history, and the system prompt.
It calculates the input token length of the prompt.
It generates a response using the LLM and the following parameters:
- max_new_tokens: Maximum number of new tokens to generate.
- temperature: Temperature to use when generating the response. A higher temperature will result in more creative and varied responses, but it may also result in less coherent responses
- top_p: This parameter controls the nucleus sampling algorithm used to generate the response. A higher top_p value will result in more focused and informative responses, while a lower value will result in more creative and varied responses.
- top_k: This parameter controls the number of highest probability tokens to consider when generating the response. A higher top_k value will result in more predictable and consistent responses, while a lower value will result in more creative and varied responses.

The main function of the TextIteratorStreamer class is to store print-ready text in a queue. This queue can then be used by a downstream application as an iterator to access the generated text in a non-blocking way.

from threading import Thread
from typing import Iterator

#import torch
from transformers.utils import logging
from ctransformers import AutoModelForCausalLM 
from transformers import TextIteratorStreamer, AutoTokenizer

logging.set_verbosity_info()
logger = logging.get_logger("transformers")

config = {"max_new_tokens": 256, "repetition_penalty": 1.1, 
          "temperature": 0.1, "stream": True}
model_id = "TheBloke/Llama-2-7B-Chat-GGML"
device = "cpu"


model = AutoModelForCausalLM.from_pretrained(model_id, model_type="llama", lib="avx2", hf=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

def get_prompt(message: str, chat_history: list[tuple[str, str]],
               system_prompt: str) -> str:
    #logger.info("get_prompt chat_history=%s",chat_history)
    #logger.info("get_prompt system_prompt=%s",system_prompt)
    texts = [f'[INST] <>\n{system_prompt}\n<>\n\n']
    #logger.info("texts=%s",texts)
    do_strip = False
    for user_input, response in chat_history:
        user_input = user_input.strip() if do_strip else user_input
        do_strip = True
        texts.append(f'{user_input} [/INST] {response.strip()} [INST] ')
    message = message.strip() if do_strip else message
    #logger.info("get_prompt message=%s",message)
    texts.append(f'{message} [/INST]')
    #logger.info("get_prompt final texts=%s",texts)
    return ''.join(texts) def get_input_token_length(message: str, chat_history: list[tuple[str, str]], system_prompt: str) -> int:
    #logger.info("get_input_token_length=%s",message)
    prompt = get_prompt(message, chat_history, system_prompt)
    #logger.info("prompt=%s",prompt)
    input_ids = tokenizer([prompt], return_tensors='np', add_special_tokens=False)['input_ids']
    #logger.info("input_ids=%s",input_ids)
    return input_ids.shape[-1]

def run(message: str,
        chat_history: list[tuple[str, str]],
        system_prompt: str,
        max_new_tokens: int = 1024,
        temperature: float = 0.8,
        top_p: float = 0.95,
        top_k: int = 50) -> Iterator[str]:
    prompt = get_prompt(message, chat_history, system_prompt)
    inputs = tokenizer([prompt], return_tensors='pt', add_special_tokens=False).to(device)

    streamer = TextIteratorStreamer(tokenizer,
                                    timeout=15.,
                                    skip_prompt=True,
                                    skip_special_tokens=True)
    generate_kwargs = dict(
        inputs,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=top_p,
        top_k=top_k,
        temperature=temperature,
        num_beams=1,
    )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    outputs = []
    for text in streamer:
        outputs.append(text)
        yield "".join(outputs)

To import the necessary modules and libraries for text generation with transformers, we can use the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM

This will import the necessary modules for tokenizing and generating text with transformers.

To define the model to import, we can use:

model_id = "TheBloke/Llama-2-7B-Chat-GGML"

This step defines the model ID as TheBloke/Llama-2-7B-Chat-GGML, a scaled-down version of the Meta 7B chat LLama model.

Once you have imported the necessary modules and libraries and defined the model to import, you can load the tokenizer and model using the following code:

tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id)

This will load the tokenizer and model from the Hugging Face Hub.The job of a tokenizer is to prepare the model’s inputs. Tokenizers for each model are available in the library. Define the model to import; again, we’re using TheBloke/Llama-2-7B-Chat-GGML.

You need to set the variables and values in config for max_new_tokens, temperature, repetition_penalty, and stream:

max_new_tokens: Most tokens possible, disregarding the prompt’s specified quantity of tokens.

temperature: The amount that was utilized to modify the probability for the subsequent tokens.

repetition_penalty: Repetition penalty parameter. 1.0 denotes no punishment.

stream: Whether to generate the response text in a streaming manner or in a single batch.

You can also create the space and commit files to it to host applications on Hugging Face and test directly.

Building the image

The following command builds a Docker image for the llama-2-13b-chat model on the linux/amd64 platform. The image will be tagged with the name local-llm:v1.

docker buildx build --platform=linux/amd64 -t local-llm:v1 .

Running the container

The following command will start a new container running the local-llm:v1 Docker image and expose port 7860 on the host machine. The -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" environment variable sets the Hugging Face Hub token, which is required to download the llama-2-13b-chat model from the Hugging Face Hub.

docker run -it -p 7860:7860 --platform=linux/amd64 -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" local-llm:v1 python app.py

Next, open the browser and go to http://localhost:7860 to see local LLM Docker container output (Figure 3).

Figure 3. Local LLM Docker container output.

You can also view containers via the Docker Desktop (Figure 4).

Figure 4. Monitoring containers with Docker Desktop.

Conclusion

Deploying the LLM GGML model locally with Docker is a convenient and effective way to use natural language processing. Dockerizing the model makes it easy to move it between different environments and ensures that it will run consistently. Testing the model in a browser provides a user-friendly interface and allows you to quickly evaluate its performance.

This setup gives you more control over your infrastructure and data and makes it easier to deploy advanced language models for a variety of applications. It is a significant step forward in the deployment of large language models.

Learn more

Read Build Machine Learning Apps with Hugging Face’s Docker Spaces.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Getting Started with JupyterLab as a Docker Extension

Marcelo Ochoa — Thu, 12 Oct 2023 14:20:42 +0000

This post was written in collaboration with Marcelo Ochoa, the author of the Jupyter Notebook Docker Extension.

JupyterLab is a web-based interactive development environment (IDE) that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It is the latest evolution of the popular Jupyter Notebook and offers several advantages over its predecessor, including:

A more flexible and extensible user interface: JupyterLab allows users to configure and arrange their workspace to best suit their needs. It also supports a growing ecosystem of extensions that can be used to add new features and functionality.

Support for multiple programming languages: JupyterLab is not just for Python anymore! It can now be used to run code in various programming languages, including R, Julia, and JavaScript.

A more powerful editor: JupyterLab’s built-in editor includes features such as code completion, syntax highlighting, and debugging, which make it easier to write and edit code.

Support for collaboration: JupyterLab makes collaborating with others on projects easy. Documents can be shared and edited in real-time, and users can chat with each other while they work.

This article provides an overview of the JupyterLab architecture and shows how to get started using JupyterLab as a Docker extension.

Uses for JupyterLab

JupyterLab is used by a wide range of people, including data scientists, scientific computing researchers, computational journalists, and machine learning engineers. It is a powerful interactive computing and data science tool and is becoming increasingly popular as an IDE.

Here are specific examples of how JupyterLab can be used:

Data science: JupyterLab can explore data, build and train machine learning models, and create visualizations.

Scientific computing: JupyterLab can perform numerical simulations, solve differential equations, and analyze data.

Computational journalism: JupyterLab can scrape data from the web, clean and prepare data for analysis, and create interactive data visualizations.

Machine learning: JupyterLab can develop and train machine learning models, evaluate model performance, and deploy models to production.

JupyterLab can help solve problems in the following ways:

JupyterLab provides a unified environment for developing and running code, exploring data, and creating visualizations. This can save users time and effort; they do not have to switch between different tools for different tasks.

JupyterLab makes it easy to share and collaborate on projects. Documents can be shared and edited in real-time, and users can chat with each other while they work. This can be helpful for teams working on complex projects.

JupyterLab is extensible. This means users can add new features and functionality to the environment using extensions, making JupyterLab a flexible tool that can be used for a wide range of tasks.

Project Jupyter’s tools are available for installation via the Python Package Index, the leading repository of software created for the Python programming language, but you can also get the JupyterLab environment up and running using Docker Desktop on Linux, Mac, or Windows.

Figure 1: JupyterLab is a powerful web-based IDE for data science

Architecture of JupyterLab

JupyterLab follows a client-server architecture (Figure 2) where the client, implemented in TypeScript and React, operates within the user’s web browser. It leverages the Webpack module bundler to package its code into a single JavaScript file and communicates with the server via WebSockets. On the other hand, the server is a Python application that utilizes the Tornado web framework to serve the client and manage various functionalities, including kernels, file management, authentication, and authorization. Kernels, responsible for executing code entered in the JupyterLab client, can be written in any programming language, although Python is commonly used.

The client and server exchange data and commands through the WebSockets protocol. The client sends requests to the server, such as code execution or notebook loading, while the server responds to these requests and returns data to the client.

Kernels are distinct processes managed by the JupyterLab server, allowing them to execute code and send results — including text, images, and plots — to the client. Moreover, JupyterLab’s flexibility and extensibility are evident through its support for extensions, enabling users to introduce new features and functionalities, such as custom kernels, file viewers, and editor plugins, to enhance their JupyterLab experience.

Figure 2: JupyterLab architecture.

JupyterLab is highly extensible. Extensions can be used to add new features and functionality to the client and server. For example, extensions can be used to add new kernels, new file viewers, and new editor plugins.

Examples of JupyterLab extensions include:

The ipywidgets extension adds support for interactive widgets to JupyterLab notebooks.

The nbextensions package provides a collection of extensions for the JupyterLab notebook.

The jupyterlab-server package provides extensions for the JupyterLab server.

JupyterLab’s extensible architecture makes it a powerful tool that can be used to create custom development environments tailored to users’ specific needs.

Why run JupyterLab as a Docker extension?

Running JupyterLab as a Docker extension offers a streamlined experience to users already familiar with Docker Desktop, simplifying the deployment and management of the JupyterLab notebook.

Docker provides an ideal environment to bundle, ship, and run JupyterLab in a lightweight, isolated setup. This encapsulation promotes consistent performance across different systems and simplifies the setup process.

Moreover, Docker Desktop is the only prerequisite to running JupyterLabs as an extension. Once you have Docker installed, you can easily set up and start using JupyterLab, eliminating the need for additional software installations or complex configuration steps.

Getting started

Getting started with the Docker Desktop Extension is a straightforward process that allows developers to leverage the benefits of unified development. The extension can easily be integrated into existing workflows, offering a familiar interface within Docker. This seamless integration streamlines the setup process, allowing developers to dive into their projects without extensive configuration.

The following key components are essential to completing this walkthrough:

Docker Desktop

Working with JupyterLabs as a Docker extension begins with opening the Docker Desktop. Here are the steps to follow (Figure 3):

Choose Extensions in the left sidebar.

Switch to the Browse tab.

In the Categories drop-down, select Utility Tools.

Find Jupyter Notebook and then select Install.

Figure 3: Installing JupyterLab with the Docker Desktop.

A JupyterLab welcome page will be shown (Figure 4).

Figure 4: JupyterLab welcome page.

Adding extra kernels

If you need to work with other languages rather than Python3 (default), you can complete a post-installation step. For example, to add the iJava kernel, launch a terminal and execute the following:

~ % docker exec -ti --user root jupyter_embedded_dd_vm /bin/sh -c "curl -s https://raw.githubusercontent.com/marcelo-ochoa/jupyter-docker-extension/main/addJava.sh | bash"

Figure 5 shows the install process output of the iJava kernel package.

Figure 5: Capture of iJava kernel installation process.

Next, close your extension tab or Docker Desktop, then reopen, and the new kernel and language support will be enabled (Figure 6).

Figure 6: New kernel and language support enabled.

Getting started with JupyterLab

You can begin using JupyterLab notebooks in many ways; for example, you can choose the language at the welcome page and start testing your code. Or, you can upload a file to the extension using the up arrow icon found at the upper left (Figure 7).

Figure 7: Sample JupyterLab iPython notebook.

Import a new notebook from local storage (Figures 8 and 9).

Figure 8: Upload dialog from disk.

Figure 9: Uploaded notebook.

Loading JupyterLab notebook from URL

If you want to import a notebook directly from the internet, you can use the File > Open URL option (Figure 10). This page shows an example for the notebook with Java samples.

Figure 10: Load notebook from URL.

A notebook upload from URL result is shown in Figure 11.

Figure 11: Uploaded notebook from URL.

Download a notebook to your personal folder

Just like uploading a notebook, the download operation is straightforward. Select your file name and choose the Download option (Figure 12).

Figure 12: Download to local disk option menu.

A download destination option is also shown (Figure 13).

Figure 13: Select local directory for downloading destination.

A note about persistent storage

The JupyterLab extension has a persistent volume for the /home/jovyan directory, which is the default directory of the JupyterLab environment. The contents of this directory will survive extension shutdown, Docker Desktop restart, and JupyterLab Extension upgrade. However, if you uninstall the extension, all this content will be discarded. Back up important data first.

Change the core image

This Docker extension uses a Docker image — jupyter/scipy-notebook:lab-4.0.6 (ubuntu 22.04) — but you can choose one of the following available versions (Figure 14).

Figure 14: JupyterLab core image options.

To change the extension image, you can follow these steps:

Uninstall the extension.

Install again, but do not open until the next step is done.

Edit the associated docker-compose.yml file of the extension. For example, on macOS, the file can be found at: Library/Containers/com.docker.docker/Data/extensions/mochoa_jupyter-docker-extension/vm/docker-compose.yml

Change the image name from jupyter/scipy-notebook:ubuntu-22.04 to jupyter/r-notebook:ubuntu-22.04.

Open the extension.

On Linux, the docker-compose.yml file can be found at: .docker/desktop/extensions/mochoa_jupyter-docker-extension/vm/docker-compose.yml

Using JupyterLab with other extensions

To use the JupyterLab extension to interact with other extensions, such as the MemGraph database (Figure 15), typical examples only require a minimal change of the host connection option. This usually means a sample notebook referrer to MemGraph host running on localhost. Because JupyterLab is another extension hosted in a different Docker stack, you have to replace localhost with host.docker.internal, which refers to the external IP of another extension. Here is an example:

URI = "bolt://localhost:7687"

needs to be replaced by:

URI = "bolt://host.docker.internal:7687"

Figure 15: Running notebook connecting to MemGraph extension.

Conclusion

The JupyterLab Docker extension is a ready-to-run Docker stack containing Jupyter applications and interactive computing tools using a personal Jupyter server with the JupyterLab frontend.

Through the integration of Docker, setting up and using JupyterLab is remarkably straightforward, further expanding its appeal to experienced and novice users alike.

The following video provides a good introduction with a complete walk-through of JupyterLab notebooks.

Learn more

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Get Started with the Microcks Docker Extension for API Mocking and Testing

Ajeet Singh Raina — Thu, 28 Sep 2023 15:04:00 +0000

In the dynamic landscape of software development, collaborations often lead to innovative solutions that simplify complex challenges. The Docker and Microcks partnership is a prime example, demonstrating how the relationship between two industry leaders can reshape local application development.

This article delves into the collaborative efforts of Docker and Microcks, spotlighting the emergence of the Microcks Docker Desktop Extension and its transformative impact on the development ecosystem.

What is Microcks?

Microcks is an open source Kubernetes and cloud-native tool for API mocking and testing. It has been a Cloud Native Computing Foundation Sandbox project since summer 2023.

Microcks addresses two primary use cases:

Simulating (or mocking) an API or a microservice from a set of descriptive assets (specifications or contracts)

Validating (or testing) the conformance of your application regarding your API specification by conducting contract-test

The unique thing about Microcks is that it offers a uniform and consistent approach for all kinds of request/response APIs (REST, GraphQL, gRPC, SOAP) and event-driven APIs (currently supporting eight different protocols) as shown in Figure 1.

Figure 1: Microcks covers all kinds of APIs.

Microcks speeds up the API development life cycle by shortening the feedback loop from the design phase and easing the pain of provisioning environments with many dependencies. All these features establish Microcks as a great help to enforce backward compatibility of your API of microservices interfaces.

So, for developers, Microcks brings consistency, convenience, and speed to your API lifecycle.

Why run Microcks as a Docker Desktop Extension?

Although Microcks is a powerhouse, running it as a Docker Desktop Extension takes the developer experience, ease of use, and rapid iteration in the inner loop to new levels. With Docker’s containerization capabilities seamlessly integrated, developers no longer need to navigate complex setups or wrestle with compatibility issues. It’s a plug-and-play solution that transforms the development environment into a playground for innovation.

The simplicity of running Microcks as a Docker extension is a game-changer. Developers can effortlessly set up and deploy Microcks in their existing Docker environment, eliminating the need for extensive configurations. This ease of use empowers developers to focus on what they do best — building and testing APIs rather than grappling with deployment intricacies.

In agile development, rapid iterations in the inner loop are paramount. Microcks, as a Docker extension, accelerates this process. Developers can swiftly create, test, and iterate on APIs without leaving the Docker environment. This tight feedback loop ensures developers identify and address issues early, resulting in faster development cycles and higher-quality software.

The combination of two best-of-breed projects, Docker and Microcks, provides:

Streamlined developer experience

Easiness at its core

Rapid iterations in the inner loop

Extension architecture

The Microcks Docker Desktop Extension has an evolving architecture depending on your enabling features. The UI that executes in Docker Desktop manages your preferences in a ~/.microcks-docker-desktop-extension folder and starts/stops/cleans the needed containers.

At its core, the architecture (Figure 2) embeds two minimal elements: the Microcks main container and a MongoDB database. The different containers of the extension run in an isolated Docker network where only the HTTP port of the main container is bound to your local host.

Figure 2: Microcks extension default architecture.

Through the Settings panel offered by the extension (Figure 3), you can tune the port binding and enable more features, such as:

The support of asynchronous APIs mocking and testing via the usefulness of AsyncAPI with Kafka and WebSocket

The ability to run Postman collection tests in Microcks includes support for Postman testing.

Figure 3: Microcks extension Settings panel.

When applied, your settings are persistent in your ~/.microcks-docker-desktop-extension folder, and the extension augments the initial architecture with the required services. Even though the extension starts with additional containers, they are carefully crafted and chosen to be lightweight and consume as few resources as possible. For example, we selected the Redpanda Kafka-compatible broker for its super-light experience.

The schema shown in Figure 4 illustrates such a “maximal architecture” for the extension:

Figure 4: Microcks extension maximal architecture.

The Docker Desktop Extension architecture encapsulates the convergence of Docker’s containerization capabilities and Microcks’ API testing prowess. This collaborative endeavor presents developers with a unified interface to toggle between these functionalities seamlessly. The architecture ensures a cohesive experience, enabling developers to harness the power of both Docker and Microcks without the need for constant tool switching.

Getting started

Getting started with the Docker Desktop Extension is a straightforward process that empowers developers to leverage the benefits of unified development. The extension can be easily integrated into existing workflows, offering a familiar interface within Docker. This seamless integration streamlines the setup process, allowing developers to dive into their projects without extensive configuration.

Here are the steps for installing Microcks as a Docker Desktop Extension:
1. Choose Add Extensions in the left sidebar (Figure 5).

Figure 5: Add extensions in the Docker Desktop.

2. Switch to the Browse tab.

3. In the Filters drop-down, select the Testing Tools category.

4. Find Microcks and then select Install (Figure 6).

Figure 6: Find and open Microcks.

Launching Microcks

The next step is to launch Microcks (Figure 7).

Figure 7: Launch Microcks.

The Settings panel allows you to configure some options, like whether you’d like to enable the asynchronous APIs features (default is disabled) and if you’d need to set an offset to ports used to access the services (Figures 8 and 9).

Figure 8: Microcks is up and running.

Figure 9: Access asynchronous APIs and services.

Sample app deployment

To illustrate the real-world implications of the Docker Desktop Extension, consider a sample application deployment. As developers embark on local application development, the Docker Desktop Extension enables them to create, test, and iterate on their containers while leveraging Microcks’ API mocking and testing capabilities.

This combined approach ensures that the application’s containerization and API aspects are thoroughly validated, resulting in a higher quality end product. Check out the three-minute “Getting Started with Microcks Docker Desktop Extension” video for more information.

Conclusion

The Docker and Microcks partnership, exemplified by the Docker Desktop Extension, signifies a milestone in collaborative software development. By harmonizing containerization and API testing, this collaboration addresses the challenges of fragmented workflows, accelerating development cycles and elevating the quality of applications.

By embracing the capabilities of Docker and Microcks, developers are poised to embark on a journey characterized by efficiency, reliability, and collaborative synergy.

Remember that Microcks is a Cloud Native Computing Sandbox project supported by an open community, which means you, too, can help make Microcks even greater. Come and say hi on our GitHub discussion or Zulip chat 🐙, send some love through GitHub stars ⭐️, or follow us on Twitter, Mastodon, LinkedIn, and our YouTube channel.

Learn more

Try the Microcks Docker Extension.

Learn about Docker Extensions.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Accelerating Machine Learning with TensorFlow.js: Using Pretrained Models and Docker

Harsh Manvar — Thu, 17 Aug 2023 13:26:22 +0000

In the rapidly evolving era of machine learning (ML) and artificial intelligence (AI), TensorFlow has emerged as a leading framework for developing and implementing sophisticated models. With the introduction of TensorFlow.js, TensorFlow’s capability is boosted for JavaScript developers.

TensorFlow.js is a JavaScript machine learning toolkit that facilitates the creation of ML models and their immediate use in browsers or Node.js apps. TensorFlow.js has expanded the capability of TensorFlow into the realm of actual web development. A remarkable aspect of TensorFlow.js is its ability to utilize pretrained models, which opens up a wide range of possibilities for developers.

In this article, we will explore the concept of pretrained models in TensorFlow.js and Docker and delve into the potential applications and benefits.

Understanding pretrained models

Pretrained models are a powerful tool for developers because they allow you to use ML models without having to train them yourself. This approach can save a lot of time and effort, and it can also be more accurate than training your own model from scratch.

A pretrained model is an ML model that has been professionally trained on a large volume of data. Because these models have been trained on complex patterns and representations, they are incredibly effective and precise in carrying out specific tasks. Developers may save a lot of time and computing resources by using pretrained models because they can avoid having to train a model from scratch.

Types of pretrained models available

There is a wide range of potential applications for pretrained models in TensorFlow.js.

For example, developers could use them to:

Build image classification models that can identify objects in images.

Build natural language processing (NLP) models that can understand and respond to text.

Build speech recognition models that can transcribe audio into text.

Build recommendation systems that can suggest products or services to users.

TensorFlow.js and pretrained models

Developers can easily include pretrained models in their web applications using TensorFlow.js. With TensorFlow.js, you can benefit from robust machine learning algorithms without needing to be an expert in model deployment or training. The library offers a wide variety of pretrained models, including those for audio analysis, picture identification, natural language processing, and more (Figure 1).

Figure 1: Available pretrained model types for TensorFlow.

How does it work?

The module allows for the direct loading of models in TensorFlow SavedModel or Keras Model formats. Once the model has been loaded, developers can use its features by invoking certain methods made available by the model API. Figure 2 shows the steps involved for training, distribution, and deployment.

Figure 2: TensorFlow.js model API for a pretrained image classification model.

Training

The training section shows the steps involved in training a machine learning model. The first step is to collect data. This data is then preprocessed, which means that it is cleaned and prepared for training. The data is then fed to a machine learning algorithm, which trains the model.

Preprocess data: This is the process of cleaning and preparing data for training. This includes tasks such as removing noise, correcting errors, and normalizing the data.

TF Hub: TF Hub is a repository of pretrained ML models. These models can be used to speed up the training process or to improve the accuracy of a model.

tf.keras: tf.keras is a high-level API for building and training machine learning models. It is built on top of TensorFlow, which is a low-level library for machine learning.

Estimator: An estimator is a type of model in TensorFlow. It is a simplified way to build and train ML models.

Distribution

Distribution is the process of making a machine learning model available to users. This can be done by packaging the model in a format that can be easily shared, or by deploying the model to a production environment.

The distribution section shows the steps involved in distributing a machine learning model. The first step is to package the model. This means that the model is converted into a format that can be easily shared. The model is then distributed to users, who can then use it to make predictions.

Deployment

The deployment section shows the steps involved in deploying a machine learning model. The first step is to choose a framework. A framework is a set of tools that makes it easier to build and deploy machine learning models. The model is then converted into a format that can be used by the framework. The model is then deployed to a production environment, where it can be used to make predictions.

Benefits of using pretrained models

There are several pretrained models available in TensorFlow.js that can be utilized immediately in any project and offer the following notable advantages:

Savings in time and resources: Building an ML model from scratch might take a lot of time and resources. Developers can skip this phase and use a model that has already learned from lengthy training by using pretrained models. The time and resources needed to implement machine learning solutions are significantly decreased as a result.

State-of-the-art performance: Pretrained models are typically trained on huge datasets and refined by specialists, producing models that give state-of-the-art performance across a range of applications. Developers can benefit from these models’ high accuracy and reliability by incorporating them into TensorFlow.js, even if they lack a deep understanding of machine learning.

Accessibility: TensorFlow.js makes pretrained models powerful for web developers, allowing them to quickly and easily integrate cutting-edge machine learning capabilities into their projects. This accessibility creates new opportunities for developing cutting-edge web-based solutions that make use of machine learning’s capabilities.

Transfer learning: Pretrained models can also serve as the foundation for your process. Using a smaller, domain-specific dataset, developers can further train a pretrained model. Transfer learning enables models to swiftly adapt to particular use cases, making this method very helpful when data is scarce.

Why is containerizing TensorFlow.js important?

Containerizing TensorFlow.js brings several important benefits to the development and deployment process of machine learning applications. Here are five key reasons why containerizing TensorFlow.js is important:

Docker provides a consistent and portable environment for running applications. By containerizing TensorFlow.js, you can package the application, its dependencies, and runtime environment into a self-contained unit. This approach allows you to deploy the containerized TensorFlow.js application across different environments, such as development machines, staging servers, and production clusters, with minimal configuration or compatibility issues.

Docker simplifies the management of dependencies for TensorFlow.js. By encapsulating all the required libraries, packages, and configurations within the container, you can avoid conflicts with other system dependencies and ensure that the application has access to the specific versions of libraries it needs. This containerization eliminates the need for manual installation and configuration of dependencies on different systems, making the deployment process more streamlined and reliable.

Docker ensures the reproducibility of your TensorFlow.js application’s environment. By defining the exact dependencies, libraries, and configurations within the container, you can guarantee that the application will run consistently across different deployments.

Docker enables seamless scalability of the TensorFlow.js application. With containers, you can easily replicate and distribute instances of the application across multiple nodes or servers, allowing you to handle high volumes of user requests.

Docker provides isolation between the application and the host system and between different containers running on the same host. This isolation ensures that the application’s dependencies and runtime environment do not interfere with the host system or other applications. It also allows for easy management of dependencies and versioning, preventing conflicts and ensuring a clean and isolated environment in which the application can operate.

Building a fully functional ML face-detection demo app

By combining the power of TensorFlow.js and Docker, developers can create a fully functional machine learning (ML) face-detection demo app. Once the app is deployed, the TensorFlow.js model can recognize faces in real-time by leveraging the camera. However, with a minor code change, it’s possible for developers to build an app that allows users to upload images or videos to be detected.

In this tutorial, you’ll learn how to build a fully functional face-detection demo app using TensorFlow.js and Docker. Figure 3 shows the file system architecture for this setup. Let’s get started.

Prerequisite

The following key components are essential to complete this walkthrough:

Docker Desktop

Figure 3: File system architecture for Docker Compose development setup.

Deploying a ML face-detection app is a simple process involving the following steps:

Clone the repository.

Set up the required configuration files.

Initialize TensorFlow.js.

Train and run the model.

Bring up the face-detection app.

We’ll explain each of these steps below.

Quick demo

If you’re in hurry, you can bring up the complete app by running the following command:

docker run -p 1234:1234 harshmanvar/face-detection-tensorjs:slim-v1

Open URL in browser: http://localhost:1234.

Figure 4: URL opened in a browser.

Getting started

Cloning the project

To get started, you can clone the repository by running the following command:

https://github.com/dockersamples/face-detection-tensorjs

We are utilizing the MediaPipe Face Detector demo for this demonstration. You first create a detector by choosing one of the models from SupportedModels, including MediaPipeFaceDetector.

For example:

const model = faceDetection.SupportedModels.MediaPipeFaceDetector; const detectorConfig = { runtime: 'mediapipe', // or 'tfjs' } const detector = await faceDetection.createDetector(model, detectorConfig);

Then you can use the detector to detect faces:

const faces = await detector.estimateFaces(image);

File: index.html:

The web application’s principal entry point is the index.html file. It includes the video element needed to display the real-time video stream from the user’s webcam and the basic HTML page structure. The relevant JavaScript scripts for the facial detection capabilities are also imported.

File: src/Index.js:

import '@tensorflow/tfjs-backend-webgl'; import '@tensorflow/tfjs-backend-webgpu'; import * as tfjsWasm from '@tensorflow/tfjs-backend-wasm'; tfjsWasm.setWasmPaths( `https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm@${ tfjsWasm.version_wasm}/dist/`); import * as faceDetection from '@tensorflow-models/face-detection'; import {Camera} from './camera'; import {setupDatGui} from './option_panel'; import {STATE, createDetector} from './shared/params'; import {setupStats} from './shared/stats_panel'; import {setBackendAndEnvFlags} from './shared/util'; let detector, camera, stats; let startInferenceTime, numInferences = 0; let inferenceTimeSum = 0, lastPanelUpdate = 0; let rafId; async function checkGuiUpdate() { if (STATE.isTargetFPSChanged || STATE.isSizeOptionChanged) { camera = await Camera.setupCamera(STATE.camera); STATE.isTargetFPSChanged = false; STATE.isSizeOptionChanged = false; } if (STATE.isModelChanged || STATE.isFlagChanged || STATE.isBackendChanged) { STATE.isModelChanged = true; window.cancelAnimationFrame(rafId); if (detector != null) { detector.dispose(); } if (STATE.isFlagChanged || STATE.isBackendChanged) { await setBackendAndEnvFlags(STATE.flags, STATE.backend); } try { detector = await createDetector(STATE.model); } catch (error) { detector = null; alert(error); } STATE.isFlagChanged = false; STATE.isBackendChanged = false; STATE.isModelChanged = false; } } function beginEstimateFaceStats() { startInferenceTime = (performance || Date).now(); } function endEstimateFaceStats() { const endInferenceTime = (performance || Date).now(); inferenceTimeSum += endInferenceTime - startInferenceTime; ++numInferences; const panelUpdateMilliseconds = 1000; if (endInferenceTime - lastPanelUpdate >= panelUpdateMilliseconds) { const averageInferenceTime = inferenceTimeSum / numInferences; inferenceTimeSum = 0; numInferences = 0; stats.customFpsPanel.update( 1000.0 / averageInferenceTime, 120); lastPanelUpdate = endInferenceTime; } } async function renderResult() { if (camera.video.readyState < 2) { await new Promise((resolve) => { camera.video.onloadeddata = () => { resolve(video); }; }); } let faces = null; if (detector != null) { beginEstimateFaceStats(); try { faces = await detector.estimateFaces(camera.video, {flipHorizontal: false}); } catch (error) { detector.dispose(); detector = null; alert(error); } endEstimateFaceStats(); } camera.drawCtx(); if (faces && faces.length > 0 && !STATE.isModelChanged) { camera.drawResults( faces, STATE.modelConfig.boundingBox, STATE.modelConfig.keypoints); } } async function renderPrediction() { await checkGuiUpdate(); if (!STATE.isModelChanged) { await renderResult(); } rafId = requestAnimationFrame(renderPrediction); }; async function app() { const urlParams = new URLSearchParams(window.location.search); await setupDatGui(urlParams); stats = setupStats(); camera = await Camera.setupCamera(STATE.camera); await setBackendAndEnvFlags(STATE.flags, STATE.backend); detector = await createDetector(); renderPrediction(); }; app();

JavaScript file that conducts the facial detection logic. TensorFlow.js is loaded, allowing for real-time face detection on the video stream using the pretrained face identification model. The file manages access to the camera, processing of the video frames, and creating bounding boxes around faces that have been recognized in the video feed.

File: src/camera.js:

import {VIDEO_SIZE} from './shared/params'; import {drawResults, isMobile} from './shared/util'; export class Camera { constructor() { this.video = document.getElementById('video'); this.canvas = document.getElementById('output'); this.ctx = this.canvas.getContext('2d'); } static async setupCamera(cameraParam) { if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) { throw new Error( 'Browser API navigator.mediaDevices.getUserMedia not available'); } const {targetFPS, sizeOption} = cameraParam; const $size = VIDEO_SIZE[sizeOption]; const videoConfig = { 'audio': false, 'video': { facingMode: 'user', width: isMobile() ? VIDEO_SIZE['360 X 270'].width : $size.width, height: isMobile() ? VIDEO_SIZE['360 X 270'].height : $size.height, frameRate: { ideal: targetFPS, }, }, }; const stream = await navigator.mediaDevices.getUserMedia(videoConfig); const camera = new Camera(); camera.video.srcObject = stream; await new Promise((resolve) => { camera.video.onloadedmetadata = () => { resolve(video); }; }); camera.video.play(); const videoWidth = camera.video.videoWidth; const videoHeight = camera.video.videoHeight; // Must set below two lines, otherwise video element doesn't show. camera.video.width = videoWidth; camera.video.height = videoHeight; camera.canvas.width = videoWidth; camera.canvas.height = videoHeight; const canvasContainer = document.querySelector('.canvas-wrapper'); canvasContainer.style = `width: ${videoWidth}px; height: ${videoHeight}px`; camera.ctx.translate(camera.video.videoWidth, 0); camera.ctx.scale(-1, 1); return camera; } drawCtx() { this.ctx.drawImage( this.video, 0, 0, this.video.videoWidth, this.video.videoHeight); } drawResults(faces, boundingBox, keypoints) { drawResults(this.ctx, faces, boundingBox, keypoints); } }

The configuration for the camera’s width, audio, and other setup-related items is managed in camera.js.

File: .babelrc:

The .babelrc file is used to configure Babel, a JavaScript compiler, specifying presets and plugins that define the transformations to be applied during code transpilation.

File: src/shared:

shared % tree . ├── option_panel.js ├── params.js ├── stats_panel.js └── util.js 1 directory, 4 files

The parameters and other shared files found in the src/shared folder are needed to run and access the camera, checks, and parameter values.

Defining services using a Compose file

Here’s how our services appear within a Docker Compose file:

services: tensorjs: #build: . image: harshmanvar/face-detection-tensorjs:v2 ports: - 1234:1234 volumes: - ./:/app - /app/node_modules command: watch

Your sample application has the following parts:

The tensorjs service is based on the harshmanvar/face-detection-tensorjs:v2 image.

This image contains the necessary dependencies and code to run a face detection system using TensorFlow.js.

It exposes port 1234 to communicate with the TensorFlow.js.

The volume ./:/app sets up a volume mount, linking the current directory (represented by ./) on the host machine to the /app directory within the container. This allows you to share files and code between your host machine and the container.

The watch command specifies the command to run within the container. In this case, it runs the watch command, which suggests that the face detection system will continuously monitor for changes or updates.

Building the image

It’s time to build the development image and install the dependencies to launch the face-detection model.

docker build -t tensor-development:v1

Running the container

docker run -p 1234:1234 -v $(pwd):/app -v /app/node_modules tensor-development:v1 watch

Bringing up the container services

You can launch the application by running the following command:

docker compose up -d

Then, use the docker compose ps command to confirm that your stack is running correctly. Your terminal will produce the following output:

docker compose ps NAME IMAGE COMMAND SERVICE STATUS PORTS tensorflow tensorjs:v2 "yarn watch" tensorjs Up 48 seconds 0.0.0.0:1234->1234/tcp

Viewing the containers via Docker Dashboard

You can also leverage the Docker Dashboard to view your container’s ID and easily access or manage your application (Figure 5) container.

Figure 5: Viewing containers in the Docker Dashboard.

Conclusion

Well done! You have acquired the knowledge to utilize a pre-trained machine learning model with JavaScript for a web application, all thanks to TensorFlow.js. In this article, we have demonstrated how Docker Compose lets you quickly create and deploy a fully functional ML face-detection demo app, using just one YAML file.

With this newfound expertise, you can now take this guide as a foundation to build even more sophisticated applications with just a few additional steps. The possibilities are endless, and your ML journey has just begun!

Learn more

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Memgraph Docker Extension: Empowering Real-Time Analytics with High Performance

Kruno Golubic — Fri, 04 Aug 2023 13:29:15 +0000

Memgraph is an open source, in-memory graph database designed with real-time analytics in mind. Providing a high-performance solution, Memgraph caters to developers and data scientists who require immediate, actionable insights from complex, interconnected data.

What sets Memgraph apart is its high-speed data processing ability, delivering performance that makes it significantly faster than other graph databases. This, however, is not achieved at the expense of data integrity or reliability. Memgraph is committed to providing accurate and dependable insights as fast as you need them.

Built entirely on a C++ codebase, Memgraph leverages in-memory storage to handle complex real-time use cases effectively. Support for ACID transactions guarantees data consistency, while the Cypher query language offers a robust toolset for data structuring, manipulation, and exploration.

Graph databases have a broad spectrum of applications. In domains as varied as cybersecurity, credit card fraud detection, energy management, and network optimization, Memgraph can efficiently analyze and traverse complex network structures and relationships within data. This analytical prowess facilitates real-time, in-depth revelations across a broad spectrum of industries and areas of study.

In this article, we’ll show how using Memgraph as a Docker Extension offers a powerful and efficient way to leverage real-time analytics from a graph database.

Architecture of Memgraph

The high-speed performance of Memgraph can be attributed to its unique architecture (Figure 1). Centered around graph models, the database represents data as nodes (entities) and edges (relationships), enabling efficient management of deeply interconnected data essential for a range of modern applications.

In terms of transactions, Memgraph upholds the highest standard. It uses the standardized Cypher query language over the Bolt protocol, facilitating efficient data structuring, manipulation, and exploration.

Figure 1: Components of Memgraph’s architecture.

The key components of Memgraph’s architecture are:

In-memory storage: Memgraph stores data in RAM for low-latency access, ensuring high-speed data retrieval and modifications. This is critical for applications that require real-time insights.

Transaction processing: Memgraph supports ACID (Atomicity, Consistency, Isolation, Durability) transactions, which means it guarantees that all database transactions are processed reliably and in a way that ensures data integrity, including when failures occur.

Query engine: Memgraph uses Cypher, a popular graph query language that’s declarative and expressive, allowing for complex data relationships to be easily retrieved and updated.

Storage engine: While Memgraph primarily operates in memory, it also provides a storage engine that takes care of data durability by persisting data on disk. This ensures that data won’t be lost even if the system crashes or restarts.

High availability and replication: Memgraph’s replication architecture can automatically replicate data across multiple machines, and it supports replication to provide high availability and fault tolerance.

Streaming and integration: Memgraph can connect with various data streams and integrate with different types of data sources, making it a versatile choice for applications that need to process and analyze real-time data.

To provide users with the utmost flexibility and control, Memgraph comprises several key components, each playing a distinct role in delivering seamless performance and user experience:

MemgraphDB — MemgraphDB is the heart of the Memgraph system. It deals with all concurrency problems, consistency issues, and scaling, both in terms of data and algorithm complexity. Using the Cypher query language, MemgraphDB allows users to query data and run algorithms. It also supports both push and pull operations, which means you can query data and run algorithms and get notified when something changes in the data.

Mgconsole — mgconsole is a command-line interface (CLI) used to interact with Memgraph from any terminal or operating system.

Memgraph Lab — Memgraph Lab is a visual user interface for running queries and visualizing graph data. It provides a more interactive experience, enabling users to apply different graph styles, import predefined datasets, and run example queries. It makes data analysis and visualization more intuitive and user-friendly.

MAGE (Memgraph Advanced Graph Extensions) — MAGE is an open source library of graph algorithms and custom Cypher procedures. It enables high-performance processing of demanding graph algorithms on streaming data. With MAGE, users can run a variety of algorithms, from PageRank or community detection to advanced machine learning techniques using graph embeddings. Moreover, MAGE does not limit users to a specific programming language.

Based on those four components, Memgraph offers four different Docker images:

memgraph-platform — Installs MemgraphDB, mgconsole, Memgraph Lab, and MAGE

memgraph-mage — Installs MemgraphDB, mgconsole, and MAGE

memgraph — Installs MemgraphDB and mgconsole

MAGE × NVIDIA cuGraph — Installs everything that you need to run MAGE in combination with NVIDIA cuGraph

With more than 10K downloads from Docker Hub, Memgraph Platform is the most popular Memgraph Docker image, so the team decided to base the Memgraph Docker extension on it. Instructions are available in the documentation if you want to use any of the other images. Let’s look at how to install Memgraph Docker Extension.

Why run Memgraph as a Docker Extension?

Running Memgraph as a Docker Extension offers a streamlined experience to users who are already familiar with Docker Desktop, simplifying the deployment and management of the graph database. Docker provides an ideal environment to bundle, ship, and run Memgraph in a lightweight, isolated setup. This encapsulation not only promotes consistent performance across different systems but also simplifies the setup process.

Moreover, Docker Desktop is the only prerequisite to run Memgraph as an extension. This means that once you have Docker installed, you can easily set up and start using Memgraph, eliminating the need for additional software installations or complex configuration steps.

Getting started

Working with Memgraph as a Docker Extension begins with opening the Docker Desktop (Figure 2). Here are the steps to follow:

Choose Extensions in the left sidebar.

Switch to the Browse tab.

In the Filters drop-down, select the Database category.

Find Memgraph and then select Install.

Figure 2: Installing Memgraph Docker Extension.

That’s it! Once the installation is finished, select Connect now (Figure 3).

Figure 3: Connecting to Memgraph database using Memgraph Lab.

What you see now is Memgraph Lab, a visual user interface designed for running queries and visualizing graph data. With a range of pre-prepared datasets, Memgraph Lab provides an ideal starting point for exploring Memgraph, gaining proficiency in Cypher querying, and effectively visualizing query results.

Importing the Pandora Papers datasets

For the purposes of this article, we will import the Pandora Papers dataset. To import the dataset, choose Datasets in the Memgraph Lab sidebar and then Load Dataset (Figure 4).

Figure 4: Importing the Pandora Papers dataset.

Once the dataset is loaded, select Explore Query Collection to access a selection of predefined queries (Figure 5).

Figure 5: Exploring the Pandora Papers dataset query collection.

Choose one of the queries and select Run Query (Figure 6).

Figure 6: Running the Cypher query.

And voilà! Welcome to the world of graphs. You now have the results of your query (Figure 7). Now that you’ve run your first query, feel free to explore other queries in the Query Collection, import a new dataset, or start adding your own data to the database.

Figure 7: Displaying the query result as a graph.

Conclusion

Memgraph, as a Docker Extension, offers an accessible, powerful, and efficient solution for anyone seeking to leverage real-time analytics from a graph database. Its unique architecture, coupled with a streamlined user interface and a high-speed query engine, allows developers and data scientists to extract immediate, actionable insights from complex, interconnected data.

Moreover, with the integration of Docker, the setup and use of Memgraph become remarkably straightforward, further expanding its appeal to both experienced and novice users alike. The best part is the variety of predefined datasets and queries provided by the Memgraph team, which serve as excellent starting points for users new to the platform.

Whether you’re diving into the world of graph databases for the first time or are an experienced data professional, Memgraph’s Docker Extension offers an intuitive and efficient solution. So, go ahead and install it on Docker Desktop and start exploring the intriguing world of graph databases today. If you have any questions about Memgraph, feel free to join Memgraph’s vibrant community on Discord.

Learn more

Install Memgraph’s Docker Extension.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Supercharging AI/ML Development with JupyterLab and Docker

Masahito Zembutsu — Mon, 24 Jul 2023 18:46:46 +0000

JupyterLab is an open source application built around the concept of a computational notebook document. It enables sharing and executing code, data processing, visualization, and offers a range of interactive features for creating graphs.

The latest version, JupyterLab 4.0, was released in early June. Compared to its predecessors, this version features a faster Web UI, improved editor performance, a new Extension Manager, and real-time collaboration.

If you have already installed the standalone 3.x version, evaluating the new features will require rewriting your current environment, which can be labor-intensive and risky. However, in environments where Docker operates, such as Docker Desktop, you can start an isolated JupyterLab 4.0 in a container without affecting your installed JupyterLab environment. Of course, you can run these without impacting the existing environment and access them on a different port.

In this article, we show how to quickly evaluate the new features of JupyterLab 4.0 using Jupyter Docker Stacks on Docker Desktop, without affecting the host PC side.

Why containerize JupyterLab?

Users have downloaded the base image of JupyterLab Notebook stack Docker Official Image more than 10 million times from Docker Hub. What’s driving this significant download rate? There’s an ever-increasing demand for Docker containers to streamline development workflows, while allowing JupyterLab developers to innovate with their choice of project-tailored tools, application stacks, and deployment environments. Our JupyterLab notebook stack official image also supports both AMD64 and Arm64/v8 platforms.

Containerizing the JupyterLab environment offers numerous benefits, including the following:

Containerization ensures that your JupyterLab environment remains consistent across different deployments. Whether you’re running JupyterLab on your local machine, in a development environment, or in a production cluster, using the same container image guarantees a consistent setup. This approach helps eliminate compatibility issues and ensures that your notebooks behave the same way across different environments.

Packaging JupyterLab in a container allows you to easily share your notebook environment with others, regardless of their operating system or setup. This eliminates the need for manually installing dependencies and configuring the environment, making it easier to collaborate and share reproducible research or workflows. And this is particularly helpful in AI/ML projects, where reproducibility is crucial.

Containers enable scalability, allowing you to scale your JupyterLab environment based on the workload requirements. You can easily spin up multiple containers running JupyterLab instances, distribute the workload, and take advantage of container orchestration platforms like Kubernetes for efficient resource management. This becomes increasingly important in AI/ML development, where resource-intensive tasks are common.

Getting started

To use JupyterLab on your computer, one option is to use the JupyterLab Desktop application. It’s based on Electron, so it operates with a GUI on Windows, macOS, and Linux. Indeed, using JupyterLab Desktop makes the installation process fairly simple. In a Windows environment, however, you’ll also need to set up the Python language separately, and, to extend the capabilities, you’ll need to use pip to set up packages.

Although such a desktop solution may be simpler than building from scratch, we think the combination of Docker Desktop and Docker Stacks is still the more straightforward option. With JupyterLab Desktop, you cannot mix multiple versions or easily delete them after evaluation. Above all, it does not provide a consistent user experience across Windows, macOS, and Linux.

On a Windows command prompt, execute the following command to launch a basic notebook:

docker container run -it --rm -p 10000:8888 jupyter/base-notebook

This command utilizes the jupyter/base-notebook Docker image, maps the host’s port 10000 to the container’s port 8888, and enables command input and a pseudo-terminal. Additionally, an option is added to delete the container once the process is completed.

After waiting for the Docker image to download, access and token information will be displayed on the command prompt as follows. Here, rewrite the URL http://127.0.0.1:8888 to http://127.0.0.1:10000 and then append the token to the end of this URL. In this example, the output will look like this:

Displayed on screen: http://127.0.0.1:8888/lab?token=6e302b2c99d56f1562e082091f4a3236051fb0a4135e10bb

To be entered in the browser address: http://127.0.0.1:10000/lab?token=6e302b2c99d56f1562e082091f4a3236051fb0a4135e10bb

Note that this token is specific to my environment, so copying it will not work for you. You should replace it with the one actually displayed on your command prompt.

Then, after waiting for a short while, JupyterLab will launch (Figure 1). From here, you can start a Notebook, access Python’s console environment, or utilize other work environments.

Figure 1. The page after entering the JupyterLab token. The left side is a file list, and the right side allows you to open Notebook creation, Python console, etc.

The port 10000 on the host side is mapped to port 8888 inside the container, as shown in Figure 2.

Figure 2. The host port 10000 is mapped to port 8888 inside the container.

In the Password or token input form on the screen, enter the token displayed in the command line or in the container logs (the string following token=), and select Log in, as shown in Figure 3.

Figure 3. Enter the token that appears in the container logs.

By the way, in this environment, the data will be erased when the container is stopped. If you want to reuse your data even after stopping the container, create a volume by adding the -v option when launching the Docker container.

To stop this container environment, click CTRL-C on the command prompt, then respond to the Jupyter server’s prompt Shutdown this Jupyter server (y/[n])? with y and press enter. If you are using Docker Desktop, stop the target container from the Containers.

Shutdown this Jupyter server (y/[n])? y [C 2023-06-26 01:39:52.997 ServerApp] Shutdown confirmed [I 2023-06-26 01:39:52.998 ServerApp] Shutting down 5 extensions [I 2023-06-26 01:39:52.998 ServerApp] Shutting down 1 kernel [I 2023-06-26 01:39:52.998 ServerApp] Kernel shutdown: 653f7c27-03ff-4604-a06c-2cb4630c098d

Once the display changes as follows, the container is terminated and the data is deleted.

When the container is running, data is saved in the /home/jovyan/work/ directory inside the container. You can either bind mount this as a volume or allocate it as a volume when starting the container. By doing so, even if you stop the container, you can use the same data again when you restart the container:

docker container run -it -p 10000:8888 \ -v “%cd%”:/home/jovyan/work \ jupyter/base-notebook

Note: The \ symbol signifies that the command line continues on the command prompt. You may also write the command in a single line without using the \ symbol. However, in the case of Windows command prompt, you need to use the ^ symbol instead.

With this setup, when launched, the JupyterLab container mounts the /work/ directory to the folder where the docker container run command was executed. Because the data persists even when the container is stopped, you can continue using your Notebook data as it is when you start the container again.

Plotting using the famous Iris flower dataset

In the following example, we’ll use the Iris flower dataset, which consists of 150 records in total, with 50 samples from each of three types of Iris flowers (Iris setosa, Iris virginica, Iris versicolor). Each record consists of four numerical attributes (sepal length, sepal width, petal length, petal width) and one categorical attribute (type of iris). This data is included in the Python library scikit-learn, and we will use matplotlib to plot this data.

When trying to input the sample code from the scikit-learn page (the code is at the bottom of the page, and you can copy and paste it) into iPython, the following error occurs (Figure 4).

Figure 4. Error message occurred due to missing “matplotlib” module.

This is an error message on iPython stating that the “matplotlib” module does not exist. Additionally, the “scikit-learn” module is needed.

To avoid these errors and enable plotting, run the following command. Here, !pip signifies running the pip command within the iPython environment:

!pip install matplotlib scikit-learn

By pasting and executing the earlier sample code in the next cell on iPython, you can plot and display the Iris dataset as shown in Figure 5.

Figure 5. When the sample code runs successfully, two images will be output.

Note that it can be cumbersome to use the !pip command to add modules every time. Fortunately, you can add also add modules in the following ways:

By creating a dedicated Dockerfile

By using an existing group of images called Jupyter Docker Stacks

Building a Docker image

If you’re familiar with Dockerfile and building images, this five-step method is easy. Also, this approach can help keep the Docker image size in check.

Step 1. Creating a directory

To build a Docker image, the first step is to create and navigate to the directory where you’ll place your Dockerfile and context:

mkdir myjupyter && cd myjupyter

Step 2. Creating a requirements.txt file

Create a requirements.txt file and list the Python modules you want to add with the pip command:

matplotlib scikit-learn

Step 3. Writing a Dockerfile

FROM jupyter/base-notebook COPY ./requirements.txt /home/jovyan/work RUN python -m pip install --no-cache -r requirements.txt

This Dockerfile specifies a base image jupyter/base-notebook, copies the requirements.txt file from the local directory to the /home/jovyan/work directory inside the container, and then runs a pip install command to install the Python packages listed in the requirements.txt file.

Step 4. Building the Docker image

docker image build -t myjupyter

Step 5. Launching the container

docker container run -it -p 10000:8888 \ -v “%cd%”:/home/jovyan/work \ myjupyter

Here’s what each part of this command does:

The docker run command instructs Docker to run a container.

The -it option attaches an interactive terminal to the container.

The -p 10000:8888 maps port 10000 on the host machine to port 8888 inside the container. This allows you to access Jupyter Notebook running in the container via http://localhost:10000 in your web browser.

The -v "%cd%":/home/jovyan/work mounts the current directory (%cd%) on the host machine to the /home/jovyan/work directory inside the container. This enables sharing files between the host and the Jupyter Notebook.

In this example, myjupyter is the name of the Docker image you want to run. Make sure you have the appropriate image available on your system. The operation after startup is the same as before. You don’t need to add libraries with the !pip command because the necessary libraries are included from the start.

How to use Jupyter Docker Stacks’ images

To execute the JupyterLab environment, we will utilize a Docker image called jupyter/scipy-notebook from the Jupyter Docker Stacks. Please note that the running Notebook will be terminated. After entering Ctrl-C on the command prompt, enter y and specify the running container.

Then, enter the following to run a new container:

docker container run -it -p 10000:8888 \ -v “%cd%”:/home/jovyan/work \ jupyter/scipy-notebook

This command will run a container using the jupyter/scipy-notebook image, which provides a Jupyter Notebook environment with additional scientific libraries.

Here’s a breakdown of the command:

The docker run command starts a new container.

The -it option attaches an interactive terminal to the container.

The -p 10000:8888 maps port 10000 on the host machine to port 8888 inside the container, allowing access to Jupyter Notebook at http://localhost:10000.

The -v "$(pwd)":/home/jovyan/work mounts the current directory ($(pwd)) on the host machine to the /home/jovyan/work directory inside the container. This enables sharing files between the host and the Jupyter Notebook.

The jupyter/scipy-notebook is the name of the Docker image used for the container. Make sure you have this image available on your system.

The previous JupyterLab image was a minimal Notebook environment. The image we are using this time includes many packages used in the scientific field, such as numpy and pandas, so it may take some time to download the Docker image. This one is close to 4GB in image size.

Once the container is running, you should be able to run the Iris dataset sample immediately without having to execute pip like before. Give it a try.

Some images include TensorFlow’s deep learning library, ones for the R language, Julia programming language, and Apache Spark. See the image list page for details.

In a Windows environment, you can easily run and evaluate the new version of JupyterLab 4.0 using Docker Desktop. Doing so will not affect or conflict with the existing Python language environment. Furthermore, this setup provides a consistent user experience across other platforms, such as macOS and Linux, making it the ideal solution for those who want to try it.

Conclusion

By containerizing JupyterLab with Docker, AI/ML developers gain numerous advantages, including consistency, easy sharing and collaboration, and scalability. It enables efficient management of AI/ML development workflows, making it easier to experiment, collaborate, and reproduce results across different environments. With JupyterLab 4.0 and Docker, the possibilities for supercharging your AI/ML development are limitless. So why wait? Embrace containerization and experience the true power of JupyterLab in your AI/ML projects.

References

JupyterLab documentation

Docker Stacks documentation

Learn more

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.