Getting Started with JupyterLab as a Docker Extension

This post was written in collaboration with Marcelo Ochoa, the author of the Jupyter Notebook Docker Extension.

JupyterLab is a web-based interactive development environment (IDE) that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It is the latest evolution of the popular Jupyter Notebook and offers several advantages over its predecessor, including:

  • A more flexible and extensible user interface: JupyterLab allows users to configure and arrange their workspace to best suit their needs. It also supports a growing ecosystem of extensions that can be used to add new features and functionality.
  • Support for multiple programming languages: JupyterLab is not just for Python anymore! It can now be used to run code in various programming languages, including R, Julia, and JavaScript.
  • A more powerful editor: JupyterLab’s built-in editor includes features such as code completion, syntax highlighting, and debugging, which make it easier to write and edit code.
  • Support for collaboration: JupyterLab makes collaborating with others on projects easy. Documents can be shared and edited in real-time, and users can chat with each other while they work.

This article provides an overview of the JupyterLab architecture and shows how to get started using JupyterLab as a Docker extension.


Uses for JupyterLab

JupyterLab is used by a wide range of people, including data scientists, scientific computing researchers, computational journalists, and machine learning engineers. It is a powerful interactive computing and data science tool and is becoming increasingly popular as an IDE.

Here are specific examples of how JupyterLab can be used:

  • Data science: JupyterLab can explore data, build and train machine learning models, and create visualizations.
  • Scientific computing: JupyterLab can perform numerical simulations, solve differential equations, and analyze data.
  • Computational journalism: JupyterLab can scrape data from the web, clean and prepare data for analysis, and create interactive data visualizations.
  • Machine learning: JupyterLab can develop and train machine learning models, evaluate model performance, and deploy models to production.

JupyterLab can help solve problems in the following ways:

  • JupyterLab provides a unified environment for developing and running code, exploring data, and creating visualizations. This can save users time and effort; they do not have to switch between different tools for different tasks.
  • JupyterLab makes it easy to share and collaborate on projects. Documents can be shared and edited in real-time, and users can chat with each other while they work. This can be helpful for teams working on complex projects.
  • JupyterLab is extensible. This means users can add new features and functionality to the environment using extensions, making JupyterLab a flexible tool that can be used for a wide range of tasks.

Project Jupyter’s tools are available for installation via the Python Package Index, the leading repository of software created for the Python programming language, but you can also get the JupyterLab environment up and running using Docker Desktop on Linux, Mac, or Windows.

Figure 1: JupyterLab is a powerful web-based IDE for data science

Architecture of JupyterLab

JupyterLab follows a client-server architecture (Figure 2) where the client, implemented in TypeScript and React, operates within the user’s web browser. It leverages the Webpack module bundler to package its code into a single JavaScript file and communicates with the server via WebSockets. On the other hand, the server is a Python application that utilizes the Tornado web framework to serve the client and manage various functionalities, including kernels, file management, authentication, and authorization. Kernels, responsible for executing code entered in the JupyterLab client, can be written in any programming language, although Python is commonly used.

The client and server exchange data and commands through the WebSockets protocol. The client sends requests to the server, such as code execution or notebook loading, while the server responds to these requests and returns data to the client.

Kernels are distinct processes managed by the JupyterLab server, allowing them to execute code and send results — including text, images, and plots — to the client. Moreover, JupyterLab’s flexibility and extensibility are evident through its support for extensions, enabling users to introduce new features and functionalities, such as custom kernels, file viewers, and editor plugins, to enhance their JupyterLab experience.
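If you want to poke at the server side yourself, one quick (entirely optional) check is to list running kernels through the Jupyter Server REST API. This is only a rough sketch; it assumes a server reachable at localhost:8888 with token authentication enabled and a token copied from the server’s startup logs, so adjust both for your setup:

import requests

TOKEN = "<your-token>"  # assumption: replace with the token printed by your server

response = requests.get(
    "http://localhost:8888/api/kernels",
    headers={"Authorization": f"token {TOKEN}"},
)

# Each entry describes one kernel process managed by the server
for kernel in response.json():
    print(kernel["id"], kernel["name"])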

Figure 2: JupyterLab architecture, showing connections between extensions, applications, the API, servers, widgets, kernels, and the Xeus framework.

JupyterLab is highly extensible. Extensions can be used to add new features and functionality to the client and server. For example, extensions can be used to add new kernels, new file viewers, and new editor plugins.

Examples of JupyterLab extensions include:

  • The ipywidgets extension adds support for interactive widgets to JupyterLab notebooks.
  • The nbextensions package provides a collection of extensions for the JupyterLab notebook.
  • The jupyterlab-server package provides extensions for the JupyterLab server.

JupyterLab’s extensible architecture makes it a powerful tool that can be used to create custom development environments tailored to users’ specific needs.

Why run JupyterLab as a Docker extension?

Running JupyterLab as a Docker extension offers a streamlined experience to users already familiar with Docker Desktop, simplifying the deployment and management of the JupyterLab notebook.

Docker provides an ideal environment to bundle, ship, and run JupyterLab in a lightweight, isolated setup. This encapsulation promotes consistent performance across different systems and simplifies the setup process.

Moreover, Docker Desktop is the only prerequisite to running JupyterLab as an extension. Once you have Docker installed, you can easily set up and start using JupyterLab, eliminating the need for additional software installations or complex configuration steps.

Getting started

Getting started with the Docker Desktop Extension is a straightforward process that allows developers to leverage the benefits of unified development. The extension can easily be integrated into existing workflows, offering a familiar interface within Docker. This seamless integration streamlines the setup process, allowing developers to dive into their projects without extensive configuration.

Docker Desktop is the only key component needed to complete this walkthrough.

Working with JupyterLab as a Docker extension begins with opening Docker Desktop. Here are the steps to follow (Figure 3):

  • Choose Extensions in the left sidebar.
  • Switch to the Browse tab.
  • In the Categories drop-down, select Utility Tools.
  • Find Jupyter Notebook and then select Install.
Figure 3: Installing JupyterLab with Docker Desktop.

A JupyterLab welcome page will be shown (Figure 4).

Figure 4: JupyterLab welcome page, offering Notebook, Console, and other options.

Adding extra kernels

If you need to work with languages other than Python 3 (the default), you can complete a post-installation step. For example, to add the iJava kernel, launch a terminal and execute the following:

~ % docker exec -ti --user root jupyter_embedded_dd_vm /bin/sh -c "curl -s https://raw.githubusercontent.com/marcelo-ochoa/jupyter-docker-extension/main/addJava.sh | bash"

Figure 5 shows the install process output of the iJava kernel package.

Figure 5: Capture of iJava kernel installation process.

Next, close your extension tab or Docker Desktop, then reopen, and the new kernel and language support will be enabled (Figure 6).

Figure 6: New kernel and language support enabled.

Getting started with JupyterLab

You can begin using JupyterLab notebooks in many ways; for example, you can choose the language at the welcome page and start testing your code. Or, you can upload a file to the extension using the up arrow icon found at the upper left (Figure 7).

Figure 7: Sample JupyterLab iPython notebook.

Import a new notebook from local storage (Figures 8 and 9).

Figure 8: Upload dialog from disk.
Figure 9: Uploaded notebook (a SymPy example).

Loading JupyterLab notebook from URL

If you want to import a notebook directly from the internet, you can use the File > Open URL option (Figure 10). This example loads a notebook containing Java samples.

Figure 10: Load notebook from URL.

The result of loading a notebook from a URL is shown in Figure 11.

Figure 11: Uploaded notebook from URL.

Download a notebook to your personal folder

Just like uploading a notebook, the download operation is straightforward. Select your file name and choose the Download option (Figure 12).

Figure 12: Download to local disk option menu.

A download destination option is also shown (Figure 13).

Figure 13: Select local directory for downloading destination.

A note about persistent storage

The JupyterLab extension has a persistent volume for the /home/jovyan directory, which is the default directory of the JupyterLab environment. The contents of this directory will survive extension shutdown, Docker Desktop restart, and JupyterLab Extension upgrade. However, if you uninstall the extension, all this content will be discarded. Back up important data first.

Change the core image

This Docker extension uses the jupyter/scipy-notebook:lab-4.0.6 Docker image (Ubuntu 22.04) by default, but you can choose one of the following available versions (Figure 14).

Figure 14: JupyterLab core image options, including base-notebook, minimal-notebook, julia-notebook, tensorflow-notebook, and others.

To change the extension image, you can follow these steps:

  1. Uninstall the extension.
  2. Install again, but do not open until the next step is done.
  3. Edit the associated docker-compose.yml file of the extension. For example, on macOS, the file can be found at: Library/Containers/com.docker.docker/Data/extensions/mochoa_jupyter-docker-extension/vm/docker-compose.yml
  4. Change the image name from jupyter/scipy-notebook:ubuntu-22.04 to jupyter/r-notebook:ubuntu-22.04 (a sketch of this edit is shown below).
  5. Open the extension.

On Linux, the docker-compose.yml file can be found at: .docker/desktop/extensions/mochoa_jupyter-docker-extension/vm/docker-compose.yml
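For step 4, the edit is confined to the image line of the service definition inside that docker-compose.yml; everything else stays as it is. The fragment below is only a sketch, and the surrounding keys (including the service name) are assumptions about the file’s layout rather than its verbatim contents:

services:
  jupyter:
    image: jupyter/r-notebook:ubuntu-22.04   # was: jupyter/scipy-notebook:ubuntu-22.04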

Using JupyterLab with other extensions

To use the JupyterLab extension to interact with other extensions, such as the MemGraph database (Figure 15), you typically only need to change the host in the connection settings. A sample notebook usually refers to a MemGraph host running on localhost. Because JupyterLab runs as a separate extension in its own Docker stack, you have to replace localhost with host.docker.internal, which resolves from inside the container to the Docker host and therefore to ports published by other extensions. Here is an example:

URI = "bolt://localhost:7687"

needs to be replaced by:

URI = "bolt://host.docker.internal:7687"
Figure 15: Running notebook connecting to MemGraph extension.

Conclusion

The JupyterLab Docker extension is a ready-to-run Docker stack containing Jupyter applications and interactive computing tools using a personal Jupyter server with the JupyterLab frontend.

Through the integration of Docker, setting up and using JupyterLab is remarkably straightforward, further expanding its appeal to experienced and novice users alike. 

The following video provides a good introduction with a complete walk-through of JupyterLab notebooks.

Learn more

How to Develop and Deploy a Customer Churn Prediction Model Using Python, Streamlit, and Docker

Customer churn is a million-dollar problem for businesses today. The SaaS market is becoming increasingly saturated, and customers can choose from plenty of providers. Retention and nurturing are challenging. Online businesses consider customers to have churned when they stop purchasing goods and services. Customer churn can depend on industry-specific factors, yet some common drivers include lack of product usage, contract tenure, and cheaper prices elsewhere.

Limiting churn strengthens your revenue streams. Businesses and marketers must predict and prevent customer churn to remain sustainable. The best way to do so is by knowing your customers. And spotting behavioral patterns in historical data can help immensely with this. So, how do we uncover them? 

Applying machine learning (ML) to customer data helps companies develop focused customer-retention programs. For example, a marketing department could use an ML churn model to identify high-risk customers and send promotional content to entice them. 

To enable these models to make predictions with new data, knowing how to package a model as a user-facing, interactive application is essential. In this blog, we’ll take an ML model from a Jupyter Notebook environment to a containerized application. We’ll use Streamlit as our application framework to build UI components and package our model. Next, we’ll use Docker to publish our model as an endpoint. 

Docker containerization helps make this application hardware-and-OS agnostic. Users can access the app from their browser through the endpoint, input customer details, and receive a churn probability in a fraction of a second. If a customer’s churn score exceeds a certain threshold, that customer may receive targeted push notifications and special offers. The diagram below puts this into perspective: 

Streamlit Docker Diagram

Why choose Streamlit?

Streamlit is an open source, Python-based framework for building UIs and powerful ML apps from a trained model. It’s popular among machine learning engineers and data scientists as it enables quick web-app development — requiring minimal Python code and a simple API. This API lets users create widgets using pure Python without worrying about backend code, routes, or requests. It provides several components that let you build charts, tables, and different figures to meet your application’s needs. Streamlit also utilizes models that you’ve saved or pickled into the app to make predictions.

Conversely, alternative frameworks like FastAPI, Flask, and Shiny require a strong grasp of HTML/CSS to build interactive, frontend apps. Streamlit is the fastest way to build and share data apps. The Streamlit API is minimal and extremely easy to understand. Minimal changes to your underlying Python script are needed to create an interactive dashboard.
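To give a sense of how little code that takes, here is a minimal, self-contained sketch (not part of the churn project) that renders a title, a text input, and a button:

import streamlit as st

st.title("Hello, Streamlit")
name = st.text_input("Your name", "world")

# The script reruns on every interaction; st.button returns True on the run where it was clicked
if st.button("Greet"):
    st.success(f"Hello, {name}!")

Saved as hello.py, this runs with streamlit run hello.py and appears in your browser as an interactive page.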

Getting Started

To follow along, clone the sample project from GitHub:

git clone https://github.com/dockersamples/customer-churnapp-streamlit

Key Components

  • An IDE or text editor 
  • Python 3.6+ 
  • PIP (or Anaconda)
  • Not required but recommended: An environment-management tool such as pipenv, venv, virtualenv, or conda
  • Docker Desktop

Before starting, install Python 3.6+. Afterwards, follow these steps to install all libraries required to run the model on your system. 

Our project directory structure should look like this:

$ tree
.
├── Churn_EDA_model_development.ipynb
├── Churn_model_metrics.ipynb
├── Dockerfile
├── Pipfile
├── Pipfile.lock
├── WA_Fn-UseC_-Telco-Customer-Churn.csv
├── train.py
├── requirements.txt
├── README.md
├── images
│   ├── churndemo.gif
│   ├── icone.png
│   └── image.png
├── model_C=1.0.bin
└── stream_app.py

Install project dependencies in a virtual environment 

We’ll use the Pipenv library to create a virtual Python environment and install the dependencies required to run Streamlit. The Pipenv tool automatically manages project packages through the Pipfile as you install or uninstall them. It also generates a Pipfile.lock file, which helps produce deterministic builds and creates a snapshot of your working environment. Follow these steps to get started.

1) Enter your project directory

cd customer-churnapp-streamlit

2) Install Pipenv

pip install pipenv

3) Install the dependencies

pipenv install

4) Enter the pipenv virtual environment

pipenv shell

After completing these steps, you can run scripts from your virtual environment! 

Building a simple machine-learning model

Machine learning uses algorithms and statistical models. These analyze historical data and make inferences from patterns without any explicit programming. Ultimately, the goal is to predict outcomes based on incoming data. 

In our case, we’re creating a model from historical customer data to predict which customers are likely to leave. Since we need to classify customers as either churn or no-churn, we’ll train a simple-yet-powerful classification model. Our model uses logistic regression on a telecom company’s historical customer dataset. This set tracks customer demographics, tenure, monthly charges, and more. However, one key question is also answered: did the customer churn? 

Logistic regression estimates an event’s probability based on a given dataset of independent variables. Since the outcome is a probability, the dependent variable is bounded between 0 and 1. The model will undergo multiple iterations and calculate best-fit coefficients for each variable. This quantifies just how much each impacts churn. With these coefficients, the model can assign churn likelihood scores between 0 and 1 to new customers. Someone who scores a 1 is extremely likely to churn. Someone with a 0 is incredibly unlikely to churn. 
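As a purely illustrative calculation (the coefficients below are made up for the example, not taken from the trained model), this is how such coefficients turn a customer’s attributes into a churn probability:

import math

# Hypothetical coefficients and customer, for illustration only
bias = -1.2
weights = {"tenure": -0.03, "monthlycharges": 0.02}
customer = {"tenure": 5, "monthlycharges": 80}

# Linear combination of the features, then the sigmoid squashes it into (0, 1)
z = bias + sum(weights[f] * customer[f] for f in weights)
churn_probability = 1 / (1 + math.exp(-z))
print(round(churn_probability, 3))  # roughly 0.56 for these made-up numbers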

Python has great libraries like Pandas, NumPy, and Matplotlib that support data analytics. Open-source frameworks like Scikit Learn have pre-built wrappers for various ML models. We’ll use their API to train a logistic-regression model. To understand how this basic churn prediction model was born, refer to Churn_EDA_model_development.ipynb. ML models require many attempts to get right. Therefore, we recommend using a Jupyter notebook or an IDE. 

In a nutshell, we performed the following steps to create our churn prediction model:

  1. Initial data preparation 
    • Perform sanity checks on data types and column names 
    • Make data type corrections if needed 
  2. Data and feature understanding 
    • Check the distribution of numerical features
    • Check the distinct values of categorical features 
    • Check the target feature distribution 
  3. Exploratory data analysis 
    • Handle missing values 
    • Handle outliers 
    • Understand correlations and identify spurious ones 
  4. Feature engineering and importance 
    • Analyze churn rate and risk scores across different cohorts and feature groups 
    • Calculate mutual information 
    • Check feature correlations 
  5. Encoding categorical features and scaling numerical features 
    • Convert categorical features into numerical values using Scikit-Learn’s DictVectorizer helper 
    • Scale numerical features to standardize them into a fixed range 
  6. Model training 
    • Select an appropriate ML algorithm 
    • Train the model with custom parameters 
  7. Model evaluation 
    • Refer to Churn_model_metrics.ipynb 
    • Use different metrics to evaluate the model, such as accuracy, the confusion matrix, precision, recall, ROC curves, AUROC, and cross-validation.
  8. Repeat steps 6 and 7 for different algorithms and model hyperparameters, then select the best-fit model.

It’s best practice to automate the training process using a Python script. Each time we choose to retrain the model with a new parameter or a new dataset, we can execute this script and save the resulting model. 

Check out train.py to explore how to package a model into a script that automates model training! 
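The exact steps live in train.py, but as a rough sketch (the column names, hyperparameters, and missing data cleaning here are simplifications rather than the script's actual contents), the core of such a training run with scikit-learn looks like this:

import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
y = (df['Churn'] == 'Yes').astype(int)            # target: 1 if the customer churned
records = df.drop(columns=['Churn']).to_dict(orient='records')

dict_vectorizer = DictVectorizer(sparse=False)    # one-hot encodes the categorical features
X = dict_vectorizer.fit_transform(records)

model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X, y)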

Once we uncover the best-fit model, we must save it to reuse it later without running any of the above training code scripts. Let’s get started.

Save the model

In machine learning, we save trained models in a file and restore them to compare each with other models. We can also test them using new data. The save process is called Serialization, while restoration is called Deserialization.

We use a helper Python library called Pickle to save the model. The Pickle module implements a fundamental, yet powerful, algorithm for serializing and de-serializing a Python object structure. 

You can also use the following functions: 

  • pickle.dump serializes an object hierarchy and writes it to a file.
  • pickle.load reads a data stream from a file and deserializes it back into Python objects.

We’ve chosen Pickle since it supports models created using the Scikit-Learn framework and offers great loading performance. Training frameworks like TensorFlow and Keras have their own built-in utilities for saving models, which are designed to perform well with their architectures. 

Dump the Model and Dictionary Vectorizer

import pickle

# The with statement closes the file automatically when the block exits
with open('model_C=1.0.bin', 'wb') as f_out:
    pickle.dump((dict_vectorizer, model), f_out)

This saves a binary file named model_C=1.0.bin that contains both the dict_vectorizer used for one-hot encoding and the trained logistic regression model, packed together as a single tuple. 

Create a new Python file

Now, we’ll create a stream_app.py script that defines both our app layout and the backend logic that is triggered when users interact with different UI components. Crucially, this file is reusable with any model. 

This is just an example. We strongly recommend exploring more components and design options from the Streamlit library. If you’re skilled in HTML and JavaScript, you can create your own Streamlit components that grant you more control over your app’s layout. 

First, import the required libraries:

import pickle
import streamlit as st
import pandas as pd
from PIL import Image

Next, you’ll need to load the same binary file we saved earlier to deserialize the model and dictionary vectorizer.

model_file = 'model_C=1.0.bin'

with open(model_file, 'rb') as f_in:
    dv, model = pickle.load(f_in)

The following code snippet loads the images and displays them on your screen. The st.image portion helps display an image on the frontend:

image = Image.open('images/icone.png') 
image2 = Image.open('images/image.png')
  
st.image(image,use_column_width=False)

To display items in the sidebar, you’ll need the following code snippet:

add_selectbox = st.sidebar.selectbox("How would you like to predict?",
("Online", "Batch"))

st.sidebar.info('This app is created to predict Customer Churn')
st.sidebar.image(image2)

Streamlit’s sidebar renders a vertical, collapsible bar where users can select the type of model scoring they want to perform — like batch scoring (predictions for multiple customers) or online scoring (for single customers). We also add text and images to decorate the sidebar. 

The following code helps you display the main title:

st.title("Predicting Customer Churn")

You can display input widgets to collect customer details and generate predictions when the user selects the ‘Online’ option:

if add_selectbox == 'Online':
  
		gender = st.selectbox('Gender:', ['male', 'female'])
		seniorcitizen= st.selectbox(' Customer is a senior citizen:', [0, 1])
		partner= st.selectbox(' Customer has a partner:', ['yes', 'no'])
		dependents = st.selectbox(' Customer has  dependents:', ['yes', 'no'])
		phoneservice = st.selectbox(' Customer has phoneservice:', ['yes', 'no'])
		multiplelines = st.selectbox(' Customer has multiplelines:', ['yes', 'no', 'no_phone_service'])
		internetservice= st.selectbox(' Customer has internetservice:', ['dsl', 'no', 'fiber_optic'])
		onlinesecurity= st.selectbox(' Customer has onlinesecurity:', ['yes', 'no', 'no_internet_service'])
		onlinebackup = st.selectbox(' Customer has onlinebackup:', ['yes', 'no', 'no_internet_service'])
		deviceprotection = st.selectbox(' Customer has deviceprotection:', ['yes', 'no', 'no_internet_service'])
		techsupport = st.selectbox(' Customer has techsupport:', ['yes', 'no', 'no_internet_service'])
		streamingtv = st.selectbox(' Customer has streamingtv:', ['yes', 'no', 'no_internet_service'])
		streamingmovies = st.selectbox(' Customer has streamingmovies:', ['yes', 'no', 'no_internet_service'])
		contract= st.selectbox(' Customer has a contract:', ['month-to-month', 'one_year', 'two_year'])
		paperlessbilling = st.selectbox(' Customer has a paperlessbilling:', ['yes', 'no'])
		paymentmethod= st.selectbox('Payment Option:', ['bank_transfer_(automatic)', 'credit_card_(automatic)', 'electronic_check' ,'mailed_check'])
		tenure = st.number_input('Number of months the customer has been with the current telco provider :', min_value=0, max_value=240, value=0)
		monthlycharges= st.number_input('Monthly charges :', min_value=0, max_value=240, value=0)
		totalcharges = tenure*monthlycharges
		output= ""
		output_prob = ""
		input_dict={
				"gender":gender ,
				"seniorcitizen": seniorcitizen,
				"partner": partner,
				"dependents": dependents,
				"phoneservice": phoneservice,
				"multiplelines": multiplelines,
				"internetservice": internetservice,
				"onlinesecurity": onlinesecurity,
				"onlinebackup": onlinebackup,
				"deviceprotection": deviceprotection,
				"techsupport": techsupport,
				"streamingtv": streamingtv,
				"streamingmovies": streamingmovies,
				"contract": contract,
				"paperlessbilling": paperlessbilling,
				"paymentmethod": paymentmethod,
				"tenure": tenure,
				"monthlycharges": monthlycharges,
				"totalcharges": totalcharges
			}
          
		if st.button("Predict"):
			X = dv.transform([input_dict])
			y_pred = model.predict_proba(X)[0, 1]
			churn = y_pred >= 0.5
			output_prob = float(y_pred)
			output = bool(churn)
		st.success('Churn: {0}, Risk Score: {1}'.format(output, output_prob))

Your app’s frontend leverages Streamlit’s input widgets like select box, slider, and number input. Users interact with these widgets by entering values. Input data is then packaged into a Python dictionary. The backend — which handles the prediction score computation logic — is defined inside the st.button layer and awaits the user trigger. When this happens, the dictionary is passed to the dictionary vectorizer which performs encoding for categorical features and makes it consumable for the model. 

Streamlit passes any transformed inputs to the model and calculates the churn prediction score. Using the threshold of 0.5, the churn score is converted into a binary class. The risk score and churn class are returned to the frontend via Streamlit’s success component. This displays a success message. 

To display the file upload button when the user selects “Batch” from the sidebar, the following code snippet might be useful:

if add_selectbox == 'Batch':
		file_upload = st.file_uploader("Upload csv file for predictions", type=["csv"])
		if file_upload is not None:
			data = pd.read_csv(file_upload)
			X = dv.transform(data.to_dict(orient='records'))
			y_pred = model.predict_proba(X)[:, 1]
			churn = y_pred >= 0.5
			st.write(churn)

When the user wants to batch score customers, the page layout will dynamically change to match this selection. Streamlit’s file uploader component will display a related widget. This prompts the user to upload a CSV file, which is then read using the pandas library and processed by the dictionary vectorizer and model. Prediction scores are displayed on the frontend using st.write.

The above application skeleton is wrapped within a main function in the below script. Running the script invokes the main function. Here’s how that final script looks:

import pickle
import streamlit as st
import pandas as pd
from PIL import Image
model_file = 'model_C=1.0.bin'


with open(model_file, 'rb') as f_in:
    dv, model = pickle.load(f_in)


def main():
	image = Image.open('images/icone.png') 
	image2 = Image.open('images/image.png')
	st.image(image,use_column_width=False) 
	add_selectbox = st.sidebar.selectbox(
	"How would you like to predict?",
	("Online", "Batch"))
	st.sidebar.info('This app is created to predict Customer Churn')
	st.sidebar.image(image2)
	st.title("Predicting Customer Churn")
	if add_selectbox == 'Online':
		gender = st.selectbox('Gender:', ['male', 'female'])
		seniorcitizen= st.selectbox(' Customer is a senior citizen:', [0, 1])
		partner= st.selectbox(' Customer has a partner:', ['yes', 'no'])
		dependents = st.selectbox(' Customer has  dependents:', ['yes', 'no'])
		phoneservice = st.selectbox(' Customer has phoneservice:', ['yes', 'no'])
		multiplelines = st.selectbox(' Customer has multiplelines:', ['yes', 'no', 'no_phone_service'])
		internetservice= st.selectbox(' Customer has internetservice:', ['dsl', 'no', 'fiber_optic'])
		onlinesecurity= st.selectbox(' Customer has onlinesecurity:', ['yes', 'no', 'no_internet_service'])
		onlinebackup = st.selectbox(' Customer has onlinebackup:', ['yes', 'no', 'no_internet_service'])
		deviceprotection = st.selectbox(' Customer has deviceprotection:', ['yes', 'no', 'no_internet_service'])
		techsupport = st.selectbox(' Customer has techsupport:', ['yes', 'no', 'no_internet_service'])
		streamingtv = st.selectbox(' Customer has streamingtv:', ['yes', 'no', 'no_internet_service'])
		streamingmovies = st.selectbox(' Customer has streamingmovies:', ['yes', 'no', 'no_internet_service'])
		contract= st.selectbox(' Customer has a contract:', ['month-to-month', 'one_year', 'two_year'])
		paperlessbilling = st.selectbox(' Customer has a paperlessbilling:', ['yes', 'no'])
		paymentmethod= st.selectbox('Payment Option:', ['bank_transfer_(automatic)', 'credit_card_(automatic)', 'electronic_check' ,'mailed_check'])
		tenure = st.number_input('Number of months the customer has been with the current telco provider :', min_value=0, max_value=240, value=0)
		monthlycharges= st.number_input('Monthly charges :', min_value=0, max_value=240, value=0)
		totalcharges = tenure*monthlycharges
		output= ""
		output_prob = ""
		input_dict={
				"gender":gender ,
				"seniorcitizen": seniorcitizen,
				"partner": partner,
				"dependents": dependents,
				"phoneservice": phoneservice,
				"multiplelines": multiplelines,
				"internetservice": internetservice,
				"onlinesecurity": onlinesecurity,
				"onlinebackup": onlinebackup,
				"deviceprotection": deviceprotection,
				"techsupport": techsupport,
				"streamingtv": streamingtv,
				"streamingmovies": streamingmovies,
				"contract": contract,
				"paperlessbilling": paperlessbilling,
				"paymentmethod": paymentmethod,
				"tenure": tenure,
				"monthlycharges": monthlycharges,
				"totalcharges": totalcharges
			}
		if st.button("Predict"):
           
			X = dv.transform([input_dict])
			y_pred = model.predict_proba(X)[0, 1]
			churn = y_pred >= 0.5
			output_prob = float(y_pred)
			output = bool(churn)
 
		st.success('Churn: {0}, Risk Score: {1}'.format(output, output_prob))

	if add_selectbox == 'Batch':

		file_upload = st.file_uploader("Upload csv file for predictions", type=["csv"])
		if file_upload is not None:
			data = pd.read_csv(file_upload)
			X = dv.transform(data.to_dict(orient='records'))
			y_pred = model.predict_proba(X)[:, 1]
			churn = y_pred >= 0.5
			st.write(churn)


if __name__ == '__main__':
	main()

You can download the complete script from our Dockersamples GitHub page.

Execute the script

streamlit run stream_app.py

View your Streamlit app

You can now view your Streamlit app in your browser by navigating to http://localhost:8501 (Streamlit’s default port).

Containerizing the Streamlit app with Docker

Let’s explore how to easily run this app within a Docker container, using a Docker Official image. First, you’ll need to download Docker Desktop. Docker Desktop accelerates the image-building process while making useful images more discoverable. Complete this installation process once your download is finished.

Docker uses a Dockerfile to specify each image’s “layers.” Each layer stores important changes stemming from the base image’s standard configuration. Create an empty Dockerfile in your Streamlit project:

touch Dockerfile

Next, use your favorite text editor to open this Dockerfile. We’re going to build out this new file piece by piece. To start, let’s define a base image:

FROM python:3.8.12-slim

It’s now time to ensure that the latest pip modules are installed:

RUN /usr/local/bin/python -m pip install --upgrade pip

Next, let’s quickly create a directory to house our image’s application code. This is the working directory for your application:

WORKDIR /app

The following COPY instruction copies the requirements file from the host machine to the container image:

COPY requirements.txt ./requirements.txt
RUN pip install -r requirements.txt

The EXPOSE instruction tells Docker that your container is listening on the specified network ports at runtime:

EXPOSE 8501

Next, copy your application code into the image. Then, create an ENTRYPOINT (with a default CMD argument) to make your image executable:

COPY . .
ENTRYPOINT ["streamlit", "run"]
CMD ["stream_app.py"]

After assembling each piece, here’s your complete Dockerfile:

FROM python:3.8.12-slim
RUN /usr/local/bin/python -m pip install --upgrade pip
WORKDIR /app
COPY requirements.txt ./requirements.txt
RUN pip install -r requirements.txt
EXPOSE 8501
COPY . .
ENTRYPOINT ["streamlit", "run"]
CMD ["stream_app.py"]

Build your image

docker build -t customer_churn .

Run the app

docker run -d -p 8501:8501 customer_churn

View the app within Docker Desktop

You can do this by navigating to the Containers interface, which lists your running application as a named container:


Access the app

First, select your app container in the list. This opens the Logs view. Click the button with a square icon (with a slanted arrow) located next to the Stats pane. This opens your app in your browser:


Alternatively, you can hover over your container in the list and click that icon once the righthand toolbar appears.

Develop and deploy your next machine learning model, today

Congratulations! You’ve successfully explored how to build and deploy customer churn prediction models using Streamlit and Docker. With a single Dockerfile, we’ve demonstrated how easily you can build an interactive frontend and deploy this application in seconds. 

With just a few extra steps, you can use this tutorial to build applications with much greater complexity. You can make your app more useful by implementing push-notification logic in the app — letting the marketing team send promotional emails to high-churn customers on the fly. Happy coding.

Resources to Use JavaScript, Python, Java, and Go with Docker

With so many programming and scripting languages out there, developers can tackle development projects any number of ways. However, some languages — like JavaScript, Python, and Java — have been perennial favorites. (We’ve previously touched on this while unpacking Stack Overflow’s 2022 Developer Survey results.)


Image courtesy of Joan Gamell, via Unsplash

Many developers use Docker in tandem with these languages. We’ve seen our users create some amazing applications! Here are some resources and recommendations to level up your container game with these languages.

Getting Started with Docker

If you’ve never used Docker, you may want to familiarize yourself with some basic concepts first. You can learn the technical fundamentals of Docker and containerization via our “Orientation and Setup” guide and our introductory page. You’ll learn how containers work, and even how to harness tools like the Docker CLI or Docker Desktop.

Our Orientation page also serves as a foundation for many of our own official walkthroughs. This is a great resource if you’re completely new to Docker!

If you prefer hands-on learning, look no further than Shy Ruparel’s “Getting Started with Docker” video guide. Shy will introduce you to Docker’s architecture, essential CLI commands, Docker Desktop tips, and sample applications.

If you’re feeling comfortable with Docker, feel free to jump to your language-specific section using the links below. We’ve created language-specific workflows for each top language within our documentation (AKA “Our Language Modules” in this blog). These steps are linked below alongside some extra exploratory resources. We’ll also include some awesome-compose code samples to accelerate similar development projects — or to serve as inspiration.


How to Use Docker with JavaScript

JavaScript has been the programming world’s leading language for 10 years running. Luckily, there are also many ways to use JavaScript and Docker together. Check out these resources to harness JavaScript, Node.js, and other runtimes or frameworks with Docker.

Docker Node.js Modules

Before exploring further, it’s worth completing our learning modules for Node. These take you through the basics and set you up for increasingly-complex projects later on. We recommend completing these in order:

  1. Overview for Node.js (covering learning objectives and containerization of your Node application)
  2. Build your Node image
  3. Run your image as a container
  4. Use containers for development
  5. Run your tests using Node.js and Mocha frameworks
  6. Configure CI/CD for your application
  7. Deploy your app

It’s also possible that you’ll want to explore more processes for building minimum viable products (MVPs) or pulling container images. You can read more by visiting the following links.

Other Essential Node Resources

How to Use Docker with Python

Python has consistently been one of our developer community’s favorite languages. From building simple sample apps to leveraging machine learning frameworks, the language supports a variety of workloads. You can learn more about the dynamic duo of Python and Docker via these links.

Docker Python Modules

Similar to Node.js, these pages from our documentation are a great starting point for harnessing Python and Docker:

  1. Overview for Python
  2. Build your Python image
  3. Run your image as a container
  4. Use containers for development (featuring Python and MySQL)
  5. Configure CI/CD for your application
  6. Deploy your app

Other Essential Python Resources

How to Use Docker with Java

Both its maturity and the popularity of Spring Boot have contributed to Java’s growth over the years. It’s easy to pair Java with Docker! Here are some resources to help you do it.

Docker Java Modules

Like with Python, these modules can help you hit the ground running with Java and Docker:

  1. Overview for Java
  2. Build your Java image
  3. Run your image as a container
  4. Use containers for development
  5. Run your tests
  6. Configure CI/CD for your application
  7. Deploy your app

Other Essential Java Resources

How to Use Docker with Go

Last, but not least, Go has become a popular language for Docker users. According to Stack Overflow’s 2022 Developer Survey, over 10,000 JavaScript users (of roughly 46,000) want to start or continue developing in Go or Rust. It’s often positioned as an alternative to C++, yet many Go users originally transition over from Python and Ruby.

There’s tremendous overlap there. Go’s ecosystem is growing, and it’s become increasingly useful for scaling workloads. Check out these links to jumpstart your Go and Docker development.

Docker Go Modules

  1. Overview for Go
  2. Build your Go image
  3. Run your image as a container
  4. Use containers for development
  5. Run your tests using Go test
  6. Configure CI/CD for your application
  7. Deploy your app

Other Essential Go Resources

Build in the Language You Want with Docker

Docker supports all of today’s leading languages. It’s easy to containerize your application and deploy cross-platform without having to make concessions. You can bring your workflows, your workloads, and, ultimately, your users along.

And that’s just the tip of the iceberg. We welcome developers who develop in other languages like Rust, TypeScript, C#, and many more. Docker images make it easy to create these applications from scratch.

We hope these resources have helped you discover and explore how Docker works with your preferred language. Visit our language-specific guides page to learn key best practices and image management tips for using these languages with Docker Desktop.

How to Train and Deploy a Linear Regression Model Using PyTorch

Python is one of today’s most popular programming languages and is used in many different applications. The 2021 StackOverflow Developer Survey showed that Python remains the third most popular programming language among developers. In GitHub’s 2021 State of the Octoverse report, Python took the silver medal behind JavaScript.

Thanks to its longstanding popularity, developers have built many popular Python frameworks and libraries like Flask, Django, and FastAPI for web development.

However, Python isn’t just for web development. It powers libraries and frameworks like NumPy (Numerical Python), Matplotlib, scikit-learn, PyTorch, and others which are pivotal in engineering and machine learning. Python is arguably the top language for AI, machine learning, and data science development. For deep learning (DL), leading frameworks like TensorFlow, PyTorch, and Keras are Python-friendly.

We’ll introduce PyTorch and how to use it for a simple problem like linear regression. We’ll also provide a simple way to containerize your application. Also, keep an eye out for Part 2 — where we’ll dive deeply into a real-world problem and deployment via containers. Let’s get started.

What is PyTorch?

A Brief History and Evolution of PyTorch

Torch debuted in 2002 as a deep-learning library developed in the Lua language. In 2016, Soumith Chintala and Adam Paszke (both from Meta) developed PyTorch, basing it on the Torch library. Since then, developers have flocked to it. PyTorch was the third-most-popular framework per the 2021 StackOverflow Developer Survey, yet it is the most loved DL library among developers. PyTorch is also the DL framework of choice for Tesla, Uber, Microsoft, and over 7,300 other organizations.

PyTorch enables tensor computation with GPU acceleration, plus deep neural networks built on a tape-based autograd system. We’ll briefly break these terms down, in case you’ve just started learning about these technologies.

  • A tensor, in a machine learning context, refers to an n-dimensional array.
  • A tape-based autograd means that PyTorch uses reverse-mode automatic differentiation, a mathematical technique to compute derivatives (or gradients) efficiently using a computer (see the short example after this list).
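As a quick, self-contained illustration of reverse-mode autograd (this snippet is independent of the article’s example code):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # the forward pass is recorded on the "tape"
y.backward()         # reverse-mode automatic differentiation fills in x.grad
print(x.grad)        # dy/dx = 2x + 2 = 8 at x = 3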

Since diving into these mathematics might take too much time, check out these links for more information:

PyTorch is a vast library and contains plenty of features for various deep learning applications. To get started, let’s evaluate a use case like linear regression.

What is Linear Regression?

Linear Regression is one of the most commonly used mathematical modeling techniques. It models a linear relationship between two variables. This technique helps determine correlations between two variables — or predicts the value of the dependent variable based on a particular value of the independent variable.

In machine learning, linear regression often applies to prediction and forecasting applications. You can solve it analytically, typically without needing any DL framework. However, this is a good way to understand the PyTorch framework and kick off some analytical problem-solving.

Numerous books and web resources address the theory of linear regression. We’ll cover just enough theory to help you implement the model. We’ll also explain some key terms. If you want to explore further, check out the useful resources at the end of this section.

Linear Regression Model

You can represent a basic linear regression model with the following equation:

Y = mX + bias

What does each portion represent?

  • Y is the dependent variable, also called a target or a label.
  • X is the independent variable, also called a feature(s) or co-variate(s).
  • bias is also called offset.
  • m refers to the weight or “slope.”

These terms are often interchangeable. The dependent and independent variables can be scalars or tensors.

The goal of the linear regression is to choose weights and biases so that any prediction for a new data point — based on the existing dataset — yields the lowest error rate. In simpler terms, linear regression is finding the best possible curve (line, in this case) to match your data distribution.

Loss Function

A loss function is an error function that expresses the error (or loss) between real and predicted values. A very popular way to measure loss is the mean squared error, which is what we’ll use.
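For instance, here is the mean squared error computed by hand on three toy predictions (the values are made up for illustration):

import torch

y_true = torch.tensor([3.0, 5.0, 7.0])
y_pred = torch.tensor([2.5, 5.5, 6.0])

# Average of the squared differences: (0.25 + 0.25 + 1.0) / 3 = 0.5
mse = torch.mean((y_true - y_pred) ** 2)
print(mse)  # tensor(0.5000)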

Gradient Descent Algorithms

Gradient descent is a class of optimization algorithms that tries to solve the problem (either analytically or using deep learning models) by starting from an initial guess of weights and bias. It then iteratively reduces errors by updating weights and bias values with successively better guesses.

A simplified approach uses the derivative of the loss function and minimizes the loss. The derivative is the slope of the mathematical curve, and we’re attempting to reach the bottom of it — hence the name gradient descent. The stochastic gradient method samples smaller batches of data to compute updates, which is computationally cheaper than passing the entire dataset at each iteration.
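To make the idea concrete, here is gradient descent applied by hand to a one-variable toy problem (unrelated to the regression model built below):

# Minimize f(w) = (w - 3)^2, whose minimum is at w = 3
w = 0.0
lr = 0.1                  # learning rate: the size of each update step
for _ in range(25):
    grad = 2 * (w - 3)    # derivative of (w - 3)^2 with respect to w
    w -= lr * grad        # step against the gradient
print(round(w, 3))        # approaches 3.0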

To learn more about this theory, the following resources are helpful:

Linear Regression with Pytorch

Now, let’s talk about implementing a linear regression model using PyTorch. The script shown in the steps below is main.py — which resides in the GitHub repository and is forked from the “Dive Into Deep learning” example repository. You can find code samples within the pytorch directory.

For our regression example, you’ll need the following:

  • Python 3
  • PyTorch module (pip install torch) installed on your system
  • NumPy module (pip install numpy) installed
  • Optionally, an editor (VS Code is used in our example)

Problem Statement

As mentioned previously, linear regression is analytically solvable. We’re using deep learning to solve this problem since it helps you quickly get started and makes it easy to check the validity of your training, because the trained parameters can be compared against the known values used to generate the data.

We’ll attempt the following using Python and PyTorch:

  • Creating synthetic data where we’re aware of weights and bias
  • Using the PyTorch framework and built-in functions for tensor operations, dataset loading, model definition, and training

We don’t need a validation set for this example since we already have the ground truth. We’d assess our results by measuring the error against the weights and bias values used while creating our synthetic data.

Step 1: Import Libraries and Namespaces

For our simple linear regression, we’ll import the torch library in Python. We’ll also add some specific namespaces from our torch import. This helps create cleaner code:


# Step 1 import libraries and namespaces
import torch
from torch.utils import data
# `nn` is an abbreviation for neural networks
from torch import nn

Step 2: Create a Dataset

For simplicity’s sake, this example creates a synthetic dataset that aims to form a linear relationship between two variables with some bias.

i.e. y = mx + bias + noise


#Step 2: Create Dataset
#Define a function to generate noisy data
def synthetic_data(m, c, num_examples):
    """Generate y = mX + bias(c) + noise"""
    X = torch.normal(0, 1, (num_examples, len(m)))
    y = torch.matmul(X, m) + c
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_m = torch.tensor([2, -3.4])
true_c = 4.2
features, labels = synthetic_data(true_m, true_c, 1000)

Here, we use the built-in PyTorch function torch.normal to return a tensor of normally distributed random numbers. We also use the torch.matmul function to multiply tensor X with tensor m, then add a small amount of normally distributed noise to y.

The dataset looks like this when visualized using a simple scatter plot:

scatterplot

The code to create the visualization can be found in this GitHub repository.

Step 3: Read the Dataset and Define Small Batches of Data

#Step 3: Read dataset and create small batch
#define a function to create a data iterator. Input is the features and labels from synthetic data
# Output is iterable batched data using torch.utils.data.DataLoader
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

next(iter(data_iter))

Here, we use the PyTorch functions to read and sample the dataset. TensorDataset stores the samples and their corresponding labels, while DataLoader wraps an iterable around the TensorDataset for easier access.

The iter function creates a Python iterator, while next obtains the first item from that iterator.

Step 4: Define the Model

PyTorch offers pre-built models for different cases. For our case, a single-layer, feed-forward network with two inputs and one output is sufficient. The PyTorch documentation provides details about the nn.Linear implementation.

The model also requires the initialization of weights and biases. In the code, we initialize the weights using a Gaussian (normal) distribution with a mean value of 0, and a standard deviation value of 0.01. The bias is simply zero.


#Step4: Define model & initialization
# Create a single layer feed-forward network with 2 inputs and 1 output.
net = nn.Linear(2, 1)

#Initialize model params
net.weight.data.normal_(0, 0.01)
net.bias.data.fill_(0)

Step 5: Define the Loss Function

The loss function is defined as the mean squared error. It tells you how far from the regression line the data points are:


#Step 5: Define loss function
# mean squared error loss function
loss = nn.MSELoss()

Step 6: Define an Optimization Algorithm

For optimization, we’ll implement a stochastic gradient descent method.
The lr stands for learning rate and determines the update step during training.


#Step 6: Define optimization algorithm
# implements a stochastic gradient descent optimization method
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

Step 7: Training

For training, we’ll use the complete training data for n epochs (five in our case), iteratively using minibatch features and corresponding labels. For each minibatch, we’ll do the following:

  • Compute predictions and calculate the loss
  • Calculate gradients by running the backpropagation
  • Update the model parameters
  • Compute the loss after each epoch

# Step 7: Training
# Use complete training data for n epochs, iteratively using minibatch features and corresponding labels
# For each minibatch:
#   Compute predictions by calling net(X) and calculate the loss l
#   Calculate gradients by running the backpropagation
#   Update the model parameters using optimizer
#   Compute the loss after each epoch and print it to monitor progress

num_epochs = 5

for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()  # sets gradients to zero
        l.backward()         # back propagation
        trainer.step()       # parameter update
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

Results

Finally, compute errors by comparing the true value with the trained model parameters. A low error value is desirable. You can compute the results with the following code snippet:


#Results
m = net.weight.data
print('error in estimating m:', true_m - m.reshape(true_m.shape))
c = net.bias.data
print('error in estimating c:', true_c - c)

When you run your code, the terminal window outputs the following:

python3 main.py 
features: tensor([1.4539, 1.1952]) 
label: tensor([3.0446])
epoch 1, loss 0.000298
epoch 2, loss 0.000102
epoch 3, loss 0.000101
epoch 4, loss 0.000101
epoch 5, loss 0.000101
error in estimating m: tensor([0.0004, 0.0005])
error in estimating c: tensor([0.0002])

As you can see, the loss shrinks with each epoch, and the errors in the estimated parameters are small.

Containerizing the Script

In the previous example, we had to install multiple Python packages just to run a simple script. Containers, meanwhile, let us easily package all dependencies into an image and run an application.

We’ll show you how to quickly and easily Dockerize your script. Part 2 of the blog will discuss containerized deployment in greater detail.

Containerize the Script

Containers help you bundle together your code, dependencies, and libraries needed to run applications in an isolated environment. Let’s tackle a simple workflow for our linear regression script.

We’ll achieve this using Docker Desktop. Docker builds the image from a Dockerfile, which specifies the image’s overall contents.

Make sure to pull a Python base image (version 3.10) for our example:

FROM python:3.10

Next, we’ll install the numpy and torch dependencies needed to run our code:

RUN apt update && apt install -y python3-pip
RUN pip3 install numpy torch

Afterwards, we’ll need to place our main.py script into a directory:

COPY main.py app/

Finally, the CMD instruction defines important executables. In our case, we’ll run our main.py script:

CMD ["python3", "app/main.py" ]

Our complete Dockerfile is shown below, and exists within this GitHub repo:

FROM python:3.10
RUN apt update && apt install -y python3-pip
RUN pip3 install numpy torch
COPY main.py app/
CMD ["python3", "app/main.py" ]

Build the Docker Image

Now that we have every instruction that Docker Desktop needs to build our image, we’ll follow these steps to create it:

  1. In the GitHub repository, our sample script and Dockerfile are located in a directory called pytorch. From the repo’s home folder, we can enter cd deeplearning-docker/pytorch to access the correct directory.
  2. Our Docker image is named linear_regression. To build your image, run the docker build -t linear_regression . command (note the space before the final dot, which sets the build context to the current directory).

Run the Docker Image

Now that we have our image, we can run it as a container with the following command:

docker run linear_regression

This command will create a container and execute the main.py script. Once we run the container, it’ll re-print the loss and estimates. The container will automatically exit after executing these commands. You can view your container’s status via Docker Desktop’s Container interface:

containers docker desktop

Desktop shows us that linear_regression executed the commands and exited successfully.

We can view our error estimates via the terminal or directly within Docker Desktop. I used a Docker Extension called Logs Explorer to view my container’s output (shown below):

Alternatively, you may also experiment using the Docker image that we created in this blog.

logs

As we can see, the results from running the script on my system and inside the container are comparable.

To learn more about using containers with Python, visit these helpful links:

Want to learn more about PyTorch theories and examples?

We took a very tiny peek into the world of Python, PyTorch, and deep learning. However, many resources are available if you’re interested in learning more. Here are some great starting points:

Additionally, endless free and paid courses exist on websites like YouTube, Udemy, Coursera, and others.

Conclusion

In this blog, we’ve introduced PyTorch and linear regression, and we’ve used the PyTorch framework to solve a very simple linear regression problem. We’ve also shown a very simple way to containerize your PyTorch application.

How to Build and Deploy a Django-based URL Shortener App from Scratch https://www.docker.com/blog/how-to-build-and-deploy-a-django-based-url-shortener-app-from-scratch/ Tue, 07 Jun 2022 15:00:48 +0000 https://www.docker.com/?p=33930 At Docker, we are incredibly proud of our vibrant, diverse and creative community. From time to time, we feature cool contributions from the community on our blog to highlight some of the great work our community does. Are you working on something awesome with Docker? Send your contributions to Ajeet Singh Raina (@ajeetraina) on the Docker Community Slack and we might feature your work!

URL shortening is a widely adopted technique that’s used to create short, condensed, and unique aliases for long URL links. Websites like tinyurl.com, bit.ly and ow.ly offer online URL shortener services, while social media sites integrate shorteners right into their product, like Twitter with its use of t.co. This is especially important for Twitter, where shortened links allow users to share long URLs in a Tweet while still fitting in the maximum number of characters for their message.

Why are URL shortener techniques so popular? First, the URL shortener technique allows you to create a short URL that is easy to remember and manage. Say, if you have a brand name, a short URL just consisting of a snippet of your company name is easier to identify and remember.


Second, oversized and hard-to-guess URLs might sometimes look too suspicious and clunky. Imagine a website URL link that has UTM parameters embedded. UTMs are snippets of text added to the end of a URL to help marketers track where website traffic comes from if users click a link to this URL. With too many letters, backslashes and question marks, a long URL might look insecure. Some users might still think that there is a security risk involved with a shortened URL, as you cannot tell where you’re going to land, but preview features offered by many shortener services let you see the full destination URL before you’re redirected to the actual site.

How do they actually work? Whenever a user clicks a link (say, https://tinyurl.com/2p92vwuh), an HTTP request is sent to the backend server with the full URL. The backend server reads the path part (2p92vwuh) and looks it up in a database that stores a description, a name, and the real URL. It then issues a redirect, which is an HTTP 302 response with the target URL in the header.
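Conceptually, all the backend needs is a lookup from the hash to the original URL, plus a 302 response. Here is a minimal, framework-free sketch of that idea (the table contents and URLs are made up purely for illustration; the Django app built below stores the mapping in a real database):

# A hypothetical in-memory mapping from hash to original URL (a real service uses a database)
url_table = {
    "2p92vwuh": "https://example.com/some/very/long/path?utm_source=newsletter",
}

def resolve(path_hash):
    # Look up the original URL; if found, answer with an HTTP 302 and a Location header
    original_url = url_table.get(path_hash)
    if original_url is None:
        return 404, {}
    return 302, {"Location": original_url}

print(resolve("2p92vwuh"))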


Building the application

In this blog tutorial, you’ll learn how to build a basic URL shortener using Python and Django.

First, you’ll create a basic application in Python without using Docker. You’ll see how the application lets you shorten a URL. Next, you’ll build a Docker image for that application. You’ll also learn how Docker Compose can help you rapidly deploy your application within containers. Let’s jump in.

Key Components

Here’s what you’ll need to use for this tutorial:

Getting Started

Once you have Python 3.8+ installed on your system, follow these steps to build a basic URL shortener clone from scratch.

Step 1. Create a Python virtual environment

Virtualenv is a tool for creating isolated virtual Python environments. It’s a self-contained directory tree that contains a Python installation from a particular version of Python, as well as a number of additional packages.

The venv module is used to create and manage virtual environments. By default, venv uses the most recently installed version of Python available on your system; if you have multiple Python versions, you can select a specific one when creating the environment.

Use the following commands to create a Python virtual environment where you can install packages locally:

mkdir -p venv
python3 -m venv venv

The above command will create a directory if it doesn’t exist and also create sub-directories that contain a copy of the Python interpreter and a number of supporting files.

$ tree venv -L 2
venv
├── bin
│   ├── Activate.ps1
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── easy_install
│   ├── easy_install-3.8
│   ├── pip
│   ├── pip3
│   ├── pip3.8
│   ├── python -> python3
│   └── python3 -> /usr/bin/python3
├── include
├── lib
│   └── python3.8
├── lib64 -> lib
└── pyvenv.cfg

5 directories, 12 files

Once you’ve created a virtual environment, you’ll need to activate it:

source ./venv/bin/activate

Step 2. Install Django

The easiest way to install Django is to use the standalone pip installer. pip (the Preferred Installer Program) is the most popular package installer for Python and is a command-line utility that helps you manage your Python third-party packages. Use the following commands to update pip and then install Django:

pip install -U pip
pip install Django

You’ll see the following results:

Collecting django
  Downloading Django-4.0.4-py3-none-any.whl (8.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.0/8.0 MB 15.9 MB/s eta 0:00:00
Collecting asgiref<4,>=3.4.1
  Downloading asgiref-3.5.2-py3-none-any.whl (22 kB)
Collecting sqlparse>=0.2.2
  Downloading sqlparse-0.4.2-py3-none-any.whl (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.3/42.3 kB 1.7 MB/s eta 0:00:00
Collecting backports.zoneinfo
  Downloading backports.zoneinfo-0.2.1.tar.gz (74 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.1/74.1 kB 3.0 MB/s eta 0:00:00
  Installing build dependencies ... done
   …..

Step 3. Create a Django project

django-admin is Django’s command-line utility for administrative tasks. The utility automatically creates a manage.py file in each Django project.

mkdir -p src/ && cd src
django-admin startproject urlshortener

Django Project Structure:

$ tree urlshortener/
urlshortener/
├── manage.py
└── urlshortener
    ├── __init__.py
    ├── asgi.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

1 directory, 6 files

In this directory tree:

  • manage.py is Django’s CLI
  • settings.py is where all of the global Django project’s settings reside
  • urls.py is where all the URL mappings reside
  • wsgi.py is an entry-point for WSGI-compatible servers to serve the project in production

Step 4. Creating a Django app for shortening the URL

Change directory to src/urlshortener and run the following command:

cd src/urlshortener
python manage.py startapp main

It will create a new subdirectory called “main” under src/urlshortener as shown below:

src
└── urlshortener
    ├── main
    │   ├── admin.py
    │   ├── apps.py
    │   ├── __init__.py
    │   ├── migrations
    │   ├── models.py
    │   ├── tests.py
    │   └── views.py
    ├── manage.py
    └── urlshortener

In this directory tree:

  • admin.py is where Django’s built-in admin configuration resides
  • migrations is where all of the database migrations reside
  • models.py is where all of the database models for this Django app exist
  • tests.py is self-explanatory
  • views.py is where “controller” functions reside, the functions that are in charge of creating the views

For this tutorial, you’ll only leverage the last one.

Step 5. Create the URL Shortener

pyshorteners is a simple Python library that wraps several URL shortening APIs. With pyshorteners, generating a short URL or expanding an existing one takes only a couple of lines of code.

Run the following command to install the package pyshorteners:

pip install pyshorteners
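Before wiring the library into Django, you can try it out in a Python shell. A quick, hedged example using the TinyURL provider is shown below (provider availability can vary, and the exact short link returned will differ):

import pyshorteners

shortener = pyshorteners.Shortener()
# Shorten a link via the TinyURL provider; other providers (such as chilp.it) work the same way
print(shortener.tinyurl.short("https://www.docker.com"))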

Run the following command to save all of your Python libraries, pinned to their current versions, into a requirements.txt file:

pip freeze > requirements.txt

Once the command is successfully run, the requirements.txt gets created with the following entries:

asgiref==3.5.2
backports.zoneinfo==0.2.1
certifi==2022.5.18.1
charset-normalizer==2.0.12
Django==4.0.5
idna==3.3
pyshorteners==1.0.1
requests==2.27.1
sqlparse==0.4.2
urllib3==1.26.9

Head to main/views.py and edit it accordingly:

from django.shortcuts import render
from django.http import HttpResponse
import pyshorteners


# Create your views here.
def shorten(request, url):
    shortener = pyshorteners.Shortener()
    shortened_url = shortener.chilpit.short(url)
    return HttpResponse(f'Shortened URL: <a href="{shortened_url}">{shortened_url}</a>')

In this code listing:

  • In line 1, the render function is imported by default. You won’t remove it now, as you’re going to use it later.
  • In line 2, you’ve imported the class name HttpResponse. This is the type returned with an HTML text.
  • In line 3, the library pyshorteners is imported, which you use to shorten the given URLs.
  • In line 7, the function takes two parameters: a request, which is mandatory, and a url, which Django fills in from the URL mapping. We’ll get to that in the next file.
  • In line 8, you initialized the shortener object.
  • In line 9, the shortened URL is generated by sending a request to chilp.it.
  • In line 10, the shortened URL is returned as a minimal HTML link.

Next, let’s assign a URL to this function.

Create a urls.py under main:

touch main/urls.py

Add the below code:

from django.urls import path

from . import views

urlpatterns = [
    path('shorten/<str:url>', views.shorten, name='shorten'),
]

The URL mapping specifies which function to use and which path parameters there are. In this case, the URL is mapped to the function shorten and with a string parameter named url.

Now head back to urls.py in the urlshortener/ project directory and include the newly created main/urls.py file:

from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('', include('main.urls')),
    path('admin/', admin.site.urls),
]

Now, run the development server:

python manage.py runserver

Open http://127.0.0.1:8000/shorten/google.com in your browser and press Enter. It will show you a shortened URL as shown in the following screenshot.


Step 6. Creating the form

In this section, you’ll see how to create a landing page.

mkdir -p main/templates/main
touch main/templates/main/index.html

Open index.html and fill it with the following content:

<form action="{% url 'shorten_post' %}" method="post">
{% csrf_token %}
<fieldset>
    <input type="text" name="url">
</fieldset>
<input type="submit" value="Shorten">
</form>

In this file:

  • The form action, which the URL form sends the request to, is defined by Django’s url template tag. The tag in use is the one created in the URL mappings. Here, the URL name shorten_post doesn’t exist yet. You’ll create it later.
  • The CSRF token is a Django security measure that works out-of-the-box.

Head over to main/views.py under the project directory src/urlshortener/ and add two functions, namely index and shorten_post at the end of the file.

from django.shortcuts import render
from django.http import HttpResponse
import pyshorteners


def index(request):
    return render(request, 'main/index.html')


def shorten_post(request):
    return shorten(request, request.POST['url'])


. . .

Here,

  • The function index renders the HTML template created in the previous step, using the render function.
  • The function shorten_post is a function created to be used for the post requests. The reason for its creation (and not using the previous function) is because Django’s URL mapping only works with path parameters and not post request parameters. So, here, the parameter url is read from the post request and passed to the previously available shorten function.

Now go to the main/urls.py to bind the functions to URLs:

from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('shorten', views.shorten_post, name='shorten_post'),
    path('shorten/<str:url>', views.shorten, name='shorten'),
]

Next, head over to urlshortener/settings.py under src/urlshortener/urlshortener directory and add 'main.apps.MainConfig' to the beginning of the list INSTALLED_APPS:

. . .

INSTALLED_APPS = [
    'main.apps.MainConfig',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

. . .

Step 7. Creating the Database Model

Now, to save the URLs and their short versions locally, you should create database models for them. Head to main/models.py under src/urlshortener/main and create the following model:

from django.db import models


# Create your models here.
class LinkMapping(models.Model):
    original_url = models.CharField(max_length=256)
    hash = models.CharField(max_length=10)
    creation_date = models.DateTimeField('creation date')

We’ll assume that the given URLs fit in 256 characters and that the short versions are fewer than 10 characters (usually 7 characters suffice).

Now, create the database migrations:

python manage.py makemigrations

It will show the following results:

Migrations for 'main':
  main/migrations/0001_initial.py
    - Create model LinkMapping

A new file will be created under main/migrations.

main % tree migrations 
migrations
├── 0001_initial.py
├── __init__.py
└── __pycache__
    └── __init__.cpython-39.pyc

1 directory, 3 files

Now to apply the database migrations to the default SQLite DB, run:

python manage.py migrate

It shows the following results:

urlshortener % python3 manage.py migrate    
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, main, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying main.0001_initial... OK
  Applying sessions.0001_initial... OK

Now that you have the database models, it’s time to create a shortener service. Create a Python file main/service.py and add the following functionality:

import random
import string
from django.utils import timezone

from .models import LinkMapping


def shorten(url):
    random_hash = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(7))
    mapping = LinkMapping(original_url=url, hash=random_hash, creation_date=timezone.now())
    mapping.save()
    return random_hash


def load_url(url_hash):
    return LinkMapping.objects.get(hash=url_hash)

In this file, in the shorten function, you create a random 7-character hash, assign the entered URL to this hash, save the mapping into the database, and finally return the hash.

In load_url, you load the original URL from the given hash.
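Because the hash draws from uppercase letters, lowercase letters, and digits, there are 62 possibilities per character, so a 7-character hash gives roughly 3.5 trillion combinations, far more than a small service needs. A quick sanity check in Python:

import string

alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits
print(len(alphabet) ** 7)  # 62 ** 7 = 3,521,614,606,208 possible hashes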

Now, create a new function in the views.py for redirecting:

from django.shortcuts import render, redirect

from . import service

. . .

def redirect_hash(request, url_hash):
    original_url = service.load_url(url_hash).original_url
    return redirect(original_url)

Then create a URL mapping for the redirect function:

urlpatterns = [
    path('', views.index, name='index'),
    path('shorten', views.shorten_post, name='shorten_post'),
    path('shorten/<str:url>', views.shorten, name='shorten'),
    path('<str:url_hash>', views.redirect_hash, name='redirect'),
]

You create a URL mapping for the hashes directly under the main host, e.g. example.com/xDk8vdX. If you want to give it an indirect mapping, like example.com/r/xDk8vdX, then the shortened URL will be longer.

The only thing you have to be careful about is the other mapping, example.com/shorten. We placed it above the redirect mapping; otherwise a request to /shorten would have resolved to the redirect view as well.

The final step would be changing the shorten view function to use the internal service:

from django.shortcuts import render, redirect
from django.http import HttpResponse
from django.urls import reverse

from . import service

. . .

def shorten(request, url):
    shortened_url_hash = service.shorten(url)
    shortened_url = request.build_absolute_uri(reverse('redirect', args=[shortened_url_hash]))
    return HttpResponse(f'Shortened URL: <a href="{shortened_url}">{shortened_url}</a>')

You can also remove the third-party shortener library from requirements.txt, as you won’t use it anymore.

Using PostgreSQL

To use PostgreSQL instead of SQLite, you change the config in settings.py:

import os

. . .

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}

if os.environ.get('POSTGRES_NAME'):
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': os.environ.get('POSTGRES_NAME'),
            'USER': os.environ.get('POSTGRES_USER'),
            'PASSWORD': os.environ.get('POSTGRES_PASSWORD'),
            'HOST': 'db',
            'PORT': 5432,
        }
    }

The if statement means it only uses the PostgreSQL configuration if it exists in the environment variables. If not set, Django will keep using the SQLite config.

Create a base.html under main/templates/main:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Link Shortener</title>
  <link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
  <script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
</head>
<style>
  #main-card {
      margin:0 auto;
      display: flex;
      width: 50em;
      align-items: center;
  }
</style>
<body class="mdc-typography">
<div id="main-card">
  {% block content %}
  {% endblock %}
</div>
</body>
</html>

Alter the index.html to use material design:

{% extends 'main/base.html' %}

{% block content %}
<form action="{% url 'shorten_post' %}" method="post">
  {% csrf_token %}
  <label class="mdc-text-field mdc-text-field--outlined">
      <span class="mdc-notched-outline">
        <span class="mdc-notched-outline__leading"></span>
        <span class="mdc-notched-outline__notch">
          <span class="mdc-floating-label" id="my-label-id">URL</span>
        </span>
        <span class="mdc-notched-outline__trailing"></span>
      </span>
    <input type="text" name="url" class="mdc-text-field__input" aria-labelledby="my-label-id">
  </label>
  <button class="mdc-button mdc-button--outlined" type="submit">
    <span class="mdc-button__ripple"></span>
    <span class="mdc-button__label">Shorten</span>
  </button>
</form>
{% endblock %}

Create another view for the response, namely link.html:

{% extends 'main/base.html' %}

{% block content %}
<div class="mdc-card__content">
  <p>Shortened URL: <a href="{{shortened_url}}">{{shortened_url}}</a></p>
</div>
{% endblock %}

Now, get back to views.py and change the shorten function to render instead of returning a plain HTML:

. . .

def shorten(request, url):
    shortened_url_hash = service.shorten(url)
    shortened_url = request.build_absolute_uri(reverse('redirect', args=[shortened_url_hash]))
    return render(request, 'main/link.html', {'shortened_url': shortened_url})

Click here to access the code previously developed for this example. You can directly clone the repository and try executing the following commands to bring up the application.

git clone https://github.com/aerabi/link-shortener
cd link-shortener/src/urlshortener
python manage.py migrate
python manage.py runserver


Step 8. Containerizing the Django App

Docker helps you containerize your Django app, letting you bundle together your complete Django application, runtime, configuration, and OS-level dependencies. This includes everything needed to ship a cross-platform, multi-architecture web application.

Let’s look at how you can easily run this app inside a Docker container using a Docker Official Image. First, you’ll need to download Docker Desktop. Docker Desktop accelerates the image-building process while making useful images more discoverable. Complete the installation process once your download is finished.

You’ve effectively learned how to build a sample Django app. Next, let’s see how to create an associated Docker image for this application.

Docker uses a Dockerfile to specify each image’s “layers.” Each layer stores important changes stemming from the base image’s standard configuration. Create the following empty Dockerfile in your Django project.

touch Dockerfile

Use your favorite text editor to open this Dockerfile. You’ll then need to define your base image.

Whenever you’re creating a Docker image to run a Python program, it’s always recommended to use a smaller base image that helps in speeding up the build process and launching containers at a faster pace.

FROM python:3.9

Next, let’s quickly create a directory to house our image’s application code. This acts as the working directory for your application.

RUN mkdir /code
WORKDIR /code

It’s always recommended to upgrade pip itself before installing packages:

RUN pip install --upgrade pip

The following COPY instruction copies the requirements.txt file from the host machine to the container image and stores it under the /code directory.

COPY requirements.txt /code/
RUN pip install -r requirements.txt

Next, you need to copy the rest of the Django project into the image. This includes the Django source code and its configuration files.

COPY . /code/

Next, use the EXPOSE instruction to inform Docker that the container listens on the specified network ports at runtime. The EXPOSE instruction doesn’t actually publish the port. It functions as a type of documentation between the person who builds the image and the person who runs the container, about which ports are intended to be published.

EXPOSE 8000

Finally, in the last line of the Dockerfile, specify CMD to provide defaults for the executing container; these defaults are the executable and its arguments. The runserver command is a built-in subcommand of Django’s manage.py that starts a development server for this specific Django project.

CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]

Here’s your complete Dockerfile:

FROM python:3.9


RUN mkdir /code
WORKDIR /code
RUN pip install --upgrade pip
COPY requirements.txt /code/

RUN pip install -r requirements.txt
COPY . /code/

EXPOSE 8000

CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]

Step 9. Building Your Docker Image

Next, you’ll need to build your Docker image. Enter the following command to kickstart this process, which produces an output soon after:

docker build -t urlshortener .

Step 10. Run Your Django Docker Container

Docker runs processes in isolated containers. A container is a process that runs on a host, which may be local or remote. When an operator executes docker run, the container process that runs is isolated with its own file system, networking, and separate process tree from the host.

The following docker run command first creates a writeable container layer over the specified image, and then starts it.

docker run -p 8000:8000 -t urlshortener

Step 11. Running URL Shortener app using Docker Compose

Finally, it’s time to create a Docker Compose file. This single YAML file lets you specify your frontend app and your PostgreSQL database:

services:
  web:
    build:
      context: ./src/urlshortener/
      dockerfile: Dockerfile
    command: gunicorn urlshortener.wsgi:application --bind 0.0.0.0:8000
    ports:
      - 8000:8000
    environment:
      - POSTGRES_NAME=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    depends_on:
      - db
  db:
    image: postgres
    volumes:
      - postgresdb:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres

volumes:
  postgresdb:


Your example application has the following parts:

  • Two services backed by Docker images: your frontend web app and your backend database
  • The frontend, accessible via port 8000
  • The depends_on parameter, letting you create the backend service before the frontend service starts
  • One persistent volume, attached to the backend
  • The environmental variables for your PostgreSQL database

You’ll then start your services using the docker-compose up command.

docker-compose up -d --build

Note: If you’re using Docker Compose v1, the command line name is docker-compose, with a hyphen. If you’re using v2, which is shipped with Docker Desktop, you should omit the hyphen: docker compose.

docker-compose ps
NAME                   COMMAND                  SERVICE             STATUS              PORTS
link-shortener-db-1    "docker-entrypoint.s…"   db                  running             5432/tcp
link-shortener-web-1   "gunicorn urlshorten…"   web                 running             0.0.0.0:8000->8000/tcp

Now, it’s time to perform the migration:

docker-compose exec web python manage.py migrate

Just like that, you’ve created and deployed your Django URL-shortener app! This is usable in your browser, like before:


You can get the shortened URL by adding the URL as shown below:


Conclusion

Docker helps accelerate the process of building, running, and sharing modern applications. Docker Official Images help you develop your own unique applications, no matter what tech stack you’re accustomed to. With a single YAML file, we demonstrated how Docker Compose helps you easily build and deploy a Django-based URL shortener app in a matter of seconds. With just a few extra steps, you can apply this tutorial while building applications with much greater complexity.

Happy coding.

How to “Dockerize” Your Python Applications https://www.docker.com/blog/how-to-dockerize-your-python-applications/ Fri, 22 Apr 2022 17:34:50 +0000 https://www.docker.com/?p=33168 Millions of developers use Python to build modern, scalable applications. For developers who value performance, cross-platform portability, and convenience, deploying these apps within a Docker environment can be advantageous. That said, what kinds of applications can you create? How does Docker support their functionality? 

Patrick Loeber answers these questions and more during his presentation, “Getting Started with Docker and Python.”

If you’re seeking a blueprint for deploying Python apps within Docker containers, look no further! Patrick uses many mechanisms, libraries, and commands that you might leverage yourself while developing applications. He also covers some Docker basics—making it much easier to incorporate Docker without expert knowledge. Let’s jump in. 

Getting Started

To develop with Python and Docker, first ensure that Python v3.7.13+ is installed on your machine. Downloadable packages are available at Python.org for all mainstream OSes: 

 

You’ll also need three additional tools before starting: 

  1. The latest version of Docker Desktop, for either Windows or macOS (Intel or M-series processor). These OS links are direct download links—clicking them will jumpstart the process automatically
  2. Your preferred code editor, though we recommend Visual Studio Code (VS Code)
  3. The Docker extension for VS Code


Note: The
Docker Desktop for Linux (DD4L) Beta is also available. The Beta program is aimed at early adopters who’d like to try Docker Desktop for Linux, and provide feedback.


Docker Desktop includes the CLI, which you’ll need for this exercise and tasks tied to your Docker containers. It’s even easy to launch terminal windows within your containers. 

Meanwhile, VS Code’s Docker extension provides autocompletion, debugging support, and syntax hints. It also lets you manage containers, images, and registries via the Docker Explorer sidebar UI. Finally, the extension streamlines editing of your Dockerfile and docker-compose.yml files. 

Setup Steps and Dependencies

While building Python applications, your working directory (workdir) plays a key role. Pointing your application towards critical configuration files and other resources at runtime allows your Python app to run effectively and predictably. Custom directories let your applications run in a well-defined state, which is perfect for testing and deployment. 

Here’s how this looks in practice: 

  1. Create a new, named project within your editor.
  2. Form your new directory by creating a new root project folder in the sidebar, and naming it.
  3. Open a new workspace named main.py.
  4. Enter the cd [root folder name] command in the Terminal to tap into your new directory.
  5. Copy and paste any pre-existing Python application code into your main.py workspace. Otherwise, manually enter your application code. 

 


View your project tree in VS Code using the sidebar, while your file path is displayed in the Terminal.

 

You’ve now effectively laid the groundwork for your application. However, more steps are necessary before spinning it up. 

Docker Components

Your Python app also requires some basic Docker components to work properly. You’ll need your Dockerfile, image, and container. VS Code will automatically detect that a Dockerfile belongs to Docker and label it accordingly. It’s now time to piece that together. 

First, you should pull down a tagged Python base image using the following command: 

FROM python:3.9
# Or any preferred Python version

Docker Hub contains a number of Python Official Images for use with your project. These resources include numerous tags. What if you have size constraints? Choosing a slim image build is smart, since those images are minuscule. The python:<version>-alpine variant achieves similar results, and may suit sensitive workloads due to its minimal attack surface. 


You’ll need to add your source file into your container’s base folder, for any Python project. That respective command below draws upon our earlier example: 

ADD main.py .

However, you’ll likely need external libraries to add critical functionality into your application. Let’s jump into that now. 

Useful Third-Party Libraries

Patrick’s example leveraged two popular libraries—requests and BeautifulSoup—to create his app’s crawling capabilities on IMDB’s database. However, these two libraries have widespread appeal. 

Requests


Screenshot courtesy of python-requests.org.

Data transit is vital to modern applications. The Requests library evolved as a general-purpose solution for sending HTTP requests—hence its name. Docker supports Python containers that use the import requests command. There are also multiple images on Docker Hub that can accommodate such use cases. 
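As a taste of how little code a request takes, here is a minimal example (the URL is just a placeholder; swap in whatever endpoint your app actually needs, and add proper error handling for production use):

import requests

# Fetch a page and inspect the response; a real application would handle failures and retries
response = requests.get("https://example.com", timeout=10)
print(response.status_code)
print(response.headers.get("content-type"))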

 In fact, one of our own developer advocates recently created an application that periodically pulls COVID vaccination rates from Our World in Data’s public repository. If you’re interested in how IoT can help you track evolving data, check out Shy Ruparel’s walkthrough. 

BeautifulSoup


Screenshot courtesy of Beautiful Soup.

Meanwhile, the BeautifulSoup library (imported with from bs4 import BeautifulSoup) lets you scrape web pages for crucial data. While you can do this with IMDB, you can grab any data embedded within publicly accessible HTML and XML resources. Accordingly, you could easily create a basic parser application within a Docker container. We recommend using lightweight, Alpine Linux containers for this and other similar projects. 
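A minimal scraping sketch, assuming the requests and beautifulsoup4 packages are installed, is shown below (the target URL and the tags you extract depend entirely on the page you care about):

import requests
from bs4 import BeautifulSoup

# Download a page, parse it, and pull out the title and every link target
html = requests.get("https://example.com", timeout=10).text
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)
print([a.get("href") for a in soup.find_all("a")])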

Python-Dotenv

Finally, you’ll need to use environment variables for configuration while containerized apps are running. These key/value pairs are tied to security, yet they also impact an app’s functionality. Since they live outside of the app itself, you must explicitly reference your .env files to retrieve any variables. 

This is where the python-dotenv library shines. By entering pip install python-dotenv, you can quickly get this library up and running. With the from dotenv import load_dotenv and from dotenv import dotenv_values commands, you can load configurations from files with or without touching your environment. These commands—and numerous others within the library—help update your active applications. 
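A small, hedged example of both styles, assuming a .env file with a single API_KEY entry sitting next to the script:

import os
from dotenv import load_dotenv, dotenv_values

# Option 1: load the variables from .env into the process environment
load_dotenv()
print(os.environ.get("API_KEY"))

# Option 2: read the values into a plain dict without touching os.environ
config = dotenv_values(".env")
print(config.get("API_KEY"))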

You can also use Docker Compose, which comes bundled with Docker Desktop, to handle environment variables. Entering the docker compose up command will automatically prompt your containers to grab any .env values and substitute them. This allows for active configuration changes.

Library Installation and Understanding Your Dockerfile

You must install any third-party libraries for them to work properly. Enter the following command to do so, using our earlier libraries: 

RUN pip install requests beautifulsoup4 python-dotenv

Lastly, you’ll enter the command that Docker will execute once your container has started: 

CMD ["python", "./main.py"]
# Or enter the name of your unique directory and parameter set.

Together, these commands and earlier arguments make up your Dockerfile. The complete file is shown below:

FROM python:3.9 
# Or any preferred Python version.
ADD main.py .
RUN pip install requests beautifulsoup4 python-dotenv
CMD ["python", "./main.py"]
# Or enter the name of your unique directory and parameter set.

This Dockerfile is fairly basic, which is perfect for this application. Your Dockerfile will change depending on your code and desired app functionality. There are also other arguments available, like WORKDIR, ENV, COPY, EXPOSE, ENTRYPOINT and HEALTHCHECK. Each allows you to build more operative complexity into your Python applications, or control which resources are pulled in. 

Wrapping Up

Finally, confirm that everything’s up and running after it’s in place. Quickly verify that you have Docker installed on your machine, and note the version number to make sure it’s current. 

You also have your Dockerfile, but now you have to build the image within VS Code. Use the following command:

docker build -t python-imagename . 

 The build process can take anywhere from a few seconds to a few minutes. Once your image is available and usable, simply enter docker run python-imagename, which should successfully prompt your application to run! You can confirm this based on your terminal’s subsequent readout: 


Last but not least, you’re now ready to run your Python application. In the Terminal, simply enter docker run python-imagename. Your output will vary depending on your app’s functionality, but here’s how it looks for Patrick’s IMDB use case: 


If you’d like, you can also experiment with loops and conditional user inputs. Remember that any file changes you make after building your initial image will require a rebuild prior to implementation. 

Congratulations! You’ve learned the process for Dockerizing a basic Python application. Should you need to lightly manage your Docker containers, you can hop into Docker Desktop and easily accomplish these tasks with just a few clicks. 

Other Powerful Use Cases

Patrick tackled two examples in his demos. First, he showed us how to “Dockerize” a Python script that accesses IMDB’s movie database. Second, he showed us how to build and containerize a Python web application. 

Accordingly, polling IMDB’s database is just one of many awesome things possible with Python and Docker. Here are a few other apps you can build using both technologies: 


This is just the tip of the iceberg. To learn more about building your own Python image,
check out our documentation. The Docker SDK for Python is another useful way to run docker commands within Python apps themselves. 

Are you a film buff who’s also eager to explore further? Check out Lorenzo Costa’s tutorial on quickly deploying your own Game of Thrones API with Flask, MongoDB, and other tools.

Join us for DockerCon 2022!

Want to learn more about Docker, Dockerfiles, and Python? Register now and join us at DockerCon 2022, from May 9th-10th. You’ll learn how to build Docker development environments, deploy machine-learning models, and how to use Python and Docker for data science.

Video: How to Dockerize a Python App with FastAPI https://www.docker.com/blog/video-how-to-dockerize-a-python-app-with-fastapi/ Wed, 05 May 2021 21:39:25 +0000 https://www.docker.com/blog/?p=28260 Join host Peter McKee and Python wizard Michael Kennedy for a warts-and-all demo of how to Dockerize a Python app using FastAPI, a popular Python framework. Kennedy is a developer and entrepreneur, and the founder and host of two successful Python podcasts — Talk Python To Me and Python Bytes. He’s also a Python Software Foundation Fellow.

With some skillful back-seat driving by McKee, Kennedy shows how to build a bare-bones web API — in this case one that allows you to ask questions and get answers about movies (director, release date, etc.) — by mashing together a movie service and FastAPI. Next, he shows how to put it into a Docker container, create an app and run it, finally sharing the image on GitHub.
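For a sense of where the demo starts, a bare-bones FastAPI app looks roughly like this (a hedged sketch only; the movie endpoint here is a made-up stand-in for the richer service built in the video):

from fastapi import FastAPI

app = FastAPI()

# A toy endpoint returning canned movie metadata; run it with: uvicorn main:app --reload
@app.get("/api/movie/{title}")
def movie(title: str):
    return {"title": title, "director": "unknown", "year": None}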

If you’re looking for a scripted, flawless, pre-recorded demo, this is not the one for you! McKee and Kennedy iterate and troubleshoot their way through the process — which makes this a great place to start if you’re new to Dockerizing Python apps. Install scripts, libraries, automation, security, best practices, and a pinch of Python zen — it’s all here. (Duration 1 hour, 10 mins.)

Join Us for DockerCon LIVE 2021  

Join us for DockerCon LIVE 2021 on Thursday, May 27. DockerCon LIVE is a free, one day virtual event that is a unique experience for developers and development teams who are building the next generation of modern applications. If you want to learn about how to go from code to cloud fast and how to solve your development challenges, DockerCon LIVE 2021 offers engaging live content to help you build, share and run your applications. Register today at https://dockr.ly/2PSJ7vn

Guest Post: Calling the Docker CLI from Python with Python-on-whales https://www.docker.com/blog/guest-post-calling-the-docker-cli-from-python-with-python-on-whales/ Thu, 11 Mar 2021 20:00:00 +0000 https://www.docker.com/blog/?p=27679
Image: Alice Lang, alicelang-creations@outlook.fr

At Docker, we are incredibly proud of our vibrant, diverse and creative community. From time to time, we feature cool contributions from the community on our blog to highlight some of the great work our community does. The following is a guest post by Docker community member Gabriel de Marmiesse. Are you working on something awesome with Docker? Send your contributions to William Quiviger (@william) on the Docker Community Slack and we might feature your work!   

The most common way to call and control Docker is by using the command line.

With the increased usage of Docker, users want to call Docker from programming languages other than shell. One popular way to use Docker from Python has been to use docker-py. This library has had so much success that even docker-compose is written in Python, and leverages docker-py.

The goal of docker-py though is not to replicate the Docker client (written in Golang), but to talk to the Docker Engine HTTP API. The Docker client is extremely complex and is hard to duplicate in another language. Because of this, a lot of features available in the Docker client could not be made available in docker-py. Users would sometimes get frustrated because docker-py did not behave exactly like the CLI.

Today, we’re presenting a new project built by Gabriel de Marmiesse from the Docker community: Python-on-whales. The goal of this project is to have a 1-to-1 mapping between the Docker CLI and the Python library. We do this by communicating with the Docker CLI instead of calling directly the Docker Engine HTTP API.


If you need to call the Docker command line, use Python-on-whales. And if you need to call the Docker engine directly, use docker-py.
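To make the distinction concrete, here is what running a container looks like with docker-py, which talks to the Engine API directly (a hedged sketch; it assumes docker-py is installed and a local Docker daemon is running):

import docker

# docker-py talks to the Docker Engine HTTP API rather than wrapping the CLI
client = docker.from_env()
output = client.containers.run("alpine", ["echo", "hello from docker-py"], remove=True)
print(output.decode())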

In this post, we’ll take a look at some of the features that are not available in docker-py but are available in Python-on-whales:

  • Building with Docker buildx
  • Deploying to Swarm with docker stack
  • Deploying to the local Engine with Compose

Start by downloading Python-on-whales with 

pip install python-on-whales

and you’re ready to rock!

Docker Buildx

Here we build a Docker image. Python-on-whales uses buildx by default and gives you the output in real time.

>>> from python_on_whales import docker
>>> my_image = docker.build(".", tags="some_name")
[+] Building 1.6s (17/17) FINISHED
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 32B                                        0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.6              1.4s
 => [python_dependencies 1/5] FROM docker.io/library/python:3.6@sha256:293 0.0s
 => [internal] load build context                                          0.1s
 => => transferring context: 72.86kB                                       0.0s
 => CACHED [python_dependencies 2/5] RUN pip install typeguard pydantic re 0.0s
 => CACHED [python_dependencies 3/5] COPY tests/test-requirements.txt /tmp 0.0s
 => CACHED [python_dependencies 4/5] COPY requirements.txt /tmp/           0.0s
 => CACHED [python_dependencies 5/5] RUN pip install -r /tmp/test-requirem 0.0s
 => CACHED [tests_ubuntu_install_without_buildx 1/7] RUN apt-get update && 0.0s
 => CACHED [tests_ubuntu_install_without_buildx 2/7] RUN curl -fsSL https: 0.0s
 => CACHED [tests_ubuntu_install_without_buildx 3/7] RUN add-apt-repositor 0.0s
 => CACHED [tests_ubuntu_install_without_buildx 4/7] RUN  apt-get update & 0.0s
 => CACHED [tests_ubuntu_install_without_buildx 5/7] WORKDIR /python-on-wh 0.0s
 => CACHED [tests_ubuntu_install_without_buildx 6/7] COPY . .              0.0s
 => CACHED [tests_ubuntu_install_without_buildx 7/7] RUN pip install -e .  0.0s
 => exporting to image                                                     0.1s
 => => exporting layers                                                    0.0s
 => => writing image sha256:e1c2382d515b097ebdac4ed189012ca3b34ab6be65ba0c 0.0s
 => => naming to docker.io/library/some_image_name

Docker Stacks

Here we deploy a simple Swarmpit stack on a local Swarm. You get a Stack object that has several methods: remove(), services(), ps().

>>> from python_on_whales import docker
>>> docker.swarm.init()
>>> swarmpit_stack = docker.stack.deploy("swarmpit", compose_files=["./docker-compose.yml"])
Creating network swarmpit_net
Creating service swarmpit_influxdb
Creating service swarmpit_agent
Creating service swarmpit_app
Creating service swarmpit_db
>>> swarmpit_stack.services()
[<python_on_whales.components.service.Service object at 0x7f9be5058d60>,
<python_on_whales.components.service.Service object at 0x7f9be506d0d0>,
<python_on_whales.components.service.Service object at 0x7f9be506d400>,
<python_on_whales.components.service.Service object at 0x7f9be506d730>]
>>> swarmpit_stack.remove()

Docker Compose

Here we show how we can run a Docker Compose application with Python-on-whales. Note that, behind the scenes, it uses the new version of Compose written in Golang. This version of Compose is still experimental. Take appropriate precautions.

$ git clone https://github.com/dockersamples/example-voting-app.git
$ cd example-voting-app
$ python
>>> from python_on_whales import docker
>>> docker.compose.up(detach=True)
Network "example-voting-app_back-tier"  Creating
Network "example-voting-app_back-tier"  Created
Network "example-voting-app_front-tier"  Creating
Network "example-voting-app_front-tier"  Created
example-voting-app_redis_1  Creating
example-voting-app_db_1  Creating
example-voting-app_db_1  Created
example-voting-app_result_1  Creating
example-voting-app_redis_1  Created
example-voting-app_worker_1  Creating
example-voting-app_vote_1  Creating
example-voting-app_worker_1  Created
example-voting-app_result_1  Created
example-voting-app_vote_1  Created
>>> for container in docker.compose.ps():
...     print(container.name, container.state.status)
example-voting-app_vote_1 running
example-voting-app_worker_1 running
example-voting-app_result_1 running
example-voting-app_redis_1 running
example-voting-app_db_1 running
>>> docker.compose.down()
>>> print(docker.compose.ps())
[]

Bonus section: Docker objects attributes as Python attributes

All information that you can access with docker inspect is available as Python attributes:

>>> from python_on_whales import docker
>>> my_container = docker.run("ubuntu", ["sleep", "infinity"], detach=True)
>>> my_container.state.started_at
datetime.datetime(2021, 2, 18, 13, 55, 44, 358235, tzinfo=datetime.timezone.utc)
>>> my_container.state.running
True
>>> my_container.kill()
>>> my_container.remove()

>>> my_image = docker.image.inspect("ubuntu")
>>> print(my_image.config.cmd)
['/bin/bash']

What’s next for Python-on-whales ?

We’re currently improving the integration of Python-on-whales with the new Compose in the Docker CLI (currently beta).

You can consider that Python-on-whales is in beta. Some small API changes are still possible. 

We encourage the community to try it out and give feedback in the issues!

To learn more about Python-on-whales:

How to Deploy GPU-Accelerated Applications on Amazon ECS with Docker Compose https://www.docker.com/blog/deploy-gpu-accelerated-applications-on-amazon-ecs-with-docker-compose/ Tue, 16 Feb 2021 17:00:00 +0000 https://www.docker.com/blog/?p=27442

Many applications can take advantage of GPU acceleration, in particular resource-intensive Machine Learning (ML) applications. The development time of such applications may vary based on the hardware of the machine we use for development. Containerization will facilitate development due to reproducibility and will make the setup easily transferable to other machines. Most importantly, a containerized application is easily deployable to platforms such as Amazon ECS, where it can take advantage of different hardware configurations.

In this tutorial, we discuss how to develop GPU-accelerated applications in containers locally and how to use Docker Compose to easily deploy them to the cloud (the Amazon ECS platform). We make the transition from the local environment to a cloud effortless, the GPU-accelerated application being packaged with all its dependencies in a Docker image, and deployed in the same way regardless of the target environment.

Requirements

In order to follow this tutorial, we need the following tools installed locally:

For deploying to a cloud platform, we rely on the new Docker Compose implementation embedded into the Docker CLI binary. Therefore, when targeting a cloud platform we are going to run docker compose commands instead of docker-compose. For local commands, both implementations of Docker Compose should work. If you find a missing feature that you use, report it on the issue tracker.

Sample application

Keep in mind that what we want to showcase is how to structure and manage a GPU accelerated application with Docker Compose, and how we can deploy it to the cloud. We do not focus on GPU programming or the AI/ML algorithms, but rather on how to structure and containerize such an application to facilitate portability, sharing and deployment.

For this tutorial, we rely on sample code provided in the Tensorflow documentation, to simulate a GPU-accelerated translation service that we can orchestrate with Docker Compose. The original code can be found documented at  https://www.tensorflow.org/tutorials/text/nmt_with_attention. For this exercise, we have reorganized the code such that we can easily manage it with Docker Compose.

This sample uses the Tensorflow platform which can automatically use GPU devices if available on the host. Next, we will discuss how to organize this sample in services to containerize them easily and what the challenges are when we locally run such a resource-intensive application.

Note: The sample code to use throughout this tutorial can be found here. It needs to be downloaded locally to exercise the commands we are going to discuss.

1. Local environment

Let’s assume we want to build and deploy a service that can translate simple sentences to a language of our choice. For such a service, we need to train an ML model to translate from one language to another and then use this model to translate new inputs. 

Application setup

We choose to separate the phases of the ML process in two different Compose services:

  • A training service that trains a model to translate between two languages (includes the data gathering, preprocessing and all the necessary steps before the actual training process).
  • A translation service that loads a model and uses it to `translate` a sentence.

This structure is defined in the docker-compose.dev.yaml from the downloaded sample application which has the following content:

docker-compose.dev.yaml

services:

 training:
   build: backend
   command: python model.py
   volumes:
     - models:/checkpoints

 translator:
   build: backend
   volumes:
     - models:/checkpoints
   ports:
     - 5000:5000

volumes:
 models:

We want the training service to train a model to translate from English to French and to save this model to a named volume models that is shared between the two services. The translator service has a published port to allow us to query it easily.

Deploy locally with Docker Compose

The reason for starting with the simplified compose file is that it can be deployed locally whether a GPU is present or not. We will see later how to add the GPU resource reservation to it.

Before deploying, rename the docker-compose.dev.yaml to docker-compose.yaml to avoid setting the file path with the flag -f for every compose command.

To deploy the Compose file, all we need to do is open a terminal, go to its base directory and run:

$ docker compose up
The new 'docker compose' command is currently experimental.
To provide feedback or request new features please open
issues at https://github.com/docker/compose-cli
[+] Running 4/0
 ⠿ Network "gpu_default"  Created                               0.0s
 ⠿ Volume "gpu_models"    Created                               0.0s
 ⠿ gpu_translator_1       Created                               0.0s
 ⠿ gpu_training_1         Created                               0.0s
Attaching to gpu_training_1, gpu_translator_1
...
translator_1  |  * Running on http://0.0.0.0:5000/ (Press CTRL+C
to quit)
...
HTTP/1.1" 200 -
training_1    | Epoch 1 Batch 0 Loss 3.3540
training_1    | Epoch 1 Batch 100 Loss 1.6044
training_1    | Epoch 1 Batch 200 Loss 1.3441
training_1    | Epoch 1 Batch 300 Loss 1.1679
training_1    | Epoch 1 Loss 1.4679
training_1    | Time taken for 1 epoch 218.06381964683533 sec
training_1    | 
training_1    | Epoch 2 Batch 0 Loss 0.9957
training_1    | Epoch 2 Batch 100 Loss 1.0288
training_1    | Epoch 2 Batch 200 Loss 0.8737
training_1    | Epoch 2 Batch 300 Loss 0.8971
training_1    | Epoch 2 Loss 0.9668
training_1    | Time taken for 1 epoch 211.0763041973114 sec
...
training_1    | Checkpoints saved in /checkpoints/eng-fra
training_1    | Requested translator service to reload its model,
response status: 200
translator_1  | 172.22.0.2 - - [18/Dec/2020 10:23:46] 
"GET /reload?lang=eng-fra 

Docker Compose deploys a container for each service and attaches us to their logs which allows us to follow the progress of the training service.

Every 10 cycles (epochs), the training service requests the translator to reload its model from the last checkpoint. If the translator is queried before the first training phase (10 cycles) is completed, we should get the following message. 

$ curl -d "text=hello" localhost:5000/
No trained model found / training may be in progress...

From the logs, we can see that each training cycle is resource-intensive and may take very long (depending on parameter setup in the ML algorithm).

The training service runs continuously and checkpoints the model periodically to a named volume shared between the two services. 
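Conceptually, each checkpoint step writes the latest weights to the shared volume and then pings the translator’s reload endpoint; a simplified sketch of that step is shown below (this is not the sample’s actual code, and the checkpoint manager, service hostname, and endpoint are assumptions based on the Compose file and the logs above):

import requests

def checkpoint_and_notify(checkpoint_manager, lang_pair="eng-fra"):
    # Save the latest weights to the /checkpoints volume shared between the two services
    checkpoint_manager.save()
    # Ask the translator service (reachable by its Compose service name) to reload the model
    response = requests.get("http://translator:5000/reload", params={"lang": lang_pair})
    print("Requested translator service to reload its model, response status:", response.status_code)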

$ docker ps -a
CONTAINER ID   IMAGE            COMMAND                  CREATED          STATUS                     PORTS                    NAMES
f11fc947a90a   gpu_training     "python model.py"        14 minutes ago   Up 54 minutes                   gpu_training_1                           
baf147fbdf18   gpu_translator   "/bin/bash -c 'pytho..." 14 minutes ago   Up 54 minutes              0.0.0.0:5000->5000/tcp   gpu_translator_1

We can now query the translator service which uses the trained model:

$ curl -d "text=hello" localhost:5000/
salut !
$ curl -d "text=I want a vacation" localhost:5000/
je veux une autre . 
$ curl -d "text=I am a student" localhost:5000/
je suis etudiant .

Keep in mind that, for this exercise, we are not concerned about the accuracy of the translation but how to set up the entire process following a service approach that will make it easy to deploy with Docker Compose.

During development, we may have to re-run the training process and evaluate it each time we tweak the algorithm. This is a very time consuming task if we do not use development machines built for high performance.

An alternative is to use on-demand cloud resources. For example, we could use cloud instances hosting GPU devices to run the resource-intensive components of our application. Running our sample application on a machine with access to a GPU will automatically switch to train the model on the GPU. This will speed up the process and significantly reduce the development time.
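A quick way to confirm whether TensorFlow actually sees a GPU from inside the container is shown below (output depends entirely on the host’s hardware and drivers; on a CPU-only machine the list is simply empty):

import tensorflow as tf

# Lists the GPU devices TensorFlow can see; the list is empty on CPU-only hosts
print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))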

The first step to deploy this application to some faster cloud instances is to pack it as a Docker image and push it to Docker Hub, from where we can access it from cloud instances.

Build and Push images to Docker Hub

During the deployment with compose up, the application is packed as a Docker image which is then used to create the containers. We need to tag the built images and push them to Docker Hub.

 A simple way to do this is by setting the image property for services in the Compose file. Previously, we had only set the build property for our services, however we had no image defined. Docker Compose requires at least one of these properties to be defined in order to deploy the application.

We set the image property following the pattern <account>/<name>:<tag> where the tag is optional (default to ‘latest’). We take for example a Docker Hub account ID myhubuser and the application name gpudemo. Edit the compose file and set the image property for the two services as below:

docker-compose.yml

services:

 training:
   image: myhubuser/gpudemo
   build: backend
   command: python model.py
   volumes:
     - models:/checkpoints

 translator:
   image: myhubuser/gpudemo
   build: backend
   volumes:
     - models:/checkpoints
   ports:
     - 5000:5000

volumes:
 models:

To build the images run:

$ docker compose build
The new 'docker compose' command is currently experimental. To
provide feedback or request new features please open issues
 at https://github.com/docker/compose-cli
[+] Building 1.0s (10/10) FINISHED 
 => [internal] load build definition from Dockerfile
0.0s 
=> => transferring dockerfile: 206B
...
 => exporting to image
0.8s 
 => => exporting layers    
0.8s  
 => => writing image sha256:b53b564ee0f1986f6a9108b2df0d810f28bfb209
4743d8564f2667066acf3d1f
0.0s
 => => naming to docker.io/myhubuser/gpudemo

$ docker images | grep gpudemo
myhubuser/gpudemo  latest   b53b564ee0f1   2 minutes ago 
  5.83GB   

Notice the image has been named according to what we set in the Compose file.

Before pushing this image to Docker Hub, we need to make sure we are logged in. For this we run:

$ docker login
...
Login Succeeded

Push the image we built:

$ docker compose push
Pushing training (myhubuser/gpudemo:latest)...
The push refers to repository [docker.io/myhubuser/gpudemo]
c765bf51c513: Pushed
9ccf81c8f6e0: Layer already exists
...
latest: digest: sha256:c40a3ca7388d5f322a23408e06bddf14b7242f9baf7fbe7201944780a028df76 size: 4306

The image pushed is public unless we set it to private in Docker Hub’s repository settings. The Docker documentation covers this in more detail.

With the image stored in a public registry, we will now look at how to use it to deploy our application on Amazon ECS and how to use GPUs to accelerate it.

2. Deploy to Amazon ECS for GPU acceleration

To deploy the application to Amazon ECS, we need to have credentials for accessing an AWS account and to have Docker CLI set to target the platform.

Let’s assume we have a valid set of AWS credentials that we can use to connect to AWS services. We now need to create an ECS Docker context to redirect all Docker CLI commands to Amazon ECS.

Create an ECS context

To create an ECS context run the following command:

$ docker context create ecs cloud
? Create a Docker context using:  [Use arrows to move, type
to filter]
> AWS environment variables 
  An existing AWS profile
  A new AWS profile

This prompts us to choose between three options, depending on how we prefer to provide the AWS credentials.

For this exercise, to skip the details of AWS credential setup, we choose the first option. This requires AWS_ACCESS_KEY and AWS_SECRET_KEY to be set in our environment when running Docker commands that target Amazon ECS.

We can now either set the context flag on each Docker command that targets the platform, or switch the new context to be the one in use so we avoid setting the flag on every command.
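For example, assuming the cloud context we create above, a single command can target ECS without changing the default context by passing the global --context flag:

$ docker --context cloud compose ps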

Set Docker CLI to target ECS

Set the context we created previously as the context in use by running:

$ docker context use cloud

$ docker context ls
NAME                TYPE                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT   ORCHESTRATOR
default             moby                Current DOCKER_HOST based configuration   unix:///var/run/docker.sock                         swarm
cloud *             ecs                 credentials read from environment

Starting from here, all the subsequent Docker commands are going to target Amazon ECS. To switch back to the default context targeting the local environment, we can run the following:

$ docker context use default

For the following commands, we keep the ECS context as the one in use. We can now run a command to check that we can successfully access ECS.

$ AWS_ACCESS_KEY="*****" AWS_SECRET_KEY="******" docker compose ls
NAME                                STATUS 

Before deploying the application to Amazon ECS, let’s have a look at how to update the Compose file to request GPU access for the training service. This blog post describes a way to define GPU reservations. In the next section, we cover the new format supported by the local docker compose command as well as by the legacy docker-compose.

Define GPU reservation in the Compose file

TensorFlow can make use of NVIDIA GPUs with CUDA compute capability to speed up computations. To reserve NVIDIA GPUs, we edit the docker-compose.yaml we defined previously and add the deploy property under the training service as follows:

...
  training:
    image: myhubuser/gpudemo
    command: python model.py eng-fra
    volumes:
      - models:/checkpoints
    deploy:
      resources:
        reservations:
          memory: 32Gb
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
...

For this example, we defined a reservation of 2 NVIDIA GPUs and 32GB of memory dedicated to the container. We can tweak these parameters according to the resources of the machine we target for deployment. If our local dev machine hosts an NVIDIA GPU, we can adjust the reservation accordingly and deploy the Compose file locally. Ensure you have installed the NVIDIA container runtime and set up the Docker Engine to use it before deploying the Compose file, for example as sketched below.
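For reference, one common way to register the NVIDIA runtime with the Docker Engine, assuming the nvidia-container-runtime package is already installed, is through /etc/docker/daemon.json (restart the Docker daemon after editing):

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}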

We focus in the next part on how to make use of GPU cloud instances to run our sample application.

Note: We assume the image we pushed to Docker Hub is public. If so, there is no need to authenticate in order to pull it (unless we exceed the pull rate limit). For images that need to be kept private, we need to define the x-aws-pull_credentials property with a reference to the credentials to use for authentication. Details on how to set it can be found in the documentation.
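For a private image, the property goes under the service that pulls it. Here is a hedged sketch; the secret ARN is a placeholder to be replaced with one created for your registry credentials:

services:
  translator:
    image: myhubuser/gpudemo
    x-aws-pull_credentials: "arn:aws:secretsmanager:eu-west-3:123456789012:secret:dockerhub-pull-credentials"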

Deploy to Amazon ECS

Export the AWS credentials to avoid setting them for every command.

$ export AWS_ACCESS_KEY="*****" 
$ export AWS_SECRET_KEY="******"

When deploying the Compose file, Docker Compose will also reserve an EC2 instance with GPU capabilities that satisfies the reservation parameters. In the example we provided, we ask for an instance with 32GB of memory and 2 NVIDIA GPUs, and Docker Compose matches this reservation to an instance type that satisfies it. Before setting the reservation property in the Compose file, we recommend checking the Amazon GPU instance types and setting your reservation accordingly. Ensure you are targeting an AWS region that offers such instances.

WARNING: Aside from ECS containers, we will have a `g4dn.12xlarge` EC2 instance reserved. Before deploying to the cloud, check the Amazon documentation for the resource cost this will incur.

To deploy the application, we run the same command as in the local environment.

$ docker compose up     
[+] Running 29/29
 ⠿ gpu                 CreateComplete          423.0s  
 ⠿ LoadBalancer        CreateComplete          152.0s
 ⠿ ModelsAccessPoint   CreateComplete            6.0s
 ⠿ DefaultNetwork      CreateComplete            5.0s
...
 ⠿ TranslatorService   CreateComplete          205.0s
 ⠿ TrainingService     CreateComplete          161.0s

Check the status of the services:

$ docker compose ps
NAME                                        SERVICE             STATE               PORTS
task/gpu/3311e295b9954859b4c4576511776593   training            Running             
task/gpu/78e1d482a70e47549237ada1c20cc04d   translator          Running             gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000->5000/tcp

Query the exposed translator endpoint. We notice the same behaviour as in the local deployment (the model reload has not been triggered yet by the training service).

$ curl -d "text=hello" gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000/
No trained model found / training may be in progress...

Check the logs to see the GPU devices TensorFlow detected. We can easily identify the 2 GPU devices we reserved and see that training is almost 10X faster than our CPU-based local training.

$ docker compose logs
...
training    | 2021-01-08 20:50:51.595796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
training    | pciBusID: 0000:00:1c.0 name: Tesla T4 computeCapability: 7.5
training    | coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
...
training    | 2021-01-08 20:50:51.596743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties: 
training    | pciBusID: 0000:00:1d.0 name: Tesla T4 computeCapability: 7.5
training    | coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
...

training      | Epoch 1 Batch 300 Loss 1.2269
training      | Epoch 1 Loss 1.4794
training      | Time taken for 1 epoch 42.98397183418274 sec
...
training      | Epoch 2 Loss 0.9750
training      | Time taken for 1 epoch 35.13995909690857 sec
...
training      | Epoch 9 Batch 0 Loss 0.1375
...
training      | Epoch 9 Loss 0.1558
training      | Time taken for 1 epoch 32.444278955459595 sec
...
training      | Epoch 10 Batch 300 Loss 0.1663
training      | Epoch 10 Loss 0.1383
training      | Time taken for 1 epoch 35.29659080505371 sec
training      | Checkpoints saved in /checkpoints/eng-fra
training      | Requested translator service to reload its model, response status: 200.

The training service runs continuously and triggers the model reload on the translation service every 10 cycles (epochs). Once the translation service has been notified at least once, we can stop and remove the training service and release the GPU instances at any time we choose. 

We can easily do this by removing the service from the Compose file:

services:
  translator:
    image: myhubuser/gpudemo
    build: backend
    volumes:
      - models:/checkpoints
    ports:
      - 5000:5000

volumes:
  models:

and then run docker compose up again to update the running application. This will apply the changes and remove the training service.

$ docker compose up      
[+] Running 0/0
 ⠋ gpu                  UpdateInProgress User Initiated    
 ⠋ LoadBalancer         CreateComplete      
 ⠋ ModelsAccessPoint    CreateComplete     
...
 ⠋ Cluster              CreateComplete     
 ⠋ TranslatorService    CreateComplete   

We can list the running services to confirm that the training service has been removed and only the translator remains:

$ docker compose ps
NAME                                        SERVICE             STATE               PORTS
task/gpu/78e1d482a70e47549237ada1c20cc04d   translator          Running             gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000->5000/tcp

Query the translator:

$ curl -d "text=hello" gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000/
salut ! 

To remove the application from Amazon ECS run:

$ docker compose down

Summary

We discussed how to set up a resource-intensive ML application so that it is easily deployable in different environments with Docker Compose. We showed how to define the use of GPUs in a Compose file and how to deploy the application on Amazon ECS.


How Developers Can Get Started with Python and Docker https://www.docker.com/blog/how-developers-can-get-started-with-python-and-docker/ Fri, 12 Feb 2021 18:43:06 +0000 https://www.docker.com/blog/?p=27528


Python started in 1991 with humble beginnings, focusing on helping “automate the boring stuff.” But over the past few years, we’ve seen Python grow in popularity and become extremely useful not only for scripting but also for building modern web applications, machine learning, and data science.

The TIOBE Index for February has Python ranked at number 3 on the list. Python has also been among the top 8 ranked programming languages for the past 7 years. With such a popular and powerful programming language comes a vibrant and large community.

To that end, we are excited to announce that we are releasing a series of programming language-specific guides to help developers go from discovering the basics of Docker to delivering their images into a production environment and more.

The first in our series is a focus on the Python development ecosystem. We have created a series of tutorials, how-tos, and guides focused on the Python community with much more coming in the future. 

We are extremely excited to help Python developers become experts at developing and delivering the next generation of applications using the Docker platform. Below you will find a list of resources and our Python language-specific guide to help move you from understanding to application, all while using the language you love: Python.

As I mentioned at the beginning of this post, we are truly developer-obsessed, and we would love to hear your feedback. We hope that this upcoming Python series is helpful, so stay tuned for more content! Please feel free to reach out to me personally by email, on Twitter, or in our community Slack channel.
