Introduction to Docker - Part 2: Volumes

This guide builds on Docker fundamentals by demonstrating how to use Volumes and Bind Mounts to extract and persist data generated by ephemeral containers.

What are Volumes?

Containers are ephemeral, so we need a way to save our data so that it persists after a container stops. Volumes allow us to map a folder on our actual computer (the host) to a folder inside the container.

Preliminaries

Let's assume that we have a very basic project structure, with a single Python file called pipeline.py located inside the src/ folder, and a requirements.txt file and a Dockerfile in the root directory.

project/
├── src/
│   └── pipeline.py
├── requirements.txt
└── Dockerfile
pipeline.py is just a toy script that creates a DataFrame, prints it, and saves it to a .csv file:
import pandas as pd
import os
from pathlib import Path

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
print(df)

# Find the absolute path of the pipeline.py file and go up one folder
PROJ_ROOT = Path(__file__).resolve().parents[1]
# Create a data folder in the root of the project
save_path = PROJ_ROOT / 'data'
os.makedirs(save_path, exist_ok=True)

# Save the dataframe to the data folder
save_name = save_path / "data.csv"
df.to_csv(save_name)
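If we want to sanity-check the script before involving Docker at all, we can run it directly on the host (assuming pandas is installed in a local Python environment):

python src/pipeline.py

This prints the two-row DataFrame and creates data/data.csv inside a new data/ folder at the project root, which is exactly the behaviour we will later look for inside the container.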
requirements.txt contains our code's only dependency, pandas:
pandas==2.3.3
and the Dockerfile is:
# We assume a base image to start
FROM python:3.14-slim

WORKDIR /app

# 1. Install dependencies first (for caching)
COPY requirements.txt .
RUN pip install -r requirements.txt

# 2. Copy the actual application code
COPY src/pipeline.py ./src/

# 3. Specify the command to run on startup
CMD ["python", "src/pipeline.py"]

Data Location

Let's start by building the Docker image by running docker build -t my-data-storage . from a terminal opened in our project folder. Next, let's run a container with docker run my-data-storage. We see the DataFrame printed in the terminal thanks to the print(df) call in pipeline.py. Where's the .csv file with our data, though?

In Part 1, we saw that containers are ephemeral: everything written to a container's filesystem is lost once the container is removed. Since data.csv was created inside the container, it disappears along with the rest of the container's files once the container finishes its job and is cleaned up. Hence, we need to find a way to make the data persist.
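Before we do that, we can see the ephemerality for ourselves. If we give the container a name when running it (storage-test below is just an arbitrary name), it will sit in the Exited state after the script finishes, and removing it discards its filesystem, data.csv included:

docker run --name storage-test my-data-storage
docker ps -a             # storage-test is listed with an Exited status
docker rm storage-test   # the container and everything written inside it are gone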

Local Data Location

First though, let's find where data.csv is saved relative to the project. In pipeline.py we create a folder called data in the root of the project and save data.csv inside it. Hence, its relative path is ./data/.

Container Data Location

Now let's see where data.csv is saved inside the container. We set the working directory to /app with WORKDIR /app and copy the script to /app/src/pipeline.py, so PROJ_ROOT resolves to /app and python saves data.csv to /app/data/.
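We can verify this path without using a volume yet, by overriding the image's default command so that the script runs and the folder is listed within the same container (a one-off check; --rm removes the container afterwards):

docker run --rm my-data-storage sh -c "python src/pipeline.py && ls -l /app/data"

After printing the DataFrame, this lists data.csv under /app/data inside the container.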

The -v Flag

As we already mentioned, if we run docker run my-data-storage, the script will successfully create that file inside the container's filesystem. But the moment that container stops and is removed, your CSV vanishes with it.

To save that file to your actual computer (the host), we need to punch a hole through the container wall using a Volume.

We do this using the -v flag in the docker run command. The syntax looks like this:

docker run -v Host_Path:Container_Path Image_Name 
In our case, we saw earlier that python will save data.csv to /app/data/ inside the container, hence we replace Container_Path with /app/data/.

Regarding the Host_Path, there are two ways to go:

  • Bind Mount: We tell Docker to use a specific folder on our host machine. We could, for example, use the relative path ./output or an absolute path such as "C:\Users\User_Name\Documents\Data\".
  • Named Volume: We let Docker keep the data in a safe place without telling it exactly where to store it; we only give the volume a name, for example my_data. Generally we don't touch these files directly, rather we let Docker manage them.

Bind Mount is the best method for development, because we can see the files on our computer instantly. Named Volumes are better for databases or production data where we don't need to look at the files manually.
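To make the two options concrete, here is roughly what each form looks like with our image (the host path and the volume name my_data are just example values):

docker run -v /absolute/path/to/project/data:/app/data my-data-storage    # bind mount
docker run -v my_data:/app/data my-data-storage                           # named volume
docker volume inspect my_data    # shows where Docker keeps the named volume on the host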

Let's go with the Bind Mount method and replace Host_Path with ./data (note that, depending on your Docker version, -v may not accept a relative host path, in which case use the absolute path to the project's data folder instead). So, our run command will be

docker run -v ./data:/app/data/ my-data-storage 

Then, we will see a folder called data with a file called data.csv appear in the directory where we ran the command.
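We can open it like any other local file. Since to_csv writes the default pandas index as the first column, its contents should look like this:

cat ./data/data.csv
# ,A,B
# 0,1,3
# 1,2,4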