Running Python in Kestra
Let's explore three strategies for running Python in Kestra (runtime installation, custom Docker images, and custom Worker images), which help us balance speed, isolation, and maintenance effort.
In Kestra, Python is typically executed using the io.kestra.plugin.scripts.python.Script task. By default, Kestra spins up a Docker container (e.g., python:3.11-slim) for every task, runs our code, and then destroys the container. This ensures isolation and reproducibility.
Additionally, Kestra injects a small Python library (kestra) into the container automatically, allowing our script to talk back to the orchestration engine (sending outputs, metrics, etc.).
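For example, a script can publish outputs and metrics back to the engine through that library. Here is a minimal sketch using the Kestra.outputs and Kestra.counter helpers from the injected kestra package (the variable and metric names are just illustrative):
from kestra import Kestra

# Hypothetical value computed by our script
processed_rows = 42

# Expose the value to downstream tasks as an execution output
Kestra.outputs({"processed_rows": processed_rows})

# Record a metric that shows up in the execution's metrics view
Kestra.counter("rows_processed", processed_rows)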
Hello World
Let's start with a quick example:
id: python_hello_world
namespace: company.team

tasks:
  - id: my_python_task
    type: io.kestra.plugin.scripts.python.Script
    containerImage: python:3.11-slim
    script: |
      import sys
      print(f"Hello from Python {sys.version}")
We created a new flow with the id python_hello_world and the namespace company.team.
Then, we defined a task called my_python_task with type io.kestra.plugin.scripts.python.Script. Let's see some key properties of this task:
- script: lets us write the Python code to execute. It works as if we were writing code in our favorite IDE (not quite as conveniently, but more on that later).
- containerImage: the specific Python Docker image to run the code in. We can use any of the official Python Docker images, or even a custom-made Python image.
Managing Dependencies
How can we import Python packages in our code? There are three main ways to go about it.
beforeCommands
We can use the beforeCommands property to install packages before our script runs. For example:
tasks:
  - id: my_python_task
    type: io.kestra.plugin.scripts.python.Script
    containerImage: python:3.11-slim
    beforeCommands:
      - pip install pandas
    script: |
      import pandas as pd
      df = pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21]})
      print(df)
Running the code like this results in reinstalling our packages every time we execute the flow, which naturally inflates the execution time.
Pre-built Python image
Alternatively, we could provide Kestra with a Python image which already has our required packages.
We can do that by giving Kestra access to the Docker images that live on our local machine. Assuming we run Kestra through some docker-compose.yml, we mount the host's Docker socket by adding the following to that file:
services:
  kestra:
    image: kestra/kestra:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
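After changing docker-compose.yml, we restart the stack so the new volume mount takes effect:
docker compose up -d
Docker Compose notices the changed configuration and recreates the Kestra container with the socket mounted.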
Then, all we have to do is write a Dockerfile like this:
FROM python:3.12-slim
RUN pip install --no-cache-dir pandas
and build the image:
docker build -t python-with-pandas:latest .
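To double-check that the image is now available locally, we can list it:
docker image ls python-with-pandas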
Then our task will look like this:
tasks:
  - id: analyze_data
    type: io.kestra.plugin.scripts.python.Script
    containerImage: python-with-pandas:latest
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: IF_NOT_PRESENT
    script: |
      import pandas as pd
      df = pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21]})
      print(df)
We used the Docker image python-with-pandas:latest that we just built. Additionally, the pullPolicy property is set to IF_NOT_PRESENT, which makes Kestra look for the image locally first and only pull it from a registry if it isn't found there.
Kestra Image with Required Dependencies
Finally, we could build a Kestra image with the required dependencies already installed. To do that, we first create a Dockerfile:
# Start with the official Kestra image
FROM kestra/kestra:v1.2
# Switch to root to install system dependencies
USER root
# Install Python3, Pip, and Pandas
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip && \
    pip3 install --no-cache-dir pandas
and then run Kestra through the following docker-compose.yml:
volumes:
  kestra_postgres_data:
    driver: local
  kestra_data:
    driver: local

services:
  kestra_postgres:
    image: postgres:18
    volumes:
      - kestra_postgres_data:/var/lib/postgresql
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10

  kestra:
    build:
      context: .
    user: "root"
    command: server standalone
    volumes:
      - kestra_data:/app/storage
      # Maps the Docker socket from our host (PC) to the Kestra container, allowing Kestra to see the containers in our PC.
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://kestra_postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          server:
            basicAuth:
              username: "admin@kestra.io" # it must be a valid email address
              password: Admin1234
          repository:
            type: postgres
          storage:
            type: local
            local:
              basePath: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmpDir:
              path: /tmp/kestra-wd/tmp
          url: http://localhost:8080/
    ports:
      - "8080:8080"
      - "8081:8081"
    depends_on:
      kestra_postgres:
        condition: service_started
The Postgres database exists so that Kestra remembers our flow definitions. Without it, every time we restarted the container we'd have to recreate our flows.
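With the Dockerfile and docker-compose.yml in place, we can build the custom Kestra image and start the whole stack in one go:
docker compose up -d --build
The --build flag makes Compose rebuild the Kestra image whenever the Dockerfile changes, so a dependency update only requires editing the Dockerfile and rerunning this command.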
Then our task will look like this:
tasks:
  - id: analyze_data
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process # Runs on the server/worker directly
    script: |
      import pandas as pd
      df = pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21]})
      print(df)
Notice how the taskRunner is no longer of type Docker but of type Process, since we now want Python to run directly on the Kestra host as a local process.
Comparatively, the Process runner approach is faster since there is no container startup time. On the other hand, the Docker runner approach is easier to maintain: if we need to update the Python packages, we only have to update the task image instead of rebuilding the Kestra image itself.