
Benchmark

How to reproduce the LoadPilot benchmark results.

What is tested

Four scenarios:

  • Precision — all tools target 500 RPS for 30s. Measures how accurately each tool holds the target RPS and what latency overhead the load generator itself adds.
  • Max throughput — each tool runs at maximum capacity for 30s. Measures the throughput ceiling on a single machine.
  • PyO3 precision (LoadPilot only) — measures the cost of enabling Python callbacks (on_start, check_*) at 500 RPS.
  • PyO3 max throughput (LoadPilot only) — measures the ceiling of different PyO3 architectures and optimisations.

Setup

Requirements:

  • Docker with Compose v2 (docker compose, not docker-compose)
  • Python 3.x (for the report script)

Tools under test:

  • LoadPilot (static mode and PyO3 mode)
  • k6 v0.55+
  • Locust v2.x

Target server: Rust/axum echo server built and run in Docker. Endpoints:

  • POST /auth/login → {"access_token": "tok"} (used by on_start)
  • GET /api/user → {"id": 1, "name": "bench"} (main task endpoint)
  • GET /health → {"status": "ok"}
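A quick way to confirm the target server answers as described is a small probe script. This is a sketch, not part of the harness: the `localhost:8080` address is an assumption (the compose file defines the real address on the bridge network), and the login payload is illustrative.

```python
# Minimal smoke probe for the echo-server endpoints listed above.
# ASSUMPTION: host/port is illustrative; see the compose file for the real address.
import json
import urllib.request

BASE = "http://localhost:8080"

def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Construct the request; separated out so it can be inspected offline."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

def probe(method: str, path: str, payload=None) -> dict:
    """Send one request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(method, path, payload)) as resp:
        return json.loads(resp.read())

# Example (requires the target server to be running):
#   probe("GET", "/health")   should return {"status": "ok"}
```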

All containers share the same Docker bridge network. Tools run sequentially with a 10s cooldown between runs.

Running

cd bench
./run.sh

This will:

  1. Build the target server and LoadPilot Docker images
  2. Start the target server
  3. Run each tool sequentially (8 runs total)
  4. Generate results/report.html

First run takes longer — Rust compilation is cached after that.

To run a single profile:

cd bench

# Precision
docker compose --profile loadpilot-precision run --rm loadpilot-precision
docker compose --profile k6-precision       run --rm k6-precision
docker compose --profile locust-precision   run --rm locust-precision

# Max throughput
docker compose --profile loadpilot-max      run --rm loadpilot-max
docker compose --profile k6-max            run --rm k6-max
docker compose --profile locust-max        run --rm locust-max

# PyO3 — precision
docker compose --profile loadpilot-pyo3-onstart run --rm loadpilot-pyo3-onstart
docker compose --profile loadpilot-pyo3-full    run --rm loadpilot-pyo3-full

# PyO3 — max throughput
docker compose --profile loadpilot-pyo3-max-onstart run --rm loadpilot-pyo3-max-onstart
docker compose --profile loadpilot-pyo3-max-full    run --rm loadpilot-pyo3-max-full

# PyO3 — batch API
docker compose --profile loadpilot-pyo3-batch5 run --rm loadpilot-pyo3-batch5

# Regenerate report from existing results
python3 report.py

Results

Precision — 500 RPS target, 30s constant

| Tool | RPS actual | p50 | p99 | Errors | CPU avg | CPU peak | Mem peak |
|------|------------|-----|-----|--------|---------|----------|----------|
| LoadPilot (PyO3) | 478 | 4ms | 15ms | 0% | 14% | 108% | 68 MB |
| k6 | 491 | 8ms | 118ms | 0% | 129% | 140% | 59 MB |
| Locust | 498 | 150ms | 1500ms | 0% | 88% | 119% | 85 MB |

CPU % is relative to one core (200% = two cores fully busy). LoadPilot runs in PyO3 mode with on_start (login) and check_* (assertion per task) — a realistic scenario with Python callbacks.

LoadPilot and k6 hold the target accurately. Locust reaches the RPS but its Python/GIL scheduler adds significant latency (p99 ≥ 1500ms at only 500 RPS). LoadPilot uses 9× less CPU than k6 at the same load.

Max throughput — 30s constant, no artificial cap

| Tool | RPS | p50 | p99 | Errors | CPU avg | CPU peak | Mem peak |
|------|-----|-----|-----|--------|---------|----------|----------|
| LoadPilot (PyO3) | 2205 | 11ms | 38ms | 0% | 165% | 179% | 105 MB |
| k6 | 1799 | 14ms | 175ms | 0% | 212% | 229% | 107 MB |
| Locust | 677 | 100ms | 170ms | 0% | 117% | 122% | 50 MB |

LoadPilot runs in PyO3 mode with on_start + check_*. It delivers 1.2× the throughput of k6 and 3.3× that of Locust. Normalised by CPU usage: LoadPilot ≈ 13.4 RPS per CPU % vs k6 ≈ 8.5 RPS per CPU %, roughly 1.6× better CPU efficiency.
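The efficiency figures follow directly from the table above, dividing throughput by average CPU usage:

```python
# Verify the CPU-efficiency claim from the max-throughput table.
loadpilot_rps, loadpilot_cpu = 2205, 165  # CPU in % of one core
k6_rps, k6_cpu = 1799, 212

loadpilot_eff = loadpilot_rps / loadpilot_cpu  # RPS per CPU %
k6_eff = k6_rps / k6_cpu

print(round(loadpilot_eff, 1), round(k6_eff, 1), round(loadpilot_eff / k6_eff, 1))
# → 13.4 8.5 1.6
```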

PyO3 precision — 500 RPS, on_start + optional check_*

| Architecture | RPS actual | p50 | p99 | CPU avg | Mem peak | Notes |
|--------------|------------|-----|-----|---------|----------|-------|
| Static (no callbacks) | 499 | 3ms | 11ms | 24% | 43 MB | Rust only |
| + on_start | 486 | 2ms | 5ms | 77% | 74 MB | login per VUser |
| + on_start + check_* | 478 | 4ms | 15ms | 14% | 68 MB | assertion per task |

Adding Python callbacks at 500 RPS has near-zero latency cost.
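For orientation, callbacks of roughly this shape are what the rows above measure. This is an illustration only: the class and method names are assumptions, not LoadPilot's real API; only the check_*(self, status_code, body) signature and the on_start login flow come from this document.

```python
# Hypothetical LoadPilot-style scenario. Names are illustrative, not the real API;
# only the check_*(self, status_code, body) signature appears in this document.
import json

class UserScenario:
    def on_start(self, client):
        # Runs once per VUser: log in and keep the token (see POST /auth/login).
        resp = client.post("/auth/login", json={"user": "bench"})
        self.token = json.loads(resp.body)["access_token"]

    def check_user(self, status_code, body):
        # Runs after each GET /api/user: one assertion per task.
        return status_code == 200 and json.loads(body)["name"] == "bench"
```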

PyO3 max throughput — optimisation experiments

| Approach | HTTP RPS | p50 | p99 | CPU avg | Mem peak | Notes |
|----------|----------|-----|-----|---------|----------|-------|
| asyncio.run_until_complete | 1591 | — | — | — | — | historical baseline |
| coro.send(None) fast path | 2289 | 11ms | 37ms | 190% | 115 MB | current async task impl |
| sync def task | 2487 | 22ms | 67ms | 167% | 116 MB | no asyncio overhead |
| async task + check_* | 2205 | 11ms | 38ms | 165% | 105 MB | check_*(self, status_code, body) |
| client.batch(5) | 3385 | 14ms | 34ms | 147% | 79 MB | pure Rust JoinSet |
| Static ceiling (no Python) | 3494 | 18ms | 537ms | 115% | 76 MB | reference |

client.batch(5) reaches 97% of the static-mode ceiling (3385 vs 3494 RPS).

Methodology notes

Why Docker? Reproducible on any machine with Docker. The bridge network adds a small fixed overhead equally for all tools, so relative comparisons remain valid.

Why sequential runs? Running tools simultaneously would saturate the target server and mix results. Sequential runs with a 10s cooldown give each tool a clean slate.

Resource measurement. CPU and memory are sampled via docker stats --no-stream once per second. CPU % is relative to one core (100% = one core fully busy). Memory is peak RSS reported by the Docker cgroup. The target server is excluded.
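The sampling loop can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the harness's actual code: it assumes Docker is installed, the container name is known, and uses the standard `docker stats --no-stream --format` CLI with the `{{.CPUPerc}}` and `{{.MemUsage}}` placeholders.

```python
# Sketch: one `docker stats` snapshot per second for a single container.
# ASSUMPTION: Docker is available; the real benchmark harness may differ.
import subprocess
import time

def parse_stats_line(line: str) -> tuple[float, str]:
    """Parse one tab-separated 'CPUPerc<TAB>MemUsage' line into (cpu%, mem)."""
    cpu, mem = line.strip().split("\t")
    return float(cpu.rstrip("%")), mem

def sample(container: str) -> tuple[float, str]:
    """Take one CPU/memory snapshot of the container."""
    out = subprocess.check_output(
        ["docker", "stats", "--no-stream",
         "--format", "{{.CPUPerc}}\t{{.MemUsage}}", container],
        text=True,
    )
    return parse_stats_line(out)

def monitor(container: str, seconds: int) -> tuple[float, float]:
    """Return (avg_cpu, peak_cpu) over `seconds` one-second samples."""
    cpus = []
    for _ in range(seconds):
        cpus.append(sample(container)[0])
        time.sleep(1)
    return sum(cpus) / len(cpus), max(cpus)
```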