wrk and wrk2 HTTP Benchmarking Complete Guide for 2026
Master wrk and wrk2 for HTTP microbenchmarks. Cover command flags, Lua scripts, coordinated omission, HDR histograms, and how wrk fits with k6 and JMeter.
wrk and wrk2 HTTP Benchmarking Complete Guide for 2026
wrk is a C-based HTTP benchmarking tool that focuses on one thing: maximum throughput against a single endpoint. It is not a load testing tool in the user-journey sense. It does not parameterize from CSVs, it does not chain requests, it does not produce HTML reports. What it does do is generate hundreds of thousands of requests per second from a single laptop, with tighter timing accuracy than almost any other tool. For engineers working on hot paths, microservice performance, or proving that the load balancer can sustain a given RPS, wrk is the right tool.
wrk2, a fork by Gil Tene of Azul Systems, adds two essential features for production use: a steady arrival rate (open model) and HDR histogram output. Together wrk and wrk2 are the standard benchmarking pair for SREs and performance engineers who care about tail latencies and coordinated omission. This guide covers both in 2026. We walk through installation, command flags, Lua scripts, the coordinated omission problem and why wrk2 solves it, HDR histograms, and how to integrate wrk results into larger performance reports. For broader tooling see k6 vs JMeter and the skills directory.
Why wrk and wrk2
Three reasons SREs reach for wrk:
- Maximum throughput in a single binary. wrk can saturate a 10 Gbps NIC from a laptop. No other load tool gets this close to wire speed.
- Tight timing. wrk's event-loop architecture has minimal scheduling jitter. Latencies it measures are close to physical reality.
- Easy to script. Lua extensions hook into request generation and response handling for custom scenarios.
The trade-off is that wrk does only HTTP and does only one endpoint per run. For complex user journeys use k6 or Locust. For microbenchmarks, capacity tests, and "how many RPS can this single endpoint sustain" questions, wrk is unmatched.
| Tool | Use Case | Peak Single-Machine RPS |
|---|---|---|
| wrk | Maximum throughput | 500k+ |
| wrk2 | Constant arrival rate plus HDR | 200k+ |
| k6 | User journeys plus thresholds | 30k-40k |
| Locust | Python user journeys | 5k-10k per worker |
| JMeter | Multi-protocol enterprise | 5k-8k |
Installing wrk and wrk2
Both tools require building from source on Linux and macOS. They are not in default Ubuntu apt repos in 2026, but Homebrew packages exist for macOS.
# macOS
brew install wrk
# Ubuntu/Debian build from source
sudo apt install build-essential libssl-dev git
git clone https://github.com/wg/wrk.git
cd wrk
make
sudo mv wrk /usr/local/bin/
# wrk2 (similar build)
git clone https://github.com/giltene/wrk2.git
cd wrk2
make
sudo mv wrk /usr/local/bin/wrk2
Verify both:
wrk --version
wrk2 --version
Your First Benchmark
The simplest wrk invocation hits an endpoint with N threads, M open connections, for T seconds.
wrk -t12 -c400 -d30s --latency https://api.example.com/health
Flags:
-t12: 12 threads (typically one per CPU core)-c400: 400 concurrent open connections-d30s: 30 seconds duration--latency: print latency distribution
Output:
Running 30s test @ https://api.example.com/health
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.32ms 28.78ms 1.78s 96.45%
Req/Sec 2.42k 312.89 3.78k 78.23%
Latency Distribution
50% 10.43ms
75% 16.21ms
90% 25.87ms
99% 124.34ms
870234 requests in 30.00s, 142.32MB read
Requests/sec: 29008.34
Transfer/sec: 4.74MB
This is the canonical wrk output. The latency distribution at the bottom is what most engineers care about for tail latency analysis.
Thread and Connection Tuning
The thread count should typically match physical CPU cores or be slightly less. Each thread runs an independent event loop. Too few threads means a single core is overloaded; too many means context switching overhead.
Connections are distributed across threads. With -t12 -c400, each thread maintains ~33 open connections. For high-throughput tests against a load balancer, you typically want enough connections to keep all backend pools busy. The rule of thumb: more connections than expected concurrent requests, but not so many that they overwhelm the client OS.
# Light load
wrk -t4 -c50 -d30s URL
# Standard
wrk -t12 -c400 -d60s URL
# Heavy
wrk -t24 -c2000 -d120s URL
Watch netstat or ss during the run to confirm connections are stable. If the kernel runs out of ephemeral ports you will see ECONNREFUSED errors. Tune net.ipv4.ip_local_port_range if needed.
Lua Scripting
wrk's killer feature is Lua scripting. Hook into request and response lifecycles to customize.
-- script.lua
-- Called once per worker before benchmark starts
init = function(args)
local r = {}
for i, v in ipairs(args) do
r[i] = v
end
print("Init with args:", table.concat(r, ","))
end
-- Called for each request
request = function()
local headers = { ["Authorization"] = "Bearer " .. os.getenv("TOKEN") }
local body = '{"sku":"ABC-' .. math.random(1, 1000) .. '"}'
return wrk.format("POST", "/cart", headers, body)
end
-- Called after each response
response = function(status, headers, body)
if status ~= 200 then
print("Got non-200:", status, body)
end
end
-- Called once when benchmark ends
done = function(summary, latency, requests)
print("Total requests:", summary.requests)
print("p99 latency:", latency:percentile(99))
end
Run with -s:
wrk -t12 -c400 -d30s -s script.lua --latency https://api.example.com/
The script generates POST requests with random SKUs and a custom Authorization header. The Lua API gives you complete control over what each request looks like.
| Hook | When | Use Case |
|---|---|---|
| setup(thread) | Per-thread init | Per-thread state |
| init(args) | Per-thread, before benchmark | Read args, seed RNG |
| request() | Per request | Vary payload |
| response(s, h, b) | Per response | Count errors, capture |
| done(summary, latency, requests) | End of run | Custom output |
Coordinated Omission
The reason wrk2 exists. Standard wrk operates in closed-loop mode: when a request finishes, it sends another. If the server slows down, wrk sends fewer requests, and the latencies wrk reports are not what real users would see. The latencies measured don't include the time the request would have spent waiting in a queue if there had been queuing. This is Gil Tene's "coordinated omission" problem.
wrk2 fixes it. You specify a target rate with -R, and wrk2 generates requests at that constant rate regardless of server response time. If the server slows, requests pile up, and the latencies wrk2 reports include the queueing time. This is what your real users experience.
# Standard wrk: pace by completion time (broken for tail latency)
wrk -t12 -c400 -d30s URL
# wrk2: constant 10,000 RPS regardless of response time (correct)
wrk2 -t12 -c400 -d30s -R10000 URL
If you publish tail latency numbers without using wrk2 (or another open-loop tool), your numbers are systematically too good. Production traffic does not slow down because the server slowed down; it keeps arriving.
HDR Histograms
wrk2 outputs HDR histograms (High Dynamic Range histograms) instead of just percentiles. HDR histograms are accurate across many orders of magnitude of latency without losing resolution at the tail.
wrk2 -t12 -c400 -d30s -R10000 --latency URL > result.txt
# Detailed latency distribution
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 12.34ms
75.000% 23.45ms
90.000% 45.67ms
99.000% 142.34ms
99.900% 543.21ms
99.990% 1.34s
99.999% 4.56s
These detailed percentiles are essential for SLO work. If your SLO says "99.9% of requests faster than 500 ms", you need 99.9th percentile measured accurately. wrk2's HDR histogram delivers that.
For analysis you can also export raw HDR histogram data and merge across runs with the HdrHistogram tools.
Real Examples
A few useful patterns we apply at qaskills.sh:
# 1. Baseline capacity probe: how much can it take?
wrk2 -t12 -c1000 -d120s -R100000 https://api.example.com/health
# 2. SLO validation: does it meet p99 < 500ms at expected load?
wrk2 -t12 -c400 -d300s -R5000 --latency https://api.example.com/search
# 3. Tail latency comparison after deploy
wrk2 -t12 -c400 -d60s -R5000 --latency https://canary.example.com/ > canary.txt
wrk2 -t12 -c400 -d60s -R5000 --latency https://prod.example.com/ > prod.txt
diff canary.txt prod.txt
These commands replace days of manual capacity planning with minutes of wall time per test.
Custom Scenarios with Lua
For non-trivial scenarios where vanilla wrk is too limited but a full k6 script is overkill, Lua bridges the gap.
-- multi-endpoint.lua
local counter = 0
local paths = {"/api/products", "/api/categories", "/api/search?q=laptop"}
request = function()
counter = counter + 1
local path = paths[counter % #paths + 1]
return wrk.format("GET", path)
end
response = function(status, headers, body)
if status >= 500 then
print("5xx error on", wrk.path)
end
end
The script rotates through three paths in round-robin. Most production wrk benchmarks have a Lua script of 20-50 lines.
Comparison with k6
When to choose wrk over k6 and vice versa:
| Scenario | Use wrk | Use k6 |
|---|---|---|
| Max throughput single endpoint | Yes | No |
| User journey with login plus cart | No | Yes |
| HDR histogram tail latency | Yes (wrk2) | k6 has limited HDR |
| Threshold-based CI pass/fail | Awkward | Native |
| Distributed cloud runs | No | Yes (k6 Cloud) |
| Lua scripting | Yes | No (k6 uses JS) |
| WebSocket | Limited | Yes |
For one-shot capacity tests, wrk or wrk2. For ongoing CI-integrated load tests with multiple scenarios and thresholds, k6.
Integrating into Reports
wrk output is plain text. To incorporate into broader perf reports, parse the output:
#!/bin/bash
# wrk-to-csv.sh
OUTPUT=$(wrk2 -t12 -c400 -d60s -R5000 --latency $1 2>&1)
P50=$(echo "$OUTPUT" | grep -oP '50\.000%\s+\K[\d.]+(?=ms)')
P99=$(echo "$OUTPUT" | grep -oP '99\.000%\s+\K[\d.]+(?=ms)')
P999=$(echo "$OUTPUT" | grep -oP '99\.900%\s+\K[\d.]+(?=ms)')
RPS=$(echo "$OUTPUT" | grep -oP 'Requests/sec:\s+\K[\d.]+')
echo "url,p50,p99,p999,rps"
echo "$1,$P50,$P99,$P999,$RPS"
Push the CSV into your performance database. Compare runs across releases to track regressions.
Common Mistakes
Five mistakes we see repeatedly:
- Using standard wrk instead of wrk2 for SLO work. Tail latencies are wrong due to coordinated omission. Always use wrk2 for SLO validation.
- Not enough connections.
-c50against a service expecting thousands of concurrent connections doesn't test what you think. Match-cto expected real load. - Running wrk on the same machine as the server. Client and server compete for CPU. Run from a separate machine, ideally in the same VPC.
- Forgetting
--latency. Without it you get only mean and stdev, not percentiles. - No warm-up. First N seconds of any benchmark include cold starts, JIT compilation, cache misses. Run for 60+ seconds or use a warm-up phase.
Conclusion
wrk and wrk2 are the right tools for single-endpoint throughput benchmarks and tail latency analysis. They are not a replacement for k6 or Locust for user-journey load testing, but they complement those tools by answering the "how fast can it go" and "what does the tail look like" questions with exceptional precision.
If you are doing performance work in 2026, install both and reach for them whenever a question is single-endpoint focused. Read k6 vs JMeter for tool comparisons and browse the skills directory for load testing AI agent skills.