Pairwise & Combinatorial Testing Guide with PICT (2026)
Learn pairwise (all-pairs) and combinatorial testing with worked examples, the PICT tool, constraints, and how to cut test cases without losing coverage.
Pairwise & Combinatorial Testing: A Practical Guide with PICT
Pairwise testing (also called all-pairs testing) is a combinatorial test-design technique that generates a small set of test cases covering every possible pair of input values at least once, instead of testing the full Cartesian product of all parameters. It works because empirical studies show most defects are triggered by a single parameter or the interaction of just two parameters, so covering all 2-way combinations finds the large majority of interaction bugs while shrinking the test suite by orders of magnitude. Tools like Microsoft PICT generate these sets automatically.
This guide explains the math, walks through worked examples by hand, shows how to use PICT (including constraints and higher-order coverage), and gives you a decision framework for when pairwise is the right tool.
Why the full combination explodes
Consider a checkout form with these parameters:
| Parameter | Values | Count |
|---|---|---|
| Payment | Card, PayPal, UPI, Wallet | 4 |
| Country | US, UK, IN, DE | 4 |
| Currency | USD, GBP, INR, EUR | 4 |
| Device | Desktop, Mobile, Tablet | 3 |
| Logged in | Yes, No | 2 |
The exhaustive (full Cartesian) suite is 4 x 4 x 4 x 3 x 2 = 384 test cases. Add one more 4-value parameter and you are at 1,536. This is the combinatorial explosion: test count grows multiplicatively with each parameter. Nobody runs 384 manual checkout tests, so testers ad-hoc-pick a handful and quietly leave most interactions untested.
Pairwise testing replaces "test everything" with a precise guarantee: every pair of values from any two parameters appears together in at least one test case. For the table above, a pairwise set needs only about 16 test cases instead of 384 — a 96% reduction — while still exercising every two-way interaction.
The empirical basis (why 2-way is usually enough)
NIST studied fault distributions across multiple software domains and found that the cumulative percentage of faults triggered rises steeply with interaction strength and then flattens:
| Interaction strength | Approx. cumulative faults detected |
|---|---|
| 1-way (single value) | ~20-70% |
| 2-way (pairwise) | ~70-95% |
| 3-way | ~90-98% |
| 4-way to 6-way | approaching 100% |
The exact numbers vary by domain, but the shape is consistent: going from 1-way to 2-way captures a huge jump in defects, and beyond 3-way you get diminishing returns for a steadily larger suite. Pairwise (t=2) is the standard default; raise the strength (t=3, t=4) only for high-risk components.
Worked example by hand: 3 parameters
Combinatorial generation is normally done by a tool, but doing one small case by hand builds intuition. Take three parameters with these values:
- Browser: Chrome, Firefox, Safari
- OS: Windows, macOS
- Plan: Free, Pro
Full combination = 3 x 2 x 2 = 12 tests. We want every pair covered. The pairs we must hit include (Chrome, Windows), (Chrome, macOS), (Firefox, Windows), ... for Browser x OS; plus all Browser x Plan and all OS x Plan pairs.
A valid pairwise set is just 6 tests:
| # | Browser | OS | Plan |
|---|---|---|---|
| 1 | Chrome | Windows | Free |
| 2 | Chrome | macOS | Pro |
| 3 | Firefox | Windows | Pro |
| 4 | Firefox | macOS | Free |
| 5 | Safari | Windows | Free |
| 6 | Safari | macOS | Pro |
Verify a few pairs: (Firefox, macOS) is in row 4. (Safari, Pro) is in row 6. (macOS, Free) is in row 4. (Windows, Pro) is in row 3. Every Browser x OS, Browser x Plan, and OS x Plan pair appears at least once — yet we dropped half the exhaustive suite. With more parameters the savings become dramatic, which is exactly why you reach for a generator.
Using PICT (Pairwise Independent Combinatorial Testing tool)
PICT is Microsoft's open-source command-line combinatorial test generator. You give it a plain-text model file; it prints a tab-separated table of test cases to stdout.
Install
PICT ships as a single binary. On macOS you can install via Homebrew; on other platforms you build from source or grab a release:
# macOS
brew install pict
# Or build from source (any platform with a C++ compiler + CMake)
git clone https://github.com/microsoft/pict.git
cd pict
cmake -S . -B build
cmake --build build
# binary lands in ./build/cli/pict
Verify it runs:
pict /?
The model file
A model is one parameter per line: Name: value1, value2, value3. Create checkout.txt:
Payment: Card, PayPal, UPI, Wallet
Country: US, UK, IN, DE
Currency: USD, GBP, INR, EUR
Device: Desktop, Mobile, Tablet
LoggedIn: Yes, No
Generate the pairwise suite:
pict checkout.txt
PICT prints a header row plus the generated cases (tab-separated). The output for this model is roughly 16-20 rows depending on PICT's seeding — every two-way pair covered, versus 384 exhaustive. Example (abbreviated):
Payment Country Currency Device LoggedIn
Wallet IN EUR Tablet Yes
Card UK INR Mobile No
PayPal US GBP Desktop Yes
UPI DE USD Tablet No
...
Higher-order coverage (t=3, t=4)
The default order is 2 (pairwise). For higher-risk modules, raise it with /o:
pict checkout.txt /o:3 # all 3-way combinations
Three-way coverage of this model produces more cases (dozens) but still far below the full 384. Use it only where interaction bugs are costly — payment + fraud + region logic, for instance.
Constraints: excluding impossible combinations
Real systems have illegal combinations. You should not test Currency=INR with Country=US. PICT supports constraints using IF ... THEN blocks. Append to the model:
IF [Country] = "US" THEN [Currency] = "USD";
IF [Country] = "UK" THEN [Currency] = "GBP";
IF [Country] = "IN" THEN [Currency] = "INR";
IF [Country] = "DE" THEN [Currency] = "EUR";
Now PICT will never emit a row where Country and Currency contradict, while still covering all valid pairs. Constraints are the single most important PICT feature for production use — without them you waste a third of your suite on combinations QA will reject as invalid.
Weighting and seeding
You can bias PICT toward important values by weighting them: Payment: Card (10), PayPal, UPI, Wallet makes Card appear more often. You can also pin known-good rows with a seed file (/e:seed.txt) so an existing regression case is always included and PICT fills the gaps around it. This is handy when you want to keep last release's golden cases and only add what is newly required.
A realistic end-to-end workflow
Here is how teams fold pairwise into an automated suite. First, generate cases and save them:
pict checkout.txt > cases.tsv
Then convert to a data structure your test runner can parametrize over. A small Python adapter turns the TSV into rows for pytest.mark.parametrize:
import csv
def load_pict_cases(path):
with open(path) as f:
reader = csv.DictReader(f, delimiter="\t")
return [dict(row) for row in reader]
# In your test module
import pytest
CASES = load_pict_cases("cases.tsv")
@pytest.mark.parametrize("case", CASES)
def test_checkout(case):
result = run_checkout(
payment=case["Payment"],
country=case["Country"],
currency=case["Currency"],
device=case["Device"],
logged_in=case["LoggedIn"] == "Yes",
)
assert result.status == "ok"
Now your suite scales: edit the model, regenerate, and the parametrized tests update automatically. For UI-level coverage you can drive the same case list through a Playwright or Selenium loop — see the broader QA skills directory for browser-automation skills that pair well with combinatorial data.
CI integration
Treat the model file as source-controlled and regenerate in CI so the suite never drifts from the model:
# .github/workflows/combinatorial.yml
name: combinatorial-tests
on: [push, pull_request]
jobs:
pairwise:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build PICT
run: |
git clone --depth 1 https://github.com/microsoft/pict.git
cmake -S pict -B pict/build && cmake --build pict/build
- name: Generate cases
run: ./pict/build/cli/pict checkout.txt > cases.tsv
- name: Run tests
run: pytest -q
Common mistakes and troubleshooting
Treating pairwise as exhaustive. Pairwise guarantees all 2-way pairs, not all 3-way or 4-way interactions. A bug that needs three specific values to line up can slip through. If you have a known three-parameter interaction (auth + role + feature flag), raise that subset to /o:3 or add an explicit case.
Forgetting constraints. Without IF/THEN, PICT happily generates invalid rows (US + INR). Your tests then fail for the wrong reason, eroding trust in the suite. Model every real-world constraint.
Too many values per parameter. Pairwise reduces the multiplicative blow-up, but the case count is still driven by the two largest parameters. Combine equivalent values first using boundary value analysis and equivalence partitioning so each parameter has a tight set of representative values, then run pairwise on top.
Non-determinism in output. PICT's default output can vary between runs due to randomized seeding. Pin a seed (or commit the generated cases.tsv) if you need reproducible CI runs and stable test IDs.
Ignoring negative values. Combinatorial tools cover the values you list. If you only list valid inputs, you only test valid combinations. Add representative invalid values (and constrain them so they are tested deliberately, not mixed randomly).
How pairwise compares to other techniques
| Technique | Guarantees | Best for |
|---|---|---|
| Exhaustive | Every combination | Tiny parameter spaces, safety-critical logic |
| Pairwise (t=2) | All 2-way pairs | Default for multi-parameter configs |
| Higher-order (t=3+) | All t-way combos | High-risk interaction-heavy modules |
| Equivalence partitioning | One value per class | Reducing values before combining |
| Boundary value analysis | Edges of each range | Off-by-one and range bugs |
These techniques compose. Use equivalence partitioning and boundary analysis to choose which values go into each parameter, then use pairwise/combinatorial generation to choose which combinations to test. You can compare related test-design and automation approaches side by side on the comparison hub.
Frequently Asked Questions
What is the difference between pairwise and combinatorial testing?
Pairwise testing is the specific case of combinatorial testing where the interaction strength is two (t=2), meaning every pair of parameter values is covered. Combinatorial testing is the general family that also includes 3-way, 4-way, and higher coverage. People often use "pairwise" and "all-pairs" interchangeably, and "combinatorial" when they mean any t-way strength.
How many test cases does pairwise testing actually save?
It depends on the model, but the reduction is often 90% or more for systems with five or more parameters. A 4x4x4x3x2 model drops from 384 exhaustive cases to roughly 16-20 pairwise cases. The savings grow as you add parameters, because the exhaustive count multiplies while the pairwise count grows much more slowly.
Is PICT free and what platforms does it run on?
Yes, PICT is open-source under the MIT license and maintained by Microsoft on GitHub. It is a small C++ command-line tool that builds on Windows, macOS, and Linux with a standard compiler and CMake. There is no server or account required; you run it locally or in CI.
Can pairwise testing miss bugs?
Yes. Pairwise only guarantees every two-way interaction is covered, so a defect that requires three or more specific values to coincide can be missed. Mitigate this by raising the interaction strength for high-risk parameter subsets, adding targeted cases for known multi-way interactions, and combining pairwise with risk-based test selection.
How do constraints work in PICT?
Constraints are IF/THEN rules in the model file that tell PICT which combinations are invalid or conditionally required. For example, "IF [Country] = "US" THEN [Currency] = "USD";" prevents PICT from ever pairing a US country with a non-USD currency. PICT still covers all valid pairs while excluding the impossible ones, so your suite stays realistic.
When should I use 3-way instead of pairwise?
Use 3-way (t=3) coverage when a module has known interactions among three parameters that are individually plausible but jointly risky, such as authentication state, user role, and feature flag. Three-way coverage produces more test cases than pairwise but still far fewer than exhaustive. Reserve it for the highest-risk components rather than applying it across the whole system.