Testing

2026-06-15

Pairwise & Combinatorial Testing Guide with PICT (2026)

Learn pairwise (all-pairs) and combinatorial testing with worked examples, the PICT tool, constraints, and how to cut test cases without losing coverage.

Pairwise & Combinatorial Testing: A Practical Guide with PICT

Pairwise testing (also called all-pairs testing) is a combinatorial test-design technique that generates a small set of test cases covering every possible pair of input values at least once, instead of testing the full Cartesian product of all parameters. It works because empirical studies show most defects are triggered by a single parameter or the interaction of just two parameters, so covering all 2-way combinations finds the large majority of interaction bugs while shrinking the test suite by orders of magnitude. Tools like Microsoft PICT generate these sets automatically.

This guide explains the math, walks through worked examples by hand, shows how to use PICT (including constraints and higher-order coverage), and gives you a decision framework for when pairwise is the right tool.

Why the full combination explodes

Consider a checkout form with these parameters:

Parameter	Values	Count
Payment	Card, PayPal, UPI, Wallet	4
Country	US, UK, IN, DE	4
Currency	USD, GBP, INR, EUR	4
Device	Desktop, Mobile, Tablet	3
Logged in	Yes, No	2

The exhaustive (full Cartesian) suite is 4 x 4 x 4 x 3 x 2 = 384 test cases. Add one more 4-value parameter and you are at 1,536. This is the combinatorial explosion: test count grows multiplicatively with each parameter. Nobody runs 384 manual checkout tests, so testers ad-hoc-pick a handful and quietly leave most interactions untested.

Pairwise testing replaces "test everything" with a precise guarantee: every pair of values from any two parameters appears together in at least one test case. For the table above, a pairwise set needs only about 16 test cases instead of 384 — a 96% reduction — while still exercising every two-way interaction.

The empirical basis (why 2-way is usually enough)

NIST studied fault distributions across multiple software domains and found that the cumulative percentage of faults triggered rises steeply with interaction strength and then flattens:

Interaction strength	Approx. cumulative faults detected
1-way (single value)	~20-70%
2-way (pairwise)	~70-95%
3-way	~90-98%
4-way to 6-way	approaching 100%

The exact numbers vary by domain, but the shape is consistent: going from 1-way to 2-way captures a huge jump in defects, and beyond 3-way you get diminishing returns for a steadily larger suite. Pairwise (t=2) is the standard default; raise the strength (t=3, t=4) only for high-risk components.

Worked example by hand: 3 parameters

Combinatorial generation is normally done by a tool, but doing one small case by hand builds intuition. Take three parameters with these values:

Browser: Chrome, Firefox, Safari
OS: Windows, macOS
Plan: Free, Pro

Full combination = 3 x 2 x 2 = 12 tests. We want every pair covered. The pairs we must hit include (Chrome, Windows), (Chrome, macOS), (Firefox, Windows), ... for Browser x OS; plus all Browser x Plan and all OS x Plan pairs.

A valid pairwise set is just 6 tests:

#	Browser	OS	Plan
1	Chrome	Windows	Free
2	Chrome	macOS	Pro
3	Firefox	Windows	Pro
4	Firefox	macOS	Free
5	Safari	Windows	Free
6	Safari	macOS	Pro

Verify a few pairs: (Firefox, macOS) is in row 4. (Safari, Pro) is in row 6. (macOS, Free) is in row 4. (Windows, Pro) is in row 3. Every Browser x OS, Browser x Plan, and OS x Plan pair appears at least once — yet we dropped half the exhaustive suite. With more parameters the savings become dramatic, which is exactly why you reach for a generator.

Using PICT (Pairwise Independent Combinatorial Testing tool)

PICT is Microsoft's open-source command-line combinatorial test generator. You give it a plain-text model file; it prints a tab-separated table of test cases to stdout.

Install

PICT ships as a single binary. On macOS you can install via Homebrew; on other platforms you build from source or grab a release:

# macOS
brew install pict

# Or build from source (any platform with a C++ compiler + CMake)
git clone https://github.com/microsoft/pict.git
cd pict
cmake -S . -B build
cmake --build build
# binary lands in ./build/cli/pict

Verify it runs:

pict /?

The model file

A model is one parameter per line: Name: value1, value2, value3. Create checkout.txt:

Payment:   Card, PayPal, UPI, Wallet
Country:   US, UK, IN, DE
Currency:  USD, GBP, INR, EUR
Device:    Desktop, Mobile, Tablet
LoggedIn:  Yes, No

Generate the pairwise suite:

pict checkout.txt

PICT prints a header row plus the generated cases (tab-separated). The output for this model is roughly 16-20 rows depending on PICT's seeding — every two-way pair covered, versus 384 exhaustive. Example (abbreviated):

Payment  Country  Currency  Device   LoggedIn
Wallet   IN       EUR       Tablet   Yes
Card     UK       INR       Mobile   No
PayPal   US       GBP       Desktop  Yes
UPI      DE       USD       Tablet   No
...

Higher-order coverage (t=3, t=4)

The default order is 2 (pairwise). For higher-risk modules, raise it with /o:

pict checkout.txt /o:3      # all 3-way combinations

Three-way coverage of this model produces more cases (dozens) but still far below the full 384. Use it only where interaction bugs are costly — payment + fraud + region logic, for instance.

Constraints: excluding impossible combinations

Real systems have illegal combinations. You should not test Currency=INR with Country=US. PICT supports constraints using IF ... THEN blocks. Append to the model:

IF [Country] = "US"  THEN [Currency] = "USD";
IF [Country] = "UK"  THEN [Currency] = "GBP";
IF [Country] = "IN"  THEN [Currency] = "INR";
IF [Country] = "DE"  THEN [Currency] = "EUR";

Now PICT will never emit a row where Country and Currency contradict, while still covering all valid pairs. Constraints are the single most important PICT feature for production use — without them you waste a third of your suite on combinations QA will reject as invalid.

Weighting and seeding

You can bias PICT toward important values by weighting them: Payment: Card (10), PayPal, UPI, Wallet makes Card appear more often. You can also pin known-good rows with a seed file (/e:seed.txt) so an existing regression case is always included and PICT fills the gaps around it. This is handy when you want to keep last release's golden cases and only add what is newly required.

A realistic end-to-end workflow

Here is how teams fold pairwise into an automated suite. First, generate cases and save them:

pict checkout.txt > cases.tsv

Then convert to a data structure your test runner can parametrize over. A small Python adapter turns the TSV into rows for pytest.mark.parametrize:

import csv

def load_pict_cases(path):
    with open(path) as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [dict(row) for row in reader]

# In your test module
import pytest
CASES = load_pict_cases("cases.tsv")

@pytest.mark.parametrize("case", CASES)
def test_checkout(case):
    result = run_checkout(
        payment=case["Payment"],
        country=case["Country"],
        currency=case["Currency"],
        device=case["Device"],
        logged_in=case["LoggedIn"] == "Yes",
    )
    assert result.status == "ok"

Now your suite scales: edit the model, regenerate, and the parametrized tests update automatically. For UI-level coverage you can drive the same case list through a Playwright or Selenium loop — see the broader QA skills directory for browser-automation skills that pair well with combinatorial data.

CI integration

Treat the model file as source-controlled and regenerate in CI so the suite never drifts from the model:

# .github/workflows/combinatorial.yml
name: combinatorial-tests
on: [push, pull_request]
jobs:
  pairwise:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build PICT
        run: |
          git clone --depth 1 https://github.com/microsoft/pict.git
          cmake -S pict -B pict/build && cmake --build pict/build
      - name: Generate cases
        run: ./pict/build/cli/pict checkout.txt > cases.tsv
      - name: Run tests
        run: pytest -q

Common mistakes and troubleshooting

Treating pairwise as exhaustive. Pairwise guarantees all 2-way pairs, not all 3-way or 4-way interactions. A bug that needs three specific values to line up can slip through. If you have a known three-parameter interaction (auth + role + feature flag), raise that subset to /o:3 or add an explicit case.

Forgetting constraints. Without IF/THEN, PICT happily generates invalid rows (US + INR). Your tests then fail for the wrong reason, eroding trust in the suite. Model every real-world constraint.

Too many values per parameter. Pairwise reduces the multiplicative blow-up, but the case count is still driven by the two largest parameters. Combine equivalent values first using boundary value analysis and equivalence partitioning so each parameter has a tight set of representative values, then run pairwise on top.

Non-determinism in output. PICT's default output can vary between runs due to randomized seeding. Pin a seed (or commit the generated cases.tsv) if you need reproducible CI runs and stable test IDs.

Ignoring negative values. Combinatorial tools cover the values you list. If you only list valid inputs, you only test valid combinations. Add representative invalid values (and constrain them so they are tested deliberately, not mixed randomly).

How pairwise compares to other techniques

Technique	Guarantees	Best for
Exhaustive	Every combination	Tiny parameter spaces, safety-critical logic
Pairwise (t=2)	All 2-way pairs	Default for multi-parameter configs
Higher-order (t=3+)	All t-way combos	High-risk interaction-heavy modules
Equivalence partitioning	One value per class	Reducing values before combining
Boundary value analysis	Edges of each range	Off-by-one and range bugs

These techniques compose. Use equivalence partitioning and boundary analysis to choose which values go into each parameter, then use pairwise/combinatorial generation to choose which combinations to test. You can compare related test-design and automation approaches side by side on the comparison hub.

Frequently Asked Questions

What is the difference between pairwise and combinatorial testing?

Pairwise testing is the specific case of combinatorial testing where the interaction strength is two (t=2), meaning every pair of parameter values is covered. Combinatorial testing is the general family that also includes 3-way, 4-way, and higher coverage. People often use "pairwise" and "all-pairs" interchangeably, and "combinatorial" when they mean any t-way strength.

How many test cases does pairwise testing actually save?

It depends on the model, but the reduction is often 90% or more for systems with five or more parameters. A 4x4x4x3x2 model drops from 384 exhaustive cases to roughly 16-20 pairwise cases. The savings grow as you add parameters, because the exhaustive count multiplies while the pairwise count grows much more slowly.

Is PICT free and what platforms does it run on?

Yes, PICT is open-source under the MIT license and maintained by Microsoft on GitHub. It is a small C++ command-line tool that builds on Windows, macOS, and Linux with a standard compiler and CMake. There is no server or account required; you run it locally or in CI.

Can pairwise testing miss bugs?

Yes. Pairwise only guarantees every two-way interaction is covered, so a defect that requires three or more specific values to coincide can be missed. Mitigate this by raising the interaction strength for high-risk parameter subsets, adding targeted cases for known multi-way interactions, and combining pairwise with risk-based test selection.

How do constraints work in PICT?

Constraints are IF/THEN rules in the model file that tell PICT which combinations are invalid or conditionally required. For example, "IF [Country] = "US" THEN [Currency] = "USD";" prevents PICT from ever pairing a US country with a non-USD currency. PICT still covers all valid pairs while excluding the impossible ones, so your suite stays realistic.

When should I use 3-way instead of pairwise?

Use 3-way (t=3) coverage when a module has known interactions among three parameters that are individually plausible but jointly risky, such as authentication state, user role, and feature flag. Three-way coverage produces more test cases than pairwise but still far fewer than exhaustive. Reserve it for the highest-risk components rather than applying it across the whole system.