BDD

2026-05-12

BDD Test Data Management: Best Practices 2026

Q: AI agents for test data?

Yes -- the [QASkills directory](/skills) has SKILL.md packs that teach Claude to generate factory_boy / FactoryBot definitions.

Best practices for managing test data in BDD projects. Fixtures, factories, isolation, cleanup, data tables, scenario outlines, parallel safety, and patterns for Cucumber, Behave, Reqnroll, and Karate in 2026.

BDD Test Data Management: Best Practices 2026

Test data is the silent killer of BDD adoption. Teams start with clean Gherkin scenarios, automate them with elegant step definitions, and run them on day one with no issues. Then on day 90 the suite is unreliable, scenarios pass locally and fail in CI, and engineers spend more time chasing data inconsistencies than fixing real bugs. The root cause is almost always test data management: shared databases, leaking state, missing isolation, or fragile fixtures.

This guide is the most comprehensive resource for managing test data in BDD projects in 2026. We cover fixture strategies, factory patterns, database isolation, parallel safety, data tables in Gherkin, dynamic data generation, cleanup hooks, and cross-framework patterns for Cucumber-JVM, Behave, Reqnroll, Cucumber.js, and Karate. Every code example is production-tested at scale.

By the end you will have a complete playbook for test data management that scales from 10 scenarios to 10,000 without breaking under the weight of its own state.

Key Takeaways

Isolate every scenario -- the default assumption is shared state will leak.
Use factories, not raw inserts -- factories produce valid objects by default.
Generate unique values -- emails, IDs, names must be unique per scenario.
Clean up via transactions when possible; truncate when not.
Parallel safety requires per-thread schemas or per-process databases.

1. The Test Data Lifecycle

Every BDD scenario has the same lifecycle:

Arrange: create the data the scenario needs.
Act: trigger the behavior under test.
Assert: verify the outcome.
Cleanup: restore the database to a clean state.

Failures usually happen at Arrange or Cleanup. Arrange failures produce flaky scenarios that pass with empty databases and fail when other scenarios polluted the data. Cleanup failures produce slow CI runs where scenario N depends on scenario N-1.

2. Strategy 1: Transactional Rollback

The fastest cleanup strategy: wrap every scenario in a database transaction, then roll back at the end. Tools like ActiveRecord's transactional_fixtures, DatabaseCleaner :transaction strategy, or Spring's @Transactional rollback achieve this.

// Cucumber-JVM with Spring + transactional rollback
@CucumberContextConfiguration
@SpringBootTest
@Transactional
public class TestContextConfig { }

public class Hooks {
    @Autowired private TransactionStatus tx;

    @Before public void begin() { /* started by Spring */ }
    @After public void rollback() { /* rolled back by Spring */ }
}

Pros: fast, no manual cleanup. Cons: doesn't work for browser-driven tests that need committed data visible to a separate browser process.

3. Strategy 2: Truncate Between Scenarios

Works everywhere, including Capybara/Selenium with separate browser processes:

# Behave environment.py
from sqlalchemy import text

def before_scenario(context, scenario):
    with context.engine.connect() as conn:
        for table in ['orders', 'cart_items', 'users']:
            conn.execute(text(f'TRUNCATE TABLE {table} RESTART IDENTITY CASCADE'))
        conn.commit()

Pros: simple, reliable. Cons: slower than transactional rollback (10-50ms per truncate).

4. Strategy 3: Per-Process Database

For parallel execution: each worker gets its own database. Works for parallel_tests in Ruby, behavex in Python, and Cucumber-JVM with custom JDBC URL per thread.

# config/database.yml
test:
  database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>

Setup before parallel run:

bundle exec rake parallel:create parallel:migrate

5. Factories vs Fixtures

Factories produce objects on demand with sensible defaults; fixtures are pre-loaded data files. In 2026, factories win nearly always.

# FactoryBot in Ruby
FactoryBot.define do
  factory :user do
    sequence(:email) { |n| "user#{n}@example.com" }
    password { 'Sup3rS3cret!' }
    name { 'Test User' }
    role { 'user' }

    trait :admin do
      role { 'admin' }
    end
  end
end

# In a step definition
Given('a user exists') { @user = FactoryBot.create(:user) }
Given('an admin user exists') { @admin = FactoryBot.create(:user, :admin) }

# factory_boy in Python
import factory
from app.models import User

class UserFactory(factory.alchemy.SQLAlchemyModelFactory):
    class Meta:
        model = User
        sqlalchemy_session = SessionLocal()

    email = factory.Sequence(lambda n: f'user{n}@example.com')
    password = 'Sup3rS3cret!'
    name = factory.Faker('name')
    role = 'user'

@given('a user exists')
def step_user_exists(context):
    context.user = UserFactory.create()

6. Gherkin Data Tables

Data tables put the data right in the feature file:

Scenario: Bulk user import
  Given the following users exist:
    | email             | role  | active |
    | alice@example.com | admin | true   |
    | bob@example.com   | user  | true   |
    | carol@example.com | user  | false  |
  When I view the user list
  Then I should see 3 users

This is great for small fixed sets but anti-pattern when data is large. Five rows is fine; fifty rows means the table belongs in a JSON file.

7. Dynamic Data Generation

Use a library like Faker for realistic unique values:

// Reqnroll + Bogus (.NET Faker)
[Binding]
public class UserSteps
{
    private readonly Faker _faker = new Faker();
    private User _user;

    [Given("a user exists")]
    public void GivenUserExists()
    {
        _user = new User { Email = _faker.Internet.Email(), Name = _faker.Name.FullName() };
        _db.Users.Add(_user);
        _db.SaveChanges();
    }
}

8. State Isolation Across Scenarios

Cucumber-JVM with Picocontainer gives you a fresh TestContext per scenario:

public class TestContext {
    private Map<String, Object> data = new HashMap<>();
    public void put(String k, Object v) { data.put(k, v); }
    public <T> T get(String k, Class<T> t) { return t.cast(data.get(k)); }
}

public class UserSteps {
    private final TestContext ctx;
    public UserSteps(TestContext ctx) { this.ctx = ctx; }

    @Given("a user exists with email {string}")
    public void userExists(String email) {
        User u = api.createUser(email);
        ctx.put("user", u);
    }

    @When("the user signs in")
    public void userSignsIn() {
        User u = ctx.get("user", User.class);
        api.signIn(u.getEmail(), u.getPassword());
    }
}

9. Parallel Execution Safety

When running scenarios in parallel, the data layer must be parallel-safe. Strategies:

Approach	Effort	Speed	Safety
Per-process database	Medium	Fast	High
Per-thread schema in single DB	High	Fast	High
Shared DB with truncate + scenario-unique keys	Low	Medium	Medium
Shared DB with no isolation	Low	Fast	Low (flaky)

For most teams, "shared DB with scenario-unique keys" is the right tradeoff: prefix every email with the scenario name, every order number with a UUID, etc. This eliminates cross-scenario interference without the operational cost of per-process DBs.

10. External Service Data

If your scenarios call third-party APIs (Stripe, SendGrid, etc.), use mocks:

// Cypress
beforeEach(() => {
  cy.intercept('POST', 'https://api.stripe.com/v1/charges', { fixture: 'stripe-charge-success.json' });
});

For service virtualization across multiple frameworks, WireMock and MockServer are battle-tested.

11. Cleanup Hooks Cheatsheet

Framework	Hook	When
Cucumber-JVM	@After	After each scenario
Cucumber-JVM	@AfterAll	After full suite
Behave	after_scenario	After each scenario
Reqnroll	[AfterScenario]	After each scenario
Cucumber.js	After	After each scenario
Karate	Background section	Per scenario

12. AI-Assisted Data Setup

The QASkills directory has SKILL.md packs for factory-driven test data in Ruby, Python, Java, and .NET. Combined with AI agents like Claude, you can generate matching factories and step definitions in one prompt.

13. Anti-Patterns

Hardcoded IDs in step definitions: assert user.id == 42 breaks in parallel.
Sharing fixtures across scenarios: makes the order brittle.
Cleanup in @AfterAll only: data accumulates across scenarios.
Database snapshots restored per test class: too slow.

14. Advanced Patterns

Test Data Builders

Builder pattern for complex test data:

public class OrderBuilder {
    private Customer customer = CustomerBuilder.aCustomer().build();
    private List<LineItem> items = new ArrayList<>();
    private LocalDate orderedAt = LocalDate.now();

    public static OrderBuilder anOrder() { return new OrderBuilder(); }
    public OrderBuilder forCustomer(Customer c) { this.customer = c; return this; }
    public OrderBuilder withItem(String name, int qty, BigDecimal price) {
        items.add(new LineItem(name, qty, price));
        return this;
    }
    public OrderBuilder orderedOn(LocalDate date) { this.orderedAt = date; return this; }
    public Order build() { return new Order(customer, items, orderedAt); }
}

In step definitions:

@Given("an order with 3 items")
public void anOrderWith3Items() {
    Order o = anOrder()
        .withItem("Widget", 1, new BigDecimal("19.99"))
        .withItem("Gadget", 2, new BigDecimal("49.99"))
        .withItem("Gizmo", 1, new BigDecimal("9.99"))
        .build();
    orderRepository.save(o);
    context.put("order", o);
}

Snapshot + Restore Patterns

For very expensive setups, snapshot once and restore per scenario:

# Setup once
pg_dump -Fc mydb > seed.dump

# Restore per scenario
pg_restore -c -d mydb seed.dump

This is faster than re-seeding when setup is complex.

Multi-Tenant Data Isolation

For SaaS apps with multi-tenancy, prefix every scenario with a tenant ID:

def before_scenario(context, scenario):
    context.tenant_id = f'test-{uuid.uuid4()}'
    context.api.create_tenant(context.tenant_id)

This ensures cross-scenario interference is impossible.

External Service Test Data

For Stripe, SendGrid, etc., either use their test modes or stub:

beforeEach(() => {
  cy.intercept('POST', 'https://api.stripe.com/v1/charges', {
    statusCode: 200,
    body: { id: 'ch_test_123', status: 'succeeded' },
  });
});

15. Data Privacy and PII

Test data must never contain real PII. Use Faker libraries to generate realistic but synthetic data:

from faker import Faker
fake = Faker()
email = fake.unique.email()  # e.g., john.smith@example.org

Configure Faker to use safe domains:

Faker.seed(0)  # for deterministic data
fake.add_provider(SafeDomainProvider)

16. Cleanup Order Matters

If your schema has foreign keys, truncate order matters:

TRUNCATE order_items, orders, line_items, products, customers RESTART IDENTITY CASCADE;

The CASCADE flag handles foreign keys but slows down cleanup. For performance:

# delete order matters
DELETE FROM order_items;
DELETE FROM orders;
DELETE FROM customers WHERE email LIKE 'test-%';

17. Cross-Framework Patterns

The same patterns apply regardless of framework:

Pattern	Cucumber-JVM	Behave	Reqnroll	Karate
Factories	Custom builders	factory_boy	Bogus	inline JS
Truncation	@Before	before_scenario	[BeforeScenario]	Background
Parallel-safe	Picocontainer	per-process DB	DI scope	per-scenario
Cleanup	@After	after_scenario	[AfterScenario]	Background

18. Frequently Asked Questions

Q: How fast is truncation vs delete? A: TRUNCATE is faster (constant time) for small tables; DELETE is faster for tables with very few rows after WHERE filters. For 100+ row tables, TRUNCATE wins.

Q: Can I use SQLite for BDD tests? A: Possible but not recommended. Production runs Postgres or MySQL; testing on SQLite means subtle bugs (different SQL dialects) escape. Use the same engine as production.

Q: Database snapshots in containers? A: Yes -- Docker compose with volume snapshots can speed up setup. Trade-off: complexity and slow CI.

Q: AI agents for test data? A: Yes -- the QASkills directory has SKILL.md packs that teach Claude to generate factory_boy / FactoryBot definitions.

Q: External APIs in tests? A: Always stub or use sandbox endpoints. Real API calls produce flaky tests.

Conclusion

Test data management is the single highest-leverage area for BDD suite reliability. Adopt factories, isolate per scenario, and pick the right cleanup strategy for your runtime. The result is a suite that scales from 10 scenarios to 10,000 without becoming flaky. See cucumber-java-bdd-best-practices-2026 and behave-python-bdd-complete-tutorial for framework-specific patterns.