Skip to main content
Back to Blog
Guide
2026-02-22

Visual Regression Testing -- Tools, Setup, and AI Automation

Complete guide to visual regression testing. Covers Playwright visual comparisons, Percy, Applitools, BackstopJS, baseline management, and AI agent automation.

Visual regression testing catches the bugs that functional tests miss -- the subtle CSS regressions, layout shifts, font rendering changes, and color mismatches that slip through unit and integration test suites undetected. When your application looks broken to users, it does not matter that every API endpoint returns a 200. This guide covers everything you need to implement visual regression testing effectively: from Playwright's built-in screenshot comparison to cloud-based tools like Percy and Applitools, baseline management strategies, flaky test mitigation, CI/CD integration, and automating the entire process with AI agents.

Key Takeaways

  • Visual regression testing compares screenshots of your application against approved baselines to detect unintended UI changes -- catching bugs that functional tests cannot
  • Playwright's toHaveScreenshot() API provides built-in visual comparison with configurable thresholds, making it the easiest way to get started without third-party dependencies
  • Percy, Applitools, and BackstopJS each serve different needs: Percy for cloud-based team workflows, Applitools for AI-powered visual AI, and BackstopJS for free self-hosted testing
  • Flaky visual tests are caused by anti-aliasing differences, font rendering, animations, and dynamic content -- all solvable with masking, animation disabling, and consistent environments
  • CI/CD integration with approval gates ensures no visual regression reaches production without human review
  • AI agents equipped with QA skills from QASkills.sh can generate and maintain visual regression test suites automatically

What Is Visual Regression Testing?

Visual regression testing is the practice of automatically comparing screenshots of your application's UI before and after code changes to detect unintended visual differences. Unlike functional testing, which verifies that buttons click and forms submit, visual regression testing verifies that your application looks correct to users.

Functional tests are blind to an entire category of bugs. A button can be functionally correct -- it fires the right event, sends the right API call, and updates the right state -- while being completely invisible because a CSS change set its color to match the background. A layout can pass every accessibility assertion while overlapping elements render content unreadable. A responsive design can satisfy every viewport media query while a flexbox change pushes the call-to-action button below the fold.

Visual regression testing catches these problems:

  • CSS regressions -- A refactored stylesheet unintentionally changes padding, margins, or font sizes across multiple pages
  • Layout shifts -- A new component pushes existing elements out of their expected positions
  • Font rendering changes -- A web font fails to load and the fallback system font changes the entire page layout
  • Color mismatches -- A theme variable update affects components it should not have touched
  • Responsive breakages -- A change that looks perfect on desktop breaks the mobile layout
  • Z-index conflicts -- Modals, tooltips, or dropdowns render behind other elements
  • Dark mode inconsistencies -- A component looks correct in light mode but has unreadable text in dark mode

The key insight is that visual testing treats your UI as what it actually is -- a visual artifact rendered in a browser. Instead of testing individual CSS properties or DOM structures, you test the final rendered output that users actually see.


How Visual Regression Testing Works

The visual regression testing workflow follows a straightforward capture-compare-review cycle that repeats on every code change.

Step 1: Capture baseline screenshots. On the first run, the tool captures screenshots of each page, component, or viewport you want to test. These screenshots become the baseline -- the approved visual state of your application. Baselines are typically captured from the main branch or a known-good release.

Step 2: Run tests after changes. When a developer pushes a code change, the test suite captures new screenshots of the same pages and components under identical conditions (same viewport size, same browser, same test data).

Step 3: Compare screenshots. The tool compares each new screenshot against its baseline using one of several comparison algorithms:

  • Pixel-by-pixel diff -- The simplest approach. Compares each pixel in the new screenshot against the corresponding pixel in the baseline. Any difference is flagged. This is fast but sensitive to sub-pixel rendering differences, anti-aliasing, and font smoothing. Playwright's toHaveScreenshot() uses this approach with a configurable threshold.

  • Perceptual diff -- Uses algorithms that model human visual perception. Small color shifts that humans cannot see are ignored, while larger changes that humans would notice are flagged. Tools like pdiff and Rembrandt implement this approach. This reduces false positives from rendering engine differences.

  • AI-based comparison -- Uses machine learning models trained on visual differences to distinguish between meaningful changes and rendering noise. Applitools' Visual AI is the leading implementation. It can recognize that a layout is structurally the same even when pixel values differ due to dynamic content, animations, or anti-aliasing.

Step 4: Threshold-based pass/fail. If the difference between the new screenshot and the baseline exceeds a configured threshold, the test fails. Thresholds can be set as a maximum number of different pixels (maxDiffPixels), a maximum percentage of different pixels (maxDiffPixelRatio), or a perceptual difference score.

Step 5: Human review for intentional changes. When tests fail, a developer reviews the diff images. If the change is intentional (a redesigned button, a new layout), they update the baseline to reflect the new approved state. If the change is unintentional, they fix the regression. This human-in-the-loop step is essential because automated tools cannot distinguish between a bug and a deliberate design change.


Visual Testing with Playwright

Playwright provides built-in visual comparison testing through the toHaveScreenshot() API. This is the fastest way to add visual regression testing to an existing Playwright test suite because it requires no third-party services, no additional dependencies, and no external accounts.

Configuration

Configure snapshot settings in your playwright.config.ts:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  expect: {
    toHaveScreenshot: {
      // Maximum allowed pixel difference ratio (0 to 1)
      maxDiffPixelRatio: 0.01,
      // Or use absolute pixel count
      // maxDiffPixels: 100,
      // Animation tolerance
      animations: 'disabled',
      // CSS to apply to every screenshot (hide dynamic content)
      stylePath: './tests/screenshot.css',
    },
  },
  use: {
    // Consistent viewport for reproducible screenshots
    viewport: { width: 1280, height: 720 },
    // Consistent locale and timezone
    locale: 'en-US',
    timezoneId: 'America/New_York',
  },
  projects: [
    {
      name: 'chromium',
      use: { browserName: 'chromium' },
    },
  ],
});

Full-Page Screenshot Test

import { test, expect } from '@playwright/test';

test('homepage visual regression', async ({ page }) => {
  await page.goto('/');
  // Wait for all images and fonts to load
  await page.waitForLoadState('networkidle');
  // Take a full-page screenshot and compare against baseline
  await expect(page).toHaveScreenshot('homepage.png', {
    fullPage: true,
    maxDiffPixelRatio: 0.01,
  });
});

test('homepage hero section', async ({ page }) => {
  await page.goto('/');
  await page.waitForLoadState('networkidle');
  // Screenshot a specific section
  const hero = page.locator('[data-testid="hero-section"]');
  await expect(hero).toHaveScreenshot('hero-section.png');
});

Component-Level Screenshot Test

import { test, expect } from '@playwright/test';

test.describe('Button component visual tests', () => {
  test('primary button states', async ({ page }) => {
    await page.goto('/components/button');

    const button = page.getByRole('button', { name: 'Submit' });

    // Default state
    await expect(button).toHaveScreenshot('button-default.png');

    // Hover state
    await button.hover();
    await expect(button).toHaveScreenshot('button-hover.png');

    // Focus state
    await button.focus();
    await expect(button).toHaveScreenshot('button-focus.png');
  });

  test('button variants', async ({ page }) => {
    await page.goto('/components/button');

    // Screenshot each variant
    const variants = ['primary', 'secondary', 'danger', 'ghost'];
    for (const variant of variants) {
      const button = page.locator(`[data-variant="${variant}"]`);
      await expect(button).toHaveScreenshot(`button-${variant}.png`);
    }
  });
});

Updating Baselines

When you intentionally change the UI, update baselines by running:

npx playwright test --update-snapshots

This regenerates all baseline screenshots. Review the changes in your git diff before committing to ensure only intentional changes are captured. For targeted updates, run a specific test file:

npx playwright test tests/visual/homepage.spec.ts --update-snapshots

Threshold Configuration

Playwright gives you granular control over comparison sensitivity:

// Strict -- almost no difference allowed
await expect(page).toHaveScreenshot('critical-ui.png', {
  maxDiffPixels: 0,
});

// Moderate -- small rendering differences allowed
await expect(page).toHaveScreenshot('content-page.png', {
  maxDiffPixelRatio: 0.01, // 1% of pixels can differ
});

// Lenient -- for pages with minor dynamic content
await expect(page).toHaveScreenshot('dashboard.png', {
  maxDiffPixelRatio: 0.05, // 5% tolerance
  threshold: 0.3, // Per-pixel color difference threshold (0-1)
});

The threshold option controls how different a single pixel must be to count as "changed" (on a scale of 0 to 1), while maxDiffPixels and maxDiffPixelRatio control how many changed pixels are allowed before the test fails.


Tool Comparison: Percy vs Applitools vs BackstopJS

Choosing the right visual testing tool depends on your team size, budget, and workflow. Here is a detailed comparison of the three most popular options alongside Playwright's built-in capabilities.

FeaturePercyApplitoolsBackstopJSPlaywright Built-in
PricingFree tier (5K snapshots/mo), paid plans from $399/moFree tier (100 steps/mo), enterprise pricingFree and open-sourceFree (built-in)
AI-powered diffSmart diff (layout detection)Visual AI (industry-leading)Pixel diff onlyPixel diff with threshold
CI integrationGitHub, GitLab, Bitbucket, CircleCIAll major CI providersAny CI with Node.jsNative Playwright CI
Framework supportPlaywright, Cypress, Selenium, StorybookPlaywright, Cypress, Selenium, Storybook, AppiumPuppeteer, Playwright (via config)Playwright only
Cloud/self-hostedCloud onlyCloud or on-premiseSelf-hosted onlySelf-hosted (local)
Responsive testingAutomatic multi-widthAutomatic multi-viewportConfigurable viewportsManual per-project config
Approval workflowBrowser-based review dashboardBrowser-based with batch approveCLI and HTML reportGit-based (update snapshots)
ParallelismCloud-managedCloud-managedLocal parallelPlaywright sharding

Percy (BrowserStack)

Percy is a cloud-based visual testing platform now owned by BrowserStack. Its strongest feature is the team collaboration workflow -- when a visual test fails, Percy generates a side-by-side diff in a web dashboard where team members can approve or reject changes. Percy integrates directly with pull requests, adding status checks that block merges until visual changes are reviewed.

Percy works by receiving screenshots from your test suite (via SDKs for Playwright, Cypress, or Selenium) and rendering them in its cloud across multiple browsers and viewport widths. This means you get cross-browser visual coverage without running tests on multiple browsers locally. The trade-off is that Percy requires sending your application's HTML and assets to its cloud for rendering, which may not work for applications behind firewalls or with strict data policies.

Applitools Eyes

Applitools differentiates itself with its Visual AI engine, which uses machine learning to compare screenshots the way a human would. Instead of comparing raw pixels, it understands layout structure, text content, and visual elements. This means it can detect that a button moved 2 pixels to the right (meaningful) while ignoring that anti-aliasing rendered slightly differently on this CI run (noise).

The AI-based approach dramatically reduces false positives. Teams using Applitools report 95%+ fewer false positives compared to pixel-based tools. Applitools also offers an Ultrafast Grid that renders your page across dozens of browser and viewport combinations in seconds, not minutes. The downside is cost -- enterprise pricing can be significant, and the free tier is limited to 100 checkpoint steps per month.

BackstopJS

BackstopJS is a free, open-source visual regression testing tool that runs entirely on your own infrastructure. It uses Puppeteer or Playwright to capture screenshots and generates an HTML report with side-by-side diffs. Configuration is done through a JSON file where you define scenarios (URLs to test) and viewports.

BackstopJS is ideal for teams that want visual regression testing without external dependencies or cloud services. It works well in Docker-based CI environments where you control the rendering environment completely. The trade-off is that you lose the AI-based diffing, cloud rendering, and team approval workflows that Percy and Applitools provide. You also need to manage baseline storage and CI integration yourself.


Baseline Management Strategies

How you store and manage baseline screenshots is one of the most critical decisions in visual regression testing. Poor baseline management leads to merge conflicts, bloated repositories, and inconsistent test results across branches.

Git LFS for Screenshots

Baseline screenshots are binary files that change frequently. Storing them directly in git bloats the repository over time because git stores every version of every file. Git Large File Storage (LFS) solves this by storing binary files on a separate server while keeping lightweight pointer files in the repository.

# Install Git LFS
git lfs install

# Track screenshot files
git lfs track "**/*.png"
git lfs track "tests/**/__screenshots__/**"

# Commit the .gitattributes
git add .gitattributes
git commit -m "Track screenshots with Git LFS"

.gitignore Patterns

If you choose not to commit baselines to git, use .gitignore patterns and store baselines as CI artifacts or in cloud storage:

# .gitignore
tests/**/__screenshots__/
tests/**/test-results/
*.diff.png
*.actual.png

Branch-Based Baselines

One of the trickiest aspects of visual testing is handling baselines across branches. When two developers change the UI simultaneously on separate branches, their baselines diverge from main and from each other.

Strategy 1: Main branch as source of truth. Baselines are always generated from the main branch. Feature branches compare against main's baselines. When a feature branch intentionally changes the UI, the developer updates baselines as part of the branch and they are merged along with the code changes.

Strategy 2: Per-branch baselines (Percy/Applitools). Cloud-based tools like Percy automatically manage per-branch baselines. Each branch gets its own baseline derived from the point it branched off main. When the branch merges, the updated baselines are automatically promoted to main.

Handling Dynamic Content

Dynamic content -- timestamps, user avatars, ads, random data -- causes visual tests to fail on every run even when nothing has changed. There are several strategies to handle this:

  • CSS masking -- Apply a CSS class that hides dynamic elements during screenshots. Playwright supports a stylePath option for this.
  • Mock data -- Use consistent test data with seeded fixtures so that content is identical across runs.
  • Region masking -- Most tools support masking rectangular regions of the screenshot. Percy and Applitools support this natively. In Playwright, you can use the mask option.
  • Clock freezing -- Use Playwright's page.clock API to freeze time so that timestamps and relative dates are consistent.
import { test, expect } from '@playwright/test';

test('dashboard with masked dynamic content', async ({ page }) => {
  // Freeze time for consistent timestamps
  await page.clock.setFixedTime(new Date('2026-01-15T10:00:00Z'));

  await page.goto('/dashboard');
  await page.waitForLoadState('networkidle');

  // Mask dynamic elements during screenshot
  await expect(page).toHaveScreenshot('dashboard.png', {
    mask: [
      page.locator('.user-avatar'),
      page.locator('.ad-banner'),
      page.locator('.live-feed'),
    ],
  });
});

Handling Flaky Visual Tests

Flaky visual tests are the number one reason teams abandon visual regression testing. A test that fails randomly without any code change destroys trust in the test suite. Understanding the root causes of visual test flakiness and applying targeted mitigations keeps your suite reliable. For a comprehensive treatment of all types of test flakiness, see our guide to fixing flaky tests.

Common Causes of Flaky Visual Tests

Anti-aliasing differences. Different operating systems, GPU drivers, and even different runs of the same browser can produce slightly different anti-aliasing for text and vector graphics. A pixel-level comparison sees these as failures even though a human cannot tell the difference.

Font rendering. Fonts render differently across macOS, Linux, and Windows. Even on the same OS, different font hinting settings and subpixel rendering configurations produce different results. If developers run tests on macOS and CI runs on Linux, baselines will never match.

Animation timing. If a screenshot is captured mid-animation, the captured frame varies between runs. A loading spinner, a fade-in transition, or a CSS animation can produce different screenshots depending on exact timing.

Dynamic content. Timestamps, relative dates ("3 minutes ago"), random avatars, ad placements, and A/B test variations all produce different screenshots on every run.

Viewport inconsistencies. Browser chrome, scrollbar styles, and system-level scaling (Retina vs non-Retina) affect the rendered output. If the viewport is not set explicitly, the default varies across environments.

Mitigation Strategies

Disable animations in test configuration. Playwright supports this natively:

// playwright.config.ts
export default defineConfig({
  expect: {
    toHaveScreenshot: {
      animations: 'disabled',
    },
  },
});

Use Docker for consistent rendering. Run visual tests inside a Docker container to ensure identical OS, fonts, and rendering environment across all developers and CI:

# docker-compose.visual-tests.yml
services:
  visual-tests:
    image: mcr.microsoft.com/playwright:v1.50.0-noble
    working_dir: /app
    volumes:
      - .:/app
    command: npx playwright test --project=chromium tests/visual/
    environment:
      - CI=true

Mask dynamic regions. Use the mask option to hide elements that change between runs (avatars, ads, live feeds, timestamps).

Set tolerance thresholds. Allow a small percentage of pixel differences to accommodate sub-pixel rendering variations:

await expect(page).toHaveScreenshot('page.png', {
  maxDiffPixelRatio: 0.01,
  threshold: 0.2,
});

Wait for network idle and fonts. Ensure all resources are loaded before capturing:

await page.goto('/');
await page.waitForLoadState('networkidle');
// Wait for web fonts to finish loading
await page.evaluate(() => document.fonts.ready);

Use a single browser in CI. Cross-browser visual tests multiply flakiness. Run visual regression tests in Chromium only (the most consistent renderer) and use functional tests for cross-browser coverage.


CI/CD Integration

Visual regression tests are most valuable when they run automatically on every pull request, blocking merges when unreviewed visual changes are detected. Here is a GitHub Actions workflow that runs Playwright visual tests, uploads diff artifacts, and posts results as PR comments.

# .github/workflows/visual-tests.yml
name: Visual Regression Tests

on:
  pull_request:
    branches: [main]

jobs:
  visual-tests:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.50.0-noble
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Run visual regression tests
        run: npx playwright test tests/visual/ --project=chromium
        continue-on-error: true
        id: visual-tests

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: visual-test-results
          path: |
            test-results/
            playwright-report/
          retention-days: 30

      - name: Upload diff screenshots
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diffs
          path: test-results/**/*-diff.png
          retention-days: 30

      - name: Comment on PR with results
        if: always() && github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const outcome = '${{ steps.visual-tests.outcome }}';
            const body = outcome === 'success'
              ? '## Visual Tests Passed\nNo visual regressions detected.'
              : '## Visual Tests Failed\nVisual regressions detected. Download the `visual-diffs` artifact to review changes.\n\nIf these changes are intentional, update baselines with:\n```bash\nnpx playwright test --update-snapshots\n```';

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

      - name: Fail if visual tests failed
        if: steps.visual-tests.outcome == 'failure'
        run: exit 1

This workflow runs visual tests inside a Playwright Docker container for consistent rendering, uploads all test results and diff screenshots as downloadable artifacts, posts a comment on the PR with results and instructions for updating baselines, and fails the check so branch protection rules prevent merging until the visual regression is addressed.

For a complete guide to building CI/CD pipelines with testing stages, see CI/CD Testing Pipeline with GitHub Actions.


Visual Testing at Scale

As your application grows, visual regression testing must scale with it. Testing every page at every viewport on every PR becomes prohibitively slow without the right strategies.

Component-Level Testing with Storybook

Instead of testing entire pages, test individual components in isolation using Storybook. Each Storybook story becomes a visual test case, which means you get comprehensive visual coverage without needing to navigate through the full application.

Chromatic (built by the Storybook team) is a cloud service specifically designed for Storybook visual testing. It captures a screenshot of every story on every PR, detects visual changes, and provides a review workflow. This is the most efficient way to do component-level visual testing because Storybook already renders each component in isolation with controlled props.

# Install Chromatic
npm install --save-dev chromatic

# Run visual tests against your Storybook
npx chromatic --project-token=your_token

Responsive Matrix Testing

Test critical pages at multiple viewport widths to catch responsive layout regressions:

import { test, expect } from '@playwright/test';

const viewports = [
  { name: 'mobile', width: 375, height: 812 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'desktop', width: 1280, height: 720 },
  { name: 'wide', width: 1920, height: 1080 },
];

for (const vp of viewports) {
  test(`homepage visual regression - ${vp.name}`, async ({ browser }) => {
    const context = await browser.newContext({
      viewport: { width: vp.width, height: vp.height },
    });
    const page = await context.newPage();
    await page.goto('/');
    await page.waitForLoadState('networkidle');
    await expect(page).toHaveScreenshot(`homepage-${vp.name}.png`, {
      fullPage: true,
    });
    await context.close();
  });
}

Parallel Execution

Use Playwright's sharding to distribute visual tests across multiple CI workers:

# Run visual tests in 4 parallel shards
jobs:
  visual-tests:
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - run: npx playwright test tests/visual/ --shard ${{ matrix.shard }}

Selective Testing

Instead of running all visual tests on every PR, only run visual tests for components that changed. Use file path filters in your CI configuration:

on:
  pull_request:
    paths:
      - 'src/components/**'
      - 'src/styles/**'
      - 'src/app/**/*.tsx'
      - 'tests/visual/**'

This ensures visual tests only run when UI-related files change, saving CI minutes on backend-only changes.


Automating Visual Tests with AI Agents

Writing and maintaining visual regression tests is repetitive work that AI coding agents handle exceptionally well. An AI agent can analyze your application's pages, generate screenshot tests for critical UI components, configure appropriate thresholds, and update baselines when you confirm that changes are intentional.

QASkills.sh provides specialized skills that teach AI agents visual testing best practices. Install the visual regression skill to give your agent expert knowledge about screenshot comparison, baseline management, and flaky test prevention:

npx @qaskills/cli add visual-regression

For generating baseline screenshots across multiple pages and viewports automatically:

npx @qaskills/cli add screenshot-baseline-generator

Related skills that complement visual testing workflows:

# Test responsive layouts across breakpoints
npx @qaskills/cli add responsive-layout-breaker

# Find visual bugs specific to dark mode
npx @qaskills/cli add dark-mode-bug-finder

# Full Playwright E2E testing expertise
npx @qaskills/cli add playwright-e2e

With these skills installed, your AI agent can generate a complete visual testing suite -- including Playwright configuration, screenshot tests for every page, responsive viewport matrix tests, dynamic content masking, and CI/CD workflow files -- from a single prompt.

Browse all available QA skills at qaskills.sh/skills or read the getting started guide to install your first skill in under 60 seconds.

For deeper dives into related topics, explore:


Frequently Asked Questions

When should I add visual regression testing to my project?

Add visual regression testing once your application has a stable UI that multiple developers work on. If you are still in the rapid prototyping phase where the design changes daily, visual tests will create more noise than value because baselines need constant updating. The sweet spot is when your design system is established, your component library is relatively stable, and you want to prevent unintended visual regressions during feature development and refactoring. For most teams, this is after the first production release.

How much does visual regression testing cost?

The cost ranges from free to thousands of dollars per month depending on your approach. Playwright's built-in screenshots and BackstopJS are completely free and self-hosted -- your only cost is CI compute time. Percy offers a free tier with 5,000 snapshots per month, with paid plans starting at $399/month for larger teams. Applitools has a limited free tier (100 checkpoints/month) with enterprise pricing for production use. Chromatic (for Storybook) offers a free tier with 5,000 snapshots/month. For most small-to-medium teams, Playwright's built-in visual comparison running inside a Docker container on GitHub Actions is sufficient and costs nothing beyond standard CI minutes.

How do I reduce false positives in visual tests?

False positives are the biggest pain point in visual testing. Start by running tests inside Docker containers to eliminate OS-level rendering differences. Disable all CSS animations and transitions during testing. Mask dynamic content like timestamps, avatars, and ad banners. Set a small but non-zero tolerance threshold (maxDiffPixelRatio: 0.01) to absorb sub-pixel anti-aliasing differences. Use a single browser (Chromium) for visual tests instead of cross-browser. Freeze time with page.clock to eliminate date-related changes. If you are still getting false positives, consider switching from pixel-based comparison to AI-based comparison (Applitools) which understands layout structure and ignores rendering noise.

Can I use visual regression testing for mobile apps?

Yes, but the approach differs. For mobile web applications, Playwright supports mobile viewport emulation with device-specific user agents, so you can test responsive layouts at mobile dimensions. For native mobile apps (iOS/Android), you need a different toolchain. Applitools supports native mobile testing through its Appium SDK. Percy does not support native mobile directly but can test mobile web through its responsive widths feature. For React Native apps, you can use Storybook with Chromatic to test components visually in a web environment. The key challenge with mobile visual testing is the vast matrix of screen sizes, pixel densities, and OS versions -- focus your visual tests on a small set of representative devices rather than trying to cover everything.

How does visual regression testing fit with other types of testing?

Visual regression testing is a complement to, not a replacement for, other testing types. It sits alongside your existing unit tests, integration tests, and E2E tests in the testing pyramid. Unit tests verify business logic. Integration tests verify that components work together. E2E tests verify user flows end-to-end. Visual regression tests verify that the rendered UI matches the approved design. The typical execution order in CI is lint, then unit tests, then integration tests, then E2E tests, then visual tests. Visual tests run last because they are the slowest (screenshot capture and comparison) and the most likely to need human review. Some teams run visual tests in a separate, parallel CI job to avoid blocking the main pipeline while still catching visual regressions before merge.

Visual Regression Testing -- Tools, Setup, and AI Automation | QASkills.sh