Blog

BDD

Comparing Popular BDD Frameworks 2026: Cucumber vs SpecFlow vs Behave vs Gauge vs Karate

PyUnit vs pytest 2026: stdlib unittest or the third-party favorite

All Articles

Add MCP Conformance Tests to GitHub Actions with Failure Baselines

Add pinned MCP conformance tests to GitHub Actions, preserve raw evidence, and govern expected failures without hiding regressions or stale exceptions.

2026-04-13

AI Agent Evaluation Guide for Tools, Trajectories, and Task Success

Evaluate AI agents across task outcomes, tool selection and arguments, trajectories, environment state, repeated trials, safety, latency, and cost.

AI Testing

2026-02-17

AI Test Automation Tools and Workflows for QA Teams in 2026

A rigorous guide to selecting and governing AI test automation tools, testing AI systems, review workflows, CI gates, security, drift, metrics, and adoption.

Strategy

AI4Testing vs Testing AI: Two Different QA Strategies Explained

Compare AI4Testing with Testing AI through scope, ownership, evidence, examples, risks, metrics, and a practical decision model for QA teams in 2026.

Contextual Precision vs Recall vs Relevancy for RAG Testing

Compare RAG context precision, recall, and relevancy without conflating ranking, evidence coverage, or focus, using reproducible fixtures and CI checks.

Debug Playwright Tests with --debug=cli and Agent Trace Commands

Pause a Playwright test for agent attachment, inspect it with playwright-cli, analyze trace.zip from the terminal, and record separate agent session traces.

DeepEval 3 to 4 Migration Guide for Traces and Multi-Turn Goldens

Migrate DeepEval 3 suites to DeepEval 4 with pinned environments, trace parity, multi-turn goldens, shadow CI, failure triage, and rollback controls.

2026-07-05

DeepEval 4 Tutorial for Pytest-Style LLM, RAG, and Agent Testing

Build DeepEval 4.1 tests for LLMs, RAG, agents, conversations, synthetic datasets, custom metrics, failure diagnosis, and CI quality gates.

DeepEval ConversationSimulator Tutorial with Synthetic Users

Build DeepEval ConversationSimulator tests with conversational goldens, stateful callbacks, controlled stopping, evaluation metrics, CI, and failure analysis.

2026-07-13

DeepEval TaskCompletionMetric: Trace Setup and Failure Analysis

Implement DeepEval TaskCompletionMetric with complete agent traces, calibrated judges, CI gates, outcome evidence, and systematic failure diagnosis.

Comparison

Deterministic Graders vs LLM Judges vs Human Review

Choose deterministic graders, calibrated LLM judges, or human review using task risk, objective evidence, agreement checks, cost, and escalation rules.

Evaluate Codex vs Claude Coding Agents with Promptfoo

Build a fair Promptfoo coding-agent evaluation for Codex and Claude using controlled tasks, sandboxes, graders, repeated trials, and review.

Generate Synthetic RAG Testsets with Ragas and Your Documents

Generate, validate, and govern synthetic RAG testsets with current Ragas concepts while controlling leakage, sampling bias, privacy, and held-out use.

High Answer Relevance but Low Faithfulness: Diagnose Wrong RAG Answers

Diagnose fluent, on-topic RAG answers that are unsupported by retrieved evidence using atomic claim tracing, controlled context tests, and release gates.

How to Build an LLM Eval Harness That Matches Production Behavior

Build an LLM eval harness that exercises the production application path, isolates state, records versions and traces, repeats trials, and gates releases.

How to Install Playwright CLI Skills in Codex and Claude Code

Install Playwright CLI skills for Codex and Claude Code, verify agent discovery, run a browser smoke test, and fix common skill setup failures.

How to Install the DeepEval Skill in Codex, Claude Code, and Cursor

Install and verify the official DeepEval skill in Codex, Claude Code, and Cursor, then govern permissions, eval loops, updates, CI, and rollback.

How to Tell Whether a RAG Failure Comes from Retrieval or Generation

Localize RAG defects with controlled context substitutions, claim evidence, retrieval labels, and release checks that separate retrieval from generation.

Playbook

2026-02-19

How to Test AI-Generated Code: A Practical SDET Review Playbook

Review and test AI-generated code with a risk-based SDET workflow covering diff scope, independent oracles, security, Playwright checks, CI, and merge gates.

Install Promptfoo Agent Skills in Codex and Claude Code

Install Promptfoo agent skills in Codex and Claude Code, verify routing and config output, and govern permissions, upgrades, and repository policy.

Certification

ISTQB CT-AI v2.0 Guide for QA Engineers: What Changed in 2026

Understand the ISTQB CT-AI v2.0 scope, syllabus changes, exam facts, practical exercises, migration choices, and a focused study plan for QA engineers.

2026-03-17

LLM Testing Complete Guide: Evals, Agents, RAG, and Quality Gates

Build reliable LLM tests for prompts, agents, RAG, multi-turn workflows, security, monitoring, CI quality gates, cost, and eval-platform migration.

2026-06-14

MCP Server Testing Complete Guide for Protocol, Tools, and Security

Test MCP servers across lifecycle, transports, contracts, conformance, Inspector, authentication, resilience, observability, and security in 2026.

2026-07-06

npx playwright init-agents Setup Guide for Agentic Test Loops

Set up Playwright test-agent definitions for VS Code, Claude Code, Codex, or OpenCode, verify the generated files, and diagnose setup failures.

OpenAI Evals Platform Shutdown: Migration Checklist for November 2026

Migrate OpenAI Evals before the November 2026 shutdown with export, Promptfoo, code-first parity, CI, grader validation, and rollback checklists.

Playwright 1.61 WebAuthn Passkey Testing with Virtual Authenticators

Test passkey registration and sign-in with the cross-browser Credentials virtual authenticator added in Playwright 1.61, including reuse and failure diagnosis.

2026-04-01

Playwright BrowserContext Guide for Isolation and Parallel Sessions

Use Playwright BrowserContext for clean test isolation, independent multi-user sessions, reusable auth state, context-wide controls, and safe parallel execution.

Playwright CLI Accessibility Snapshots and Element References Explained

Understand Playwright CLI accessibility snapshots, use short-lived element refs safely, scope agent context, and troubleshoot stale or missing references.

Playwright CLI Complete Guide for Browser Automation and AI Agents

Use Playwright CLI for agent-driven browser automation, snapshots, sessions, debugging, traces, video, secure CI workflows, and MCP decisions in 2026.

Playwright Generator Agent Guide for Maintainable Test Code

Turn reviewed Markdown plans into maintainable Playwright tests with the Generator agent, live verification, fixture reuse, and disciplined code review.

Troubleshooting

2026-07-07

Playwright Healer Agent Guide for Repairing Failed Browser Tests

Use the Playwright Healer agent to replay a named failure, inspect current UI behavior, review a minimal patch, and reject repairs that hide regressions.

Playwright localStorage and sessionStorage API Guide for Version 1.61

Use Playwright 1.61 page.localStorage and page.sessionStorage to inspect, seed, clear, and diagnose origin-scoped browser state without page.evaluate boilerplate.

2026-04-01

Playwright Locators Best Practices: Roles, Strictness, and Stability

Choose stable Playwright locators with roles, labels, scoped filters, strictness, and web-first assertions, then diagnose ambiguity without brittle shortcuts.

2026-05-18

Playwright MCP Complete Guide for Browser Automation with AI Agents

Configure and use Playwright MCP for AI browser automation, testing, profiles, security, HTTP, Docker, CI, and reliable agent workflows in 2026.

Playwright MCP Persistent, Isolated, and Browser Extension Profiles

Choose Playwright MCP persistent, isolated, or browser-extension state for QA, including storage-state setup, parallel sessions, and profile risks.

Security

Playwright MCP Security Best Practices for Files, Origins, and Secrets

Harden Playwright MCP file access, browser origins, profiles, secrets, artifacts, transports, sessions, and authorization with official security guidance.

2026-06-03

Playwright MCP Server Configuration Reference for QA Teams

Configure Playwright MCP for QA with documented CLI flags, environment variables, JSON schema, capabilities, browsers, timeouts, output, and network controls.

Playwright MCP Testing Capability: Assertions and Test Generation

Enable Playwright MCP testing tools, verify elements, text, lists, and values, generate locators, and convert browser exploration into reviewable tests.

Playwright Planner Agent Guide for High-Coverage Markdown Test Plans

Use the Playwright Planner agent to explore bounded user flows, design risk-based scenarios, and produce precise Markdown plans ready for human review.

2026-07-03

Playwright Test Agents Complete Guide: Planner, Generator, and Healer

Use Playwright test agents safely from setup through planning, generation, trace-driven healing, review, and CI with current Playwright 1.61 guidance.

2026-02-13

Playwright Testing Complete Guide for Reliable E2E Automation in 2026

Build reliable Playwright E2E automation with current setup, locators, fixtures, isolation, auth, API testing, mocking, debugging, CI, and Playwright 1.61 guidance.

2026-05-21

Promptfoo Complete Guide for LLM Evals, RAG, and Red Teaming

Use Promptfoo to design LLM evals, test RAG and coding agents, run red teams, enforce CI gates, preserve evidence, and govern reproducible releases.

2026-06-04

RAG QA Testing Guide for Retrieval, Generation, and Citation Quality

Build a rigorous RAG testing strategy for retrieval, context, answers, citations, security, cost, latency, regression data, CI, and production monitoring.

Run Parallel Playwright CLI Sessions with PLAYWRIGHT_CLI_SESSION

Run coding agents in isolated Playwright CLI browser sessions, monitor them in the dashboard, attach to existing browsers, and clean up state safely.

Run the Official MCP Conformance Suite Against Your Server

Run the official MCP conformance suite against a live server with a pinned runner, deterministic fixtures, scoped results, and release-ready evidence.

Governance

2026-07-02

Self-Healing Test Automation Governance for Reliable QA Suites

Govern self-healing test automation with eligibility rules, human review, audit evidence, stop conditions, safe examples, rollout phases, and reliability metrics.

Test and Red-Team an MCP Server with Promptfoo's MCP Provider

Configure Promptfoo MCP provider tests for local and remote servers, add authorization and threat cases, and gate safe results in CI.

2026-07-10

Test MCP Tool Schemas, Defaults, Invalid Inputs, and Error Types

Test MCP tool schemas, omitted defaults, invalid inputs, protocol and execution errors, structured output, and product semantics under the current spec.

Use MCP Inspector CLI to Automate tools/list and tools/call Tests

Automate MCP tools/list and tools/call checks with the pinned Inspector CLI, typed arguments, JSON assertions, negative cases, and CI-safe evidence.

AI Testing