AI Testing
2026-05-08

Codex CLI for QA Engineers: Complete Guide 2026

Complete guide to OpenAI Codex CLI for QA engineers. Setup, agent workflows, model selection, MCP integration, custom prompts, sandbox modes, and test generation patterns for QA teams in 2026.


OpenAI launched Codex CLI in April 2025 as an open-source, terminal-native AI coding agent built on its o-series reasoning models (o3 and o4-mini), then moved it to the GPT-5 model family later that year. The 2026 version of Codex CLI is a polished, production-grade alternative to Claude Code, with strong reasoning, multi-step planning, sandboxed execution, and a growing MCP server ecosystem. For QA engineers who prefer the terminal and have an OpenAI API budget, Codex CLI is a credible choice alongside Claude Code.

This guide covers Codex CLI specifically for QA: installation, authentication, sandbox modes, model selection, custom prompts, MCP integration, integration with Playwright/pytest/Cucumber, and the workflows that produce reliable test generation. Every example is current with Codex CLI 0.40+ running on GPT-5.

By the end you will have Codex CLI configured for QA work, integrated with QASkills SKILL.md, and producing tests that match your team's conventions.

Key Takeaways

  • Codex CLI is OpenAI's terminal AI agent.
  • Three working modes (read-only, suggest, auto), built from sandbox and approval settings.
  • Brings GPT-5's strong reasoning for complex test generation.
  • MCP server support for browser and database tools.
  • Best for OpenAI-aligned teams with API budget.

1. Installation

npm install -g @openai/codex
codex --version

Or via Homebrew:

brew install --cask codex

2. Authentication

export OPENAI_API_KEY="sk-..."

Or sign in interactively with a ChatGPT account (opens a browser):

codex login

3. First Run

cd ~/my-test-project
codex

This opens a terminal chat session in the current directory.

4. Sandbox Modes

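Codex CLI controls autonomy with two settings: a sandbox mode (read-only, workspace-write, or danger-full-access) that limits what commands can touch, and an approval policy that decides when Codex stops to ask you. This guide uses three working modes built from them:

| Mode | Settings | What it can do | Best for |
|---|---|---|---|
| read-only | sandbox_mode = "read-only" | Read files; no edits | Exploration |
| suggest | sandbox_mode = "workspace-write", approval_policy = "untrusted" | Propose edits and shell commands; you approve | Default for QA |
| auto | sandbox_mode = "workspace-write", approval_policy = "never" | Edit files and run shell commands in the workspace without approval | CI, headless |

Set defaults in ~/.codex/config.toml (this example matches suggest mode):

model = "gpt-5"
sandbox_mode = "workspace-write"
approval_policy = "untrusted"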
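If you launch Codex from team scripts, a small lookup can translate the three working modes into command-line flags for the interactive codex command. This is a minimal sketch: QaMode and codexFlagsFor are hypothetical names, and the mapping assumes the --sandbox and --ask-for-approval options of recent Codex CLI releases, so check codex --help for your version before relying on it.

```typescript
// The three working modes used in this guide (shorthand, not Codex CLI option names).
type QaMode = "read-only" | "suggest" | "auto";

// Hypothetical helper: map a working mode to interactive Codex CLI flags.
// Assumes the --sandbox and --ask-for-approval options; verify with `codex --help`.
function codexFlagsFor(mode: QaMode): string[] {
  switch (mode) {
    case "read-only":
      // Sandbox blocks writes; Codex asks before commands outside a small trusted set.
      return ["--sandbox", "read-only", "--ask-for-approval", "untrusted"];
    case "suggest":
      // Writes allowed inside the workspace, but Codex asks before applying them.
      return ["--sandbox", "workspace-write", "--ask-for-approval", "untrusted"];
    case "auto":
      // Workspace writes with no approval prompts; intended for headless use.
      return ["--sandbox", "workspace-write", "--ask-for-approval", "never"];
  }
}

console.log(["codex", ...codexFlagsFor("suggest")].join(" "));
```

Keeping the mapping in one function means a policy change (for example, tightening what auto may do) is a one-line edit instead of a search through shell scripts.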

5. Custom Prompts

Place a prompt file at ~/.codex/prompts/playwright-test.md:

You are generating Playwright tests for example.com.

Conventions:
- tests in tests/e2e/
- page objects in src/pages/ extend BasePage
- Use getByRole, getByLabel, getByTestId
- No waitForTimeout
- TypeScript strict mode

Generate a test for the following scenarios: $ARGUMENTS

Invoke it as a slash command from an interactive Codex session:

/prompts:playwright-test signin happy path, invalid email, wrong password
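Conceptually, a prompt template with a placeholder works like string substitution: your arguments replace the token before the text reaches the model. The TypeScript sketch below illustrates that idea and adds one convenience, turning a comma-separated scenario list into bullets. expandPrompt is a hypothetical helper, not part of Codex CLI, and it assumes a single placeholder token (such as $ARGUMENTS or {input}) that you pass in.

```typescript
// Hypothetical helper, not part of Codex CLI: fill a prompt template by
// replacing a placeholder token with a bulleted list of scenarios.
function expandPrompt(template: string, args: string, placeholder = "$ARGUMENTS"): string {
  // "a, b, c" -> "- a\n- b\n- c", dropping empty entries.
  const scenarios = args
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => `- ${s}`)
    .join("\n");
  // split/join replaces every occurrence literally, so "$" needs no regex escaping.
  return template.split(placeholder).join("\n" + scenarios);
}

const template = "Generate a test for the following scenarios: $ARGUMENTS";
console.log(expandPrompt(template, "signin happy path, invalid email, wrong password"));
```

Listing one scenario per line tends to make it easier to check that the generated spec covers each case.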

6. Workflow: Generate a Test

codex
> Generate a Playwright test for the checkout flow.
> Sign in as alice@example.com, add Widget to cart,
> complete checkout, verify confirmation.

Codex plans the work, reads existing page objects, and proposes a diff for tests/e2e/checkout/place-order.spec.ts.

7. Workflow: Bug Reproduction

> Bug: applying SAVE10 discount doesn't update cart total for 2 seconds.
> Reproduce as a failing Playwright test.

8. MCP Server Integration

Add servers to ~/.codex/config.toml:

[mcp_servers.playwright]
command = "npx"
args = ["-y", "@playwright/mcp@latest"]

[mcp_servers.postgres]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/test"]

Restart Codex; MCP servers appear as available tools.

9. Workflow: Refactor

> Refactor src/pages/CheckoutPage.ts to use composition.
> Create PaymentForm and ShippingForm components.
> Update all tests in tests/e2e/checkout/ accordingly.

Codex plans the change, edits multiple files, and runs the test suite to verify the refactor.
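The refactor target might look roughly like the sketch below. It is illustrative only: PageLike is a minimal stand-in for the slice of Playwright's Page the components use, so the example runs without a browser, and the component methods and field labels (Street, City, Card number) are hypothetical.

```typescript
// Minimal stand-in for the slice of Playwright's Page these components use.
// A real Page/Locator should satisfy these shapes structurally.
interface FillTarget {
  fill(value: string): Promise<void>;
}
interface PageLike {
  getByLabel(text: string): FillTarget;
}

// Hypothetical component covering the shipping fields of checkout.
class ShippingForm {
  private readonly page: PageLike;
  constructor(page: PageLike) {
    this.page = page;
  }
  async fillAddress(street: string, city: string): Promise<void> {
    await this.page.getByLabel("Street").fill(street);
    await this.page.getByLabel("City").fill(city);
  }
}

// Hypothetical component covering the payment fields of checkout.
class PaymentForm {
  private readonly page: PageLike;
  constructor(page: PageLike) {
    this.page = page;
  }
  async fillCard(cardNumber: string): Promise<void> {
    await this.page.getByLabel("Card number").fill(cardNumber);
  }
}

// CheckoutPage composes the two forms instead of defining every field itself.
class CheckoutPage {
  readonly shipping: ShippingForm;
  readonly payment: PaymentForm;
  constructor(page: PageLike) {
    this.shipping = new ShippingForm(page);
    this.payment = new PaymentForm(page);
  }
}

// Fake page that records each fill, so the structure can be exercised without a browser.
class RecordingPage implements PageLike {
  readonly filled: Array<[string, string]> = [];
  getByLabel(text: string): FillTarget {
    return {
      fill: async (value: string): Promise<void> => {
        this.filled.push([text, value]);
      },
    };
  }
}
```

Tests then call checkout.shipping.fillAddress(...) and checkout.payment.fillCard(...), so a change to the payment form touches one small class rather than one large page object.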

10. Headless Use in CI

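- name: Install Codex CLI
  if: failure()
  run: npm install -g @openai/codex
- name: Triage failure
  if: failure()
  run: |
    codex exec --full-auto "Analyze the failing test in tests/e2e/checkout.spec.ts. Identify root cause." > triage.md
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- uses: actions/upload-artifact@v4
  # Without this, the upload step is skipped after a failure.
  if: failure()
  with: { name: triage, path: triage.md }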

11. Codex CLI vs Claude Code

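| Aspect | Codex CLI | Claude Code |
|---|---|---|
| Provider | OpenAI | Anthropic |
| Model | GPT-5 / GPT-4.1 | Claude Sonnet / Opus |
| Autonomy controls | Explicit sandbox modes plus approval policies | Permission modes with per-tool prompts |
| MCP support | Yes | Yes |
| SKILL.md support | Via custom prompts | Native |
| Cost | OpenAI pricing | Anthropic pricing |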

For QA work specifically, Claude Code typically produces marginally cleaner Playwright tests; Codex CLI is excellent for Python and pytest. Choose based on your team's preferred model.

12. Integration with QASkills

npm install -g @qaskills/cli
npx @qaskills/cli init --agent codex
npx @qaskills/cli add playwright-tests

Skills are written to .codex/skills/; reference them from your custom prompts.

13. Workflow: BDD Generation

> Generate features/checkout.feature and matching Java step definitions at src/test/java/steps/CheckoutSteps.java for the checkout flow.

14. Best Practices

  • Use suggest mode by default; switch to auto only for headless CI.
  • Customize prompts for repeated tasks.
  • Combine with QASkills SKILL.md for framework conventions.
  • Use MCP for browser tasks rather than asking Codex to shell out.
  • Set per-project budgets in the OpenAI console.

15. Cost

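OpenAI API list pricing (rates change, so confirm on OpenAI's pricing page before budgeting):

  • GPT-5: $1.25 per million input tokens and $10 per million output tokens at launch
  • GPT-4.1: $2 per million input tokens and $8 per million output tokens; suitable for routine tasks

Typical monthly spend for a QA engineer: roughly $30-80.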
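To sanity-check a budget, multiply token counts by per-million rates. The TypeScript sketch below takes the rates as parameters instead of hard-coding them, since prices change; the rates and token volumes in the example call are placeholders, not measurements, and estimateCostUsd is a hypothetical name.

```typescript
// Per-million-token prices in USD. Pass current values; prices change over time.
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Estimated USD cost = (input tokens / 1M) * input rate + (output tokens / 1M) * output rate.
function estimateCostUsd(inputTokens: number, outputTokens: number, rates: Rates): number {
  return (
    (inputTokens / 1_000_000) * rates.inputPerMillion +
    (outputTokens / 1_000_000) * rates.outputPerMillion
  );
}

// Placeholder month: 20M input tokens, 2M output tokens (illustrative, not measured).
const monthly = estimateCostUsd(20_000_000, 2_000_000, { inputPerMillion: 1.25, outputPerMillion: 10 });
console.log(`Estimated monthly cost: $${monthly.toFixed(2)}`); // Estimated monthly cost: $45.00
```

Agent sessions tend to be input-heavy because the agent re-reads files and conversation context on each turn, so the input rate often matters as much as the output rate.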

16. Limitations

  • Smaller SKILL.md ecosystem than Claude Code's in the QASkills directory.
  • Fewer QA-specific community guides than Claude Code.
  • Reasoning quality on Playwright TypeScript is excellent, but Claude often edges it out.

Conclusion

Codex CLI is OpenAI's answer to Claude Code, and in 2026 it is a serious option for QA engineers. The sandbox modes, MCP support, and GPT-5 reasoning produce a productive terminal agent. See claude-code-qa-testing-workflows-2026 for the closest alternative and gemini-cli-qa-engineers-guide for Google's offering.
