Skip to main content
Back to Blog
Guide
2026-06-05

Testing OTP, SMS & Phone Number Flows: Complete Guide 2026

A tool-agnostic guide to testing OTP, SMS, 2FA and phone-number flows: Twilio test credentials, mock gateways, and deterministic codes with Playwright, Selenium and Robot Framework.

Testing OTP, SMS & Phone Number Flows: Complete Guide 2026

One-time passcodes delivered by SMS guard the front door of almost every modern application — signup, login, password reset, payment confirmation, and step-up authentication all hinge on a short numeric code arriving on a phone. For a human this is trivial: glance at the notification, type six digits, done. For an automated test suite it is one of the genuinely hard problems, because CI infrastructure does not hold a phone and cannot read a push notification. Yet skipping OTP coverage leaves the most security-critical path in your product untested. The solution is to make the SMS reachable to your test as data — through a programmable phone number, a mock gateway, or a deterministic code in non-production environments.

This guide is a complete, tool-agnostic playbook for testing OTP, SMS, 2FA, and phone-number flows in 2026. We cover four strategies — Twilio test credentials, a real programmable receiving number, a mock SMS gateway, and deterministic test-mode codes — and then show concrete implementations across Playwright (TypeScript), Selenium (Python), and Robot Framework, including how to poll an inbox API with the Twilio date_sent_after filter and extract the code with a regex. Every snippet is runnable. If you searched for "playwright phone number testing," "robot framework sms testing," "password reset testing robot framework," or "date_sent_after twilio python," you are in the right place.

The unifying idea is simple and worth stating up front: the SMS is just a piece of data sitting in some provider's store, and your test needs a seam to reach in and grab it. Pick the seam that matches your environment. For installable QA skills and related deep dives, browse the skills directory and the blog, especially the Twilio SMS/OTP testing in Python guide and the Robot Framework SMS OTP testing guide.

The Four Strategies for Testing OTP

There is no single right way to test an SMS code; there are four, and the best choice depends on whether you control the application's test environment and whether you need a real end-to-end path. Understanding all four lets you pick deliberately instead of defaulting to whatever you saw first.

The deterministic test-mode code approach is the cheapest and most reliable: in non-production environments, the application is configured to always issue a known code (e.g. 000000) or to expose the generated code through a test-only endpoint. No real SMS is sent. The real receiving number approach uses a programmable number (Twilio, Vonage, MessageBird) that actually receives the SMS, which your test then pulls back via the provider's API — a true end-to-end path including the real SMS provider. The mock SMS gateway approach points your application's SMS sending at a fake server you control, which records messages for the test to read. The provider test credentials approach (Twilio's magic test numbers) lets you exercise the sending code path without delivering real messages or incurring cost.

StrategyReal SMS sent?CostDeterminismBest for
Deterministic test codeNoFreeHighestOwned test envs, CI
Real receiving numberYesPer messageMedium (async)True end-to-end smoke
Mock SMS gatewayNoFreeHighOffline CI, isolation
Provider test credentialsNo (simulated)FreeHighVerifying send code path

A mature suite mixes these: deterministic codes for the bulk of functional tests, one or two real-number tests as a periodic end-to-end smoke check, and mocks for offline CI. Never test against production by triggering real codes to real users.

Twilio Test Credentials and Magic Numbers

If your application sends via Twilio, the official test credentials are the fastest way to verify the sending path without delivering messages or paying. Twilio provides a separate Test Account SID and Auth Token, plus "magic" phone numbers that trigger specific outcomes. A request made with test credentials is validated and logged but never actually sends an SMS.

# twilio_test_creds.py — verify send logic with TEST credentials
import os
from twilio.rest import Client

# These are the TEST SID/token from the Twilio console, not the live ones.
client = Client(os.environ["TWILIO_TEST_SID"], os.environ["TWILIO_TEST_TOKEN"])

# Magic 'from' number that Twilio accepts in test mode:
TEST_FROM = "+15005550006"  # valid, SMS-capable test sender

def send_otp(to: str, code: str):
    return client.messages.create(
        to=to,
        from_=TEST_FROM,
        body=f"Your code is {code}",
    )

# A magic 'to' number that simulates an invalid destination:
INVALID_TO = "+15005550001"  # Twilio returns error 21211 in test mode

Use test credentials to assert that your application attempts to send the right message with the right parameters and handles Twilio errors (invalid number, unsubscribed recipient) correctly. They do not help you read an inbound code — for that you need a real number, covered next.

Polling a Real Inbox with date_sent_after

For a genuine end-to-end test, provision a real Twilio number that can receive SMS, trigger the application to send a code to it, then poll Twilio's API for the inbound message. The critical correctness detail is timing: SMS is asynchronous, so you must record a timestamp before triggering the send and pass it as date_sent_after when listing messages. That floor guarantees you never pick up a stale code from a previous run.

# poll_otp.py — fetch a fresh OTP from a real Twilio inbox
import os, re, time
from datetime import datetime, timezone, timedelta
from twilio.rest import Client

client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
RECEIVING_NUMBER = os.environ["TWILIO_RECEIVING_NUMBER"]  # the number under your control

def wait_for_otp(since: datetime, timeout_s: int = 60, interval_s: int = 3) -> str:
    """Poll Twilio for an inbound SMS newer than 'since' and return the 6-digit code."""
    deadline = time.time() + timeout_s
    # A tiny clock-skew cushion avoids missing a message sent at exactly 'since'.
    floor = since - timedelta(seconds=5)
    while time.time() < deadline:
        messages = client.messages.list(
            to=RECEIVING_NUMBER,
            date_sent_after=floor,
            limit=20,
        )
        for msg in messages:
            match = re.search(r"\b(\d{6})\b", msg.body or "")
            if match:
                return match.group(1)
        time.sleep(interval_s)
    raise TimeoutError("No OTP arrived within the timeout window")

# Usage:
since = datetime.now(timezone.utc)
# ... trigger the app to send the code to RECEIVING_NUMBER here ...
code = wait_for_otp(since)

Two things make this robust. First, recording since before the trigger eliminates the race where a stale message satisfies the query. Second, the regex \b(\d{6})\b extracts exactly a six-digit token; tighten or loosen the digit count to match your code format, and prefer a pattern anchored to surrounding text (e.g. code is (\d{6})) if multiple numbers can appear in the body. For a deeper Twilio-specific treatment, see the Twilio SMS/OTP testing in Python guide.

Mock SMS Gateways for Offline CI

Real numbers cost money and add network flakiness to CI. A mock SMS gateway removes both: you run a tiny HTTP server that mimics the SMS provider's send endpoint, point the application's provider configuration at it in test mode, and read back the captured messages from the mock. This keeps the whole flow offline and deterministic.

# mock_sms_server.py — minimal in-memory SMS sink (FastAPI)
from fastapi import FastAPI, Request
import re

app = FastAPI()
SENT: list[dict] = []

@app.post("/Messages")               # mimic the provider's send path
async def send(request: Request):
    form = await request.form()
    SENT.append({"to": form.get("To"), "body": form.get("Body")})
    return {"sid": f"SM{len(SENT):032d}", "status": "queued"}

@app.get("/inbox/{to}")              # test-only helper to read back
async def inbox(to: str):
    msgs = [m for m in SENT if m["to"] == to]
    return {"messages": msgs}

def latest_code(to: str) -> str | None:
    for m in reversed(SENT):
        if m["to"] == to:
            found = re.search(r"\d{6}", m["body"] or "")
            if found:
                return found.group(0)
    return None

Run this mock as a fixture, set the application's SMS base URL to http://localhost:8000 for the test run, and your tests read codes from the in-memory store instantly. The same idea applies to email OTP with a mock SMTP sink. The trade-off is fidelity: you are not exercising the real provider, so keep a thin real-number smoke test to catch provider-side regressions.

Playwright: Phone Number Flows in TypeScript

Playwright drives the browser; the OTP retrieval is a separate async call you await between UI steps. The pattern is: fill the phone field, click "send code," then call your retrieval helper (real Twilio poll or mock read), then type the code. Here is a complete password-reset flow.

// otp.spec.ts
import { test, expect } from '@playwright/test';
import { fetchLatestOtp } from './otp-helper';

test('password reset via SMS OTP', async ({ page }) => {
  const phone = process.env.TEST_PHONE!;        // the receiving number

  await page.goto('https://app.example.com/forgot-password');
  await page.getByLabel('Phone number').fill(phone);

  const sinceIso = new Date().toISOString();     // record BEFORE triggering
  await page.getByRole('button', { name: 'Send code' }).click();
  await expect(page.getByText('Code sent')).toBeVisible();

  // Retrieve the code out-of-band (Twilio poll or mock read):
  const code = await fetchLatestOtp(phone, sinceIso);

  await page.getByLabel('Verification code').fill(code);
  await page.getByRole('button', { name: 'Verify' }).click();
  await expect(page).toHaveURL(/reset-password/);
});

The retrieval helper isolates the seam so tests stay readable. Below it polls a mock inbox; swapping in a Twilio call is a one-line change.

// otp-helper.ts
export async function fetchLatestOtp(to: string, sinceIso: string): Promise<string> {
  const deadline = Date.now() + 60_000;
  while (Date.now() < deadline) {
    const res = await fetch(`http://localhost:8000/inbox/${encodeURIComponent(to)}`);
    const data = await res.json();
    for (const m of [...data.messages].reverse()) {
      const match = (m.body as string).match(/\b(\d{6})\b/);
      if (match && new Date(m.sentAt) >= new Date(sinceIso)) return match[1];
    }
    await new Promise((r) => setTimeout(r, 2000));
  }
  throw new Error('OTP did not arrive in time');
}

For more Playwright patterns, see the Playwright E2E complete guide.

Selenium: OTP Verification in Python

Selenium follows the same shape: drive the UI, retrieve the code out-of-band, type it. Reusing the wait_for_otp helper from earlier keeps the test focused on the user journey.

# test_otp_selenium.py
from datetime import datetime, timezone
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from poll_otp import wait_for_otp, RECEIVING_NUMBER

def test_login_with_sms_otp():
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 15)
    try:
        driver.get("https://app.example.com/login")
        driver.find_element(By.NAME, "phone").send_keys(RECEIVING_NUMBER)

        since = datetime.now(timezone.utc)            # record BEFORE clicking
        driver.find_element(By.ID, "send-code").click()
        wait.until(EC.visibility_of_element_located((By.ID, "otp-input")))

        code = wait_for_otp(since)                     # out-of-band retrieval
        driver.find_element(By.ID, "otp-input").send_keys(code)
        driver.find_element(By.ID, "verify").click()

        wait.until(EC.url_contains("/dashboard"))
        assert "/dashboard" in driver.current_url
    finally:
        driver.quit()

The explicit WebDriverWait between sending and entering the code matters: the input field may render only after the send succeeds. For broader Selenium guidance, see the Selenium Python tutorial.

Robot Framework: SMS and Password-Reset Keywords

Robot Framework shines here because the polling logic collapses into a single Wait Until Keyword Succeeds, and OTP retrieval becomes a reusable keyword. Below is a password-reset suite that retrieves the code from Twilio via a small Python library. Note Robot's variable syntax uses the dollar-brace form.

*** Settings ***
Library    SeleniumLibrary
Library    OtpLibrary.py        # wraps wait_for_otp() from earlier

*** Variables ***
${URL}        https://app.example.com/forgot-password
${PHONE}      %{TEST_PHONE}

*** Test Cases ***
Password Reset Via SMS OTP
    Open Browser    ${URL}    chrome
    Input Text      id=phone   ${PHONE}
    ${since}=       Get Current Utc Time
    Click Button    id=send-code
    Wait Until Element Is Visible    id=otp-input    timeout=15s
    ${code}=        Wait Until Keyword Succeeds    60s    3s
    ...                 Get Latest Otp    ${PHONE}    ${since}
    Input Text      id=otp-input    ${code}
    Click Button    id=verify
    Wait Until Location Contains    /reset-password
    [Teardown]      Close Browser

The supporting Python library exposes the two custom keywords Robot calls.

# OtpLibrary.py
from datetime import datetime, timezone
from poll_otp import wait_for_otp

def get_current_utc_time():
    return datetime.now(timezone.utc)

def get_latest_otp(phone: str, since):
    # Raises on timeout so Wait Until Keyword Succeeds retries cleanly.
    return wait_for_otp(since)

Wait Until Keyword Succeeds is the idiomatic Robot polling primitive: it retries Get Latest Otp every 3 seconds for up to 60 seconds, which maps perfectly onto the asynchronous nature of SMS. For more, see the Robot Framework SMS OTP testing guide.

Edge Cases and Security Considerations

OTP flows have failure modes a happy-path test will miss. Cover them deliberately. Expiry: request a code, wait past its TTL, and assert the application rejects it. Rate limiting: trigger repeated sends and assert the app throttles after the configured threshold. Wrong code: submit an incorrect code and assert a clear error without leaking whether the phone number exists. Resend: assert a resend invalidates the previous code (or does not, per spec) and that you read the newest message. Replay: assert a consumed code cannot be reused.

Edge caseWhat to assert
ExpiryCode past TTL is rejected
Rate limitingSends throttled after N attempts
Wrong codeGeneric error, no account enumeration
ResendNewest code is the valid one
ReplayUsed code cannot be reused

On security: never log real OTPs in CI output, scope test credentials tightly, and keep a dedicated receiving number (or pool) so concurrent runs do not read each other's codes. Treat the receiving number's credentials as secrets in your CI vault, exactly as you would any production token.

Email OTP and Magic Links

Not every one-time code arrives by SMS — email OTPs and magic links are just as common, and the testing pattern is structurally identical: the code or link is data sitting in an inbox your test can reach. The two practical approaches mirror the SMS strategies. Use a real test inbox with an IMAP/API interface (Gmail with an app password, or a disposable-inbox service like Mailosaur or MailSlurp), or run a mock SMTP sink that captures outgoing mail in memory.

# email_otp.py — fetch a code from a real inbox over IMAP
import imaplib, email, re, time

def wait_for_email_code(host, user, password, subject_contains, timeout_s=60):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        box = imaplib.IMAP4_SSL(host)
        box.login(user, password)
        box.select("INBOX")
        _, ids = box.search(None, f'(SUBJECT "{subject_contains}" UNSEEN)')
        for msg_id in ids[0].split():
            _, data = box.fetch(msg_id, "(RFC822)")
            msg = email.message_from_bytes(data[0][1])
            body = _plain_text(msg)
            match = re.search(r"\b(\d{6})\b", body)
            if match:
                box.store(msg_id, "+FLAGS", "\\Seen")  # mark read to avoid re-reads
                return match.group(1)
        box.logout()
        time.sleep(3)
    raise TimeoutError("No email code arrived")

def _plain_text(msg):
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload(decode=True).decode(errors="ignore")
    return msg.get_payload(decode=True).decode(errors="ignore")

For magic links, extract the URL with a regex instead of a numeric code, then navigate to it directly in Playwright or Selenium to complete authentication. Marking each processed message as read (the +FLAGS \Seen step) is the email equivalent of the SMS date_sent_after floor: it prevents a stale message from a previous run satisfying the next query.

FlowWhere the secret livesHow the test extracts it
SMS OTPProvider inbox (Twilio)Poll API, regex digits
Email OTPIMAP/API inboxSearch subject, regex digits
Magic linkEmail bodyRegex the URL, navigate to it
Authenticator (TOTP)Shared secretCompute code from secret

Testing TOTP-Based 2FA

Authenticator-app 2FA (Google Authenticator, Authy) uses TOTP — a time-based code derived from a shared secret and the current clock, with no SMS or email involved at all. This is actually the easiest 2FA to automate, because if your test knows the shared secret it can compute the exact same six-digit code the authenticator app would show, deterministically and instantly. No polling, no inbox.

# totp.py — compute the current 2FA code from the shared secret
import pyotp

# The base32 secret shown (or encoded in the QR) at enrollment time.
# In tests, enroll a known test account and store this as a secret.
TOTP_SECRET = "JBSWY3DPEHPK3PXP"

def current_2fa_code() -> str:
    return pyotp.TOTP(TOTP_SECRET).now()
// totp.ts — same idea in TypeScript with otplib, used in a Playwright test
import { authenticator } from 'otplib';

export function current2faCode(): string {
  return authenticator.generate(process.env.TOTP_SECRET!);
}

// In a test:
// await page.getByLabel('Authenticator code').fill(current2faCode());

Because the code is computed from the secret and the clock, TOTP tests are fully deterministic and need no external service. The one caveat is clock skew: if the CI runner's clock drifts, the code may be rejected — keep runners time-synced, and prefer enrolling a dedicated test account whose secret you control rather than reusing a human's. This makes authenticator-based 2FA the most CI-friendly second factor of all.

Frequently Asked Questions

How do I test SMS OTP without a real phone?

Make the code reachable as data instead of a notification. The most reliable approach is a deterministic test-mode code (the app issues a known code like 000000 in non-production), but you can also use a programmable receiving number whose inbox you poll via API, or a mock SMS gateway that captures messages your test reads back. Each removes the need for a physical phone in CI.

What does date_sent_after do in the Twilio API?

date_sent_after filters the message list to only messages sent after a given timestamp, which is essential for OTP tests. Record a timestamp before triggering the send and pass it as the floor, so you never pick up a stale code from a previous run. Add a small clock-skew cushion of a few seconds to avoid missing a message sent at exactly that moment.

How do I test a password reset flow in Robot Framework?

Drive the UI with SeleniumLibrary, record a timestamp before clicking "send code," then use Wait Until Keyword Succeeds to poll a custom Get Latest Otp keyword that retrieves the code (from Twilio or a mock). Input the code, submit, and assert the URL advances to the reset-password page. The retry primitive maps cleanly onto the asynchronous arrival of the SMS.

Can I test phone number flows with Playwright?

Yes. Playwright drives the browser, and OTP retrieval is an out-of-band async call you await between UI steps. Fill the phone field, click send, then await a helper that polls a real inbox or a mock, then type the returned code and verify. Keep the retrieval logic in a separate helper so swapping a mock for a real Twilio poll is a one-line change.

Should I use real SMS in CI?

Mostly no. Run the bulk of your functional tests with deterministic test-mode codes or a mock gateway so CI stays fast, free, and deterministic. Keep one or two real-number end-to-end smoke tests on a slower cadence to catch provider-side regressions. Never trigger real codes to real users, and never test OTP flows against production.

How do I extract the OTP code from a message body?

Use a regex tuned to your code format. A six-digit token is \b(\d{6})\b, but if other numbers can appear in the body, anchor to surrounding text such as code is (\d{6}) to avoid matching the wrong number. Always read the newest matching message and confirm it arrived after your recorded send timestamp before accepting it.

How do I test OTP expiry and rate limiting?

For expiry, request a code, wait past its configured TTL, submit it, and assert the application rejects it. For rate limiting, trigger repeated sends in quick succession and assert the app throttles after the configured threshold, returning a clear error. Both are easy to test deterministically with a mock gateway because you control timing and can fast-forward by adjusting the app's clock in test mode.

Conclusion

Testing OTP, SMS, and phone-number flows comes down to choosing the right seam to reach the code: a deterministic test-mode value for everyday functional coverage, a real programmable number for true end-to-end smoke checks, a mock gateway for fast offline CI, and provider test credentials for verifying the send path. Once you have the seam, the framework barely matters — Playwright, Selenium, and Robot Framework all follow the same shape: drive the UI, retrieve the code out-of-band, type it, and assert the outcome. Always record your timestamp before triggering the send, extract the code with a tight regex, and cover expiry, rate limiting, and replay.

Wire one of these patterns into your suite and the most security-critical path in your product stops being a manual step. For installable, agent-ready QA skills, see the skills directory; for deeper, tool-specific walkthroughs, continue with the Twilio SMS/OTP testing in Python guide and the Robot Framework SMS OTP testing guide on the blog.

Testing OTP, SMS & Phone Number Flows: Complete Guide 2026 | QASkills.sh