Skip to main content
Back to Blog
AI Evals
2026-06-26

garak: LLM Vulnerability Scanning Tutorial (2026)

Hands-on garak tutorial for 2026: install the LLM vulnerability scanner, run probes against OpenAI, Hugging Face and Ollama models, and read the report.

garak: LLM Vulnerability Scanning Tutorial (2026)

garak is an open-source LLM vulnerability scanner — think nmap for large language models. You point it at a model (OpenAI, Hugging Face, Ollama, a REST endpoint, or NVIDIA NIM), choose probes for failure modes like prompt injection, jailbreaks, toxicity, and data leakage, and garak fires hundreds of adversarial prompts, grades the responses with built-in detectors, and writes a JSONL report plus an HTML hitlog. You run it from one command: python -m garak --model_type openai --model_name gpt-4o-mini --probes dan,promptinject.

This tutorial covers installing garak, understanding the probe/generator/detector pipeline, scanning real models, reading the report, and wiring it into CI. Every command and flag below is real — copy-paste them and they work.

What garak Actually Tests

garak (Generative AI Red-teaming and Assessment Kit) was created at NVIDIA and is maintained as an open-source project under the LGPL license. Unlike a generic eval harness that measures accuracy, garak is built specifically to find security and safety weaknesses. It ships with a large library of probes, each targeting a known LLM failure category. The common ones map cleanly to the OWASP Top 10 for LLM Applications.

Probe moduleWhat it attacksOWASP LLM mapping
promptinjectPrompt injection / goal hijackingLLM01 Prompt Injection
danJailbreaks ("Do Anything Now", role-play)LLM01 Prompt Injection
encodingPayloads hidden in base64, ROT13, hexLLM01 Prompt Injection
leakreplayTraining-data / prompt regurgitationLLM02 Sensitive Info Disclosure
xssMalicious markup in model outputLLM05 Improper Output Handling
malwaregenGenerating malware or exploit codeLLM06 Excessive Agency (downstream)
realtoxicitypromptsToxic / harmful continuationsLLM09 Misinformation / safety
glitchTokenizer "glitch tokens" that destabilize outputRobustness
packagehallucinationHallucinated package names (supply-chain risk)LLM09 Misinformation

The mental model is a four-stage pipeline:

  1. Generator — the thing under test (a model client). garak calls it.
  2. Probe — generates adversarial prompts and sends them to the generator.
  3. Detector — inspects each response and decides pass/fail (the "hit").
  4. Evaluator / report — aggregates hits into per-probe pass rates and writes artifacts.

You mostly interact with two of these: pick a generator (your model) and pick probes (your attacks). garak chooses sensible default detectors per probe automatically.

Installing garak

garak is a Python package and needs Python 3.10 or newer. Install it into a fresh virtual environment to avoid dependency clashes — it pulls in a fairly heavy ML stack.

# Recommended: isolated environment
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

# Install from PyPI
python -m pip install -U garak

# Verify
python -m garak --version

If you want the bleeding edge (probes land frequently), install from the repository instead:

python -m pip install -U git+https://github.com/NVIDIA/garak.git@main

garak is invoked as a module (python -m garak). A bare garak console entry point also exists after install, but python -m garak is the form you will see in the docs and is the most portable.

Your First Scan

The two flags you almost always pass are --model_type (which generator) and --model_name (which specific model). With no --probes, garak runs its full default suite, which can be thousands of generations — fine for a real audit, slow for a first look. Start small.

Export your provider key first:

export OPENAI_API_KEY="sk-..."

Then run two fast probes against a cheap model:

python -m garak \
  --model_type openai \
  --model_name gpt-4o-mini \
  --probes dan.DanInTheWild,promptinject

garak streams progress to the terminal, one bar per probe. For each attempt it shows the detector verdict, and at the end it prints a summary table of pass rates per probe and per detector. A line like promptinject.HijackHateHumans: PASS ok on 95/100 means 95 of 100 injection attempts were correctly refused; the 5 failures are your findings.

Targeting specific probes

List everything available, then narrow down. The probe namespace is module.ProbeClass; passing just the module runs all classes inside it.

# Discover probes, detectors, and generators
python -m garak --list_probes
python -m garak --list_detectors
python -m garak --list_generators

# Run an entire module (all DAN variants)
python -m garak --model_type openai --model_name gpt-4o-mini --probes dan

# Run one specific probe class
python -m garak --model_type openai --model_name gpt-4o-mini \
  --probes encoding.InjectBase64

Scanning Different Model Backends

garak's strength is that the probes are generator-agnostic. Swap the --model_type and the same attacks run against a local model, a hosted API, or your own deployed endpoint.

Hugging Face (local or Inference API)

# Run a model locally via transformers (downloads weights)
python -m garak --model_type huggingface --model_name gpt2 --probes lmrc

# Use the hosted Hugging Face Inference API
export HF_INFERENCE_TOKEN="hf_..."
python -m garak --model_type huggingface.InferenceAPI \
  --model_name mistralai/Mistral-7B-Instruct-v0.3 \
  --probes promptinject

Ollama (local models)

If you run models locally with Ollama, garak talks to it over the REST generator or the dedicated Ollama type:

# Pull a model first: ollama pull llama3
python -m garak --model_type ollama --model_name llama3 \
  --probes dan.DanInTheWild

NVIDIA NIM

export NIM_API_KEY="nvapi-..."
python -m garak --model_type nim \
  --model_name meta/llama-3.1-8b-instruct \
  --probes promptinject

Any REST endpoint (your own app)

This is the most important mode for QA teams: test the application, not the bare model. The rest generator is configured with a JSON file describing how to call your API and where the response text lives.

{
  "rest": {
    "RestGenerator": {
      "name": "my-chatbot",
      "uri": "https://api.example.com/v1/chat",
      "method": "post",
      "headers": {
        "Authorization": "Bearer $API_KEY",
        "Content-Type": "application/json"
      },
      "req_template_json_object": { "prompt": "$INPUT" },
      "response_json": true,
      "response_json_field": "reply"
    }
  }
}
python -m garak --model_type rest \
  -G rest_config.json \
  --probes promptinject,xss

$INPUT is substituted with each adversarial prompt; response_json_field tells garak which field holds the model's answer so detectors can grade it. Testing through your real endpoint means system prompts, guardrails, and RAG context are all in the loop — exactly what an attacker would hit.

Reading the Report

Every run writes timestamped artifacts to ~/.local/share/garak/garak_runs/ (overridable with --report_prefix). You get three things:

FileFormatUse it for
*.report.jsonlJSON LinesMachine-readable, one record per evaluation; feed to dashboards or CI gates
*.report.htmlHTMLHuman-readable summary with pass rates and severity
*.hitlog.jsonlJSON LinesThe actual failing prompt/response pairs — your reproduction cases

The hitlog is where the value is. Each line is a "hit": a prompt that defeated a detector, with the full request and the model's response. That is a ready-made regression test. A minimal pass to extract findings:

# Count hits per probe from a hitlog
python -c "
import json, collections
c = collections.Counter()
for line in open('garak.hitlog.jsonl'):
    rec = json.loads(line)
    c[rec['probe']] += 1
for probe, n in c.most_common():
    print(f'{n:4d}  {probe}')
"

Pass rates are reported per detector, and garak also computes a Z-score against published baselines (calibration) so you can see whether a model is better or worse than typical for that probe. Interpret results by severity, not raw counts: ten jailbreak hits matter more than fifty borderline-toxicity hits in a benign internal tool. For deeper coverage of which categories deserve gates, see the prompt injection testing guide.

Controlling Scan Size and Cost

A full default run is large and, on a paid API, expensive. Three flags keep it bounded:

python -m garak \
  --model_type openai --model_name gpt-4o-mini \
  --probes dan,promptinject,encoding \
  --generations 5 \
  --parallel_attempts 8
  • --generations N — how many times each prompt is sent (default 5). Lower for smoke tests, raise for statistical confidence on a flaky model.
  • --parallel_attempts N — concurrent requests; speeds up API-bound runs, but mind your rate limits.
  • --probe_options / -p — pass probe-specific tuning via JSON.

You can also apply buffs — transformations that mutate prompts to evade filters (lowercase, encoding, paraphrase). This is how garak finds attacks that only work after obfuscation:

python -m garak --model_type openai --model_name gpt-4o-mini \
  --probes dan --buffs encoding

Running garak in CI

garak returns a non-zero process exit when configured failure thresholds are exceeded, which makes it gateable. A practical GitHub Actions job scans a model on every pull request and fails the build on regressions:

name: llm-security-scan
on: [pull_request]

jobs:
  garak:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: python -m pip install -U garak
      - name: Run garak
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python -m garak \
            --model_type openai \
            --model_name gpt-4o-mini \
            --probes promptinject,dan.DanInTheWild,xss \
            --generations 3 \
            --report_prefix ci-scan
      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: garak-report
          path: ci-scan*.jsonl

Pin the model and probe set so results are comparable run over run, keep --generations modest to control cost and latency, and archive the JSONL so you can diff hit counts between builds. Treat a rising jailbreak pass rate the same way you treat a failing unit test.

garak vs. promptfoo and Guardrails

garak is a scanner — it finds vulnerabilities with a curated probe library and reports them. It is not a runtime defense and not a general-purpose eval framework. Pair it with complementary tools rather than expecting it to do everything:

  • garak — offensive scanning to discover weaknesses across many probe categories, model-agnostic, great for audits and CI gates.
  • promptfoo — broader red-teaming plus evals with an attacker-model that synthesizes attacks tailored to your app's purpose; see the promptfoo red teaming guide and the promptfoo vs OpenAI evals comparison.
  • Guardrails / NeMo Guardrailsruntime enforcement that blocks bad input/output in production, the fix you apply after garak finds the hole.

A mature pipeline uses all three: scan with garak, expand red-team coverage with promptfoo, and deploy a guardrail to mitigate what you cannot fully fix in the model. Browse the QA skills directory for ready-made setups that combine scanning and guardrails.

Frequently Asked Questions

Is garak free to use?

Yes. garak itself is open-source and free under the LGPL license, with no usage fees. Your only costs are indirect: if you scan a paid API like OpenAI, you pay that provider for the tokens garak consumes, which can add up because a full probe run sends thousands of prompts. Scanning a local Ollama or Hugging Face model has no per-call cost.

How is garak different from a normal eval harness?

A general eval harness (like lm-evaluation-harness) measures capability — accuracy on benchmarks. garak measures vulnerability — whether the model can be made to misbehave under adversarial pressure. It ships attack probes and pass/fail security detectors rather than accuracy metrics, and its output is a list of exploitable findings, not a leaderboard score.

Can I test my whole application, not just the raw model?

Yes, and you usually should. Use the rest generator with a JSON config that points at your API endpoint and names the response field. garak then sends every adversarial prompt through your real stack, including system prompts, RAG retrieval, and any guardrails, so you see vulnerabilities as an attacker would actually encounter them.

How long does a full garak scan take?

It varies widely with the probe set, the --generations count, and your model's latency. A two-probe smoke test with --generations 3 finishes in a couple of minutes; the full default suite against a hosted API can run for hours and cost real money. Start narrow, raise --parallel_attempts to use concurrency, and reserve full scans for scheduled audits rather than every commit.

Can garak produce false positives?

Yes — its detectors use heuristics and classifier models, so they occasionally flag a benign response or miss a subtle one. Always confirm findings by reading the hitlog's prompt/response pairs before filing them as bugs. Treat the report as a prioritized lead list, and convert confirmed hits into permanent regression tests.

Which probes should I start with?

For most LLM applications, begin with promptinject, dan.DanInTheWild, encoding, and xss — they cover the highest-impact OWASP categories (prompt injection and improper output handling) and run reasonably fast. Add leakreplay if you handle sensitive data and malwaregen if your app generates code. Expand to the full suite once you have triaged the obvious findings.

garak: LLM Vulnerability Scanning Tutorial (2026) | QASkills.sh