agentwright
Chained audit pipelines with a spawned auditor and in-session verification. Run `/audit-run`: a headless `claude -p` subprocess audits a frozen snapshot, then the current session independently verifies each finding and applies fixes to the live repo.
Version: 1.8.0 · Source · README
Install
Requires Node.js ≥ 18 and claude on PATH (the auditor subprocess calls it). Zero config — a .gitignore that excludes large binaries/datasets keeps snapshots fast.
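A minimal preflight sketch for the two requirements above, in Python. The helper names `parse_node_major` and `check_prerequisites` are hypothetical and not part of agentwright; the script only checks that `node` and `claude` resolve on PATH and that `node --version` reports a major version of at least 18.

```python
import re
import shutil
import subprocess


def parse_node_major(version_text: str) -> int:
    """Extract the major version from `node --version` output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {version_text!r}")
    return int(match.group(1))


def check_prerequisites(min_node_major: int = 18) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    node = shutil.which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        result = subprocess.run(
            [node, "--version"], capture_output=True, text=True, check=False
        )
        try:
            if parse_node_major(result.stdout) < min_node_major:
                problems.append(
                    f"Node.js {result.stdout.strip()} is older than {min_node_major}"
                )
        except ValueError as exc:
            problems.append(str(exc))
    # The auditor subprocess calls `claude`, so it has to resolve on PATH.
    if shutil.which("claude") is None:
        problems.append("claude not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```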
Using it
/audit-run # default pipeline on git diff (staged + unstaged)
/audit-run src/api/ # default pipeline on a directory
/audit-run full --diff # named pipeline on git diff
/audit-run correctness,security src/ # ad-hoc stage list
/audit-step security src/auth/ # single stage
/audit-resume 2026-04-15-abc123 # resume an interrupted run
/audit-clean --logs-only # keep findings, drop logs
Default pipeline (no argument): correctness → security → best-practices.
Default scope: git diff (staged + unstaged).
How it runs
- Frozen snapshot of the codebase (`.gitignore`-aware).
- `claude -p` subprocess audits the snapshot using a vendored or custom skill.
- Findings stream back as newline-delimited JSON.
- The session verifies each finding against the live repo; auditor claims are never blindly trusted.
- Objectively correct fixes apply immediately. Judgment calls are marked `valid_needs_approval` and presented after the run.
- After all stages, the verifier agent validates applied fixes.
- A per-finding summary table prints.
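The stream-then-verify steps above can be sketched roughly as follows. This is an illustration, not agentwright's implementation: the finding fields (`id`, `file`, `claim`), the verdict names `valid` and `invalid`, and the `toy_verify` stand-in are all hypothetical. Only `valid_needs_approval` comes from the text above.

```python
import json
from typing import Callable, Iterable

NEEDS_APPROVAL = "valid_needs_approval"  # the only verdict name taken from the docs
AUTO_FIX = "valid"    # hypothetical verdict name
REJECTED = "invalid"  # hypothetical verdict name


def parse_ndjson(lines: Iterable[str]) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    findings = []
    for line in lines:
        line = line.strip()
        if line:
            findings.append(json.loads(line))
    return findings


def triage(findings: list[dict], verify: Callable[[dict], str]) -> dict[str, list[dict]]:
    """Bucket findings by the verdict the session-side check returns."""
    buckets: dict[str, list[dict]] = {AUTO_FIX: [], NEEDS_APPROVAL: [], REJECTED: []}
    for finding in findings:
        verdict = verify(finding)
        buckets.setdefault(verdict, []).append(finding)
    return buckets


def toy_verify(finding: dict) -> str:
    """Stand-in for checking a claim against the live repo (assumption)."""
    return AUTO_FIX if "None" in finding["claim"] else NEEDS_APPROVAL


# Hypothetical auditor output: one JSON object per line.
stream = [
    '{"id": "F1", "file": "src/api/users.py", "claim": "unchecked None"}',
    "",
    '{"id": "F2", "file": "src/api/auth.py", "claim": "rename for clarity"}',
]
buckets = triage(parse_ndjson(stream), toy_verify)
print({verdict: [f["id"] for f in items] for verdict, items in buckets.items()})
```

Keeping the verdict function separate from parsing mirrors the rule that auditor claims are never trusted without an independent check.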
Pipeline rules: string entries run sequentially. Nested arrays (["a", "b"]) run as a parallel group on the same snapshot. Each new group gets a fresh snapshot of the fixed repo. Duplicate stage names auto-suffix (correctness → correctness-2).
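A small sketch of how these rules could be applied to a pipeline definition. `plan_groups` is a hypothetical helper, not a function the plugin exposes. It assumes a lone string forms a group of one, and that repeats beyond the second continue as `-3`, `-4`, and so on; the docs only show `-2`.

```python
from typing import Union

Entry = Union[str, list]


def plan_groups(pipeline: list[Entry]) -> list[list[str]]:
    """Turn a pipeline definition into ordered groups of uniquely named stages.

    A string becomes a group of one (run sequentially); a nested list becomes
    one parallel group. Repeated stage names get a numeric suffix.
    """
    seen: dict[str, int] = {}
    groups: list[list[str]] = []
    for entry in pipeline:
        names = [entry] if isinstance(entry, str) else list(entry)
        group = []
        for name in names:
            seen[name] = seen.get(name, 0) + 1
            # First occurrence keeps its name; later ones become name-2, name-3, ...
            group.append(name if seen[name] == 1 else f"{name}-{seen[name]}")
        groups.append(group)
    return groups


full = ["correctness", "security", ["best-practices", "perf"], ["my-checks", "ui"], "test-coverage"]
print(plan_groups(full))
print(plan_groups(["correctness", "security", "correctness"]))
```

Each inner list in the result corresponds to one snapshot: stages in the same inner list share it, and the next inner list starts from a fresh snapshot of the fixed repo.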
Commands
All seven live under the plugin's / namespace (they're in commands/, not skills).
| Command | Args | Purpose |
|---|---|---|
| `/audit-run` | `[pipeline\|stages] [scope]` | Run the default or a named pipeline. |
| `/audit-step` | `<stage> [scope]` | Run a single stage. |
| `/audit-resume` | `<run-id>` | Resume from the next incomplete stage. |
| `/audit-status` | `[run-id]` | Run state: active/completed/pending stages, verification progress. |
| `/audit-stop` | `[run-id]` | Kill worker/auditor processes and mark cancelled. |
| `/audit-reset` | `[run-id]` | Guided deletion of a run directory. |
| `/audit-clean` | `[--logs-only]` | Prune retained artifacts per the retention policy. |
Skills (22)
Auto-discovered from agentwright/skills/ and invokable as /agentwright:<name> or via the Skill tool.
Audit skills (used by the pipeline)
| Skill | Focus |
|---|---|
| `/agentwright:correctness-audit` | Logic errors, null handling, async races, type coercion, resource leaks, N+1 queries. |
| `/agentwright:security-audit` | OWASP Top 10 2025, OWASP API Security Top 10 2023, CWE, GDPR, PCI-DSS. |
| `/agentwright:best-practices-audit` | DRY, SOLID, KISS, YAGNI, Clean Code, naming, coupling, anti-patterns. |
| `/agentwright:migration-audit` | PL/pgSQL: NULL traps, race conditions, missing constraints, JSONB pitfalls. Auto-triggers when a `supabase/migrations/*.sql` file is written. |
| `/agentwright:implementation-audit` | Roundabout solutions, unnecessary complexity, reinvented wheels, naive designs. |
| `/agentwright:ui-audit` | WCAG 2.2, WAI-ARIA patterns, touch target sizing, focus management, React/Tailwind anti-patterns. |
| `/agentwright:test-coverage-audit` | Maps source files against tests, produces a risk-prioritized list of gaps. |
Planning
| Skill | Focus |
|---|---|
| `/agentwright:feature-planning` | Impact analysis, requirements, design, implementation steps, risk assessment. |
| `/agentwright:project-planning` | Stack selection, directory structure, tooling, scaffolding for a new project. |
| `/agentwright:bug-fix-planning` | Root-cause mapping, change impact, minimal fix, regression tests. |
| `/agentwright:refactor-planning` | Blast radius mapping, safe transformation sequence, behavior-preservation verification. |
Debugging
| Skill | Focus |
|---|---|
| `/agentwright:systematic-debugging` | Reproduce, isolate, hypothesize, verify. |
Test writing
| Skill | Focus |
|---|---|
| `/agentwright:write-tests` | General test quality (assertions, isolation, flakiness, over-mocking). Defers to the three below when applicable. |
| `/agentwright:write-tests-frontend` | React components/hooks with Vitest + RTL. |
| `/agentwright:write-tests-deno` | Deno integration tests for Supabase Edge Functions. |
| `/agentwright:write-tests-pgtap` | pgTAP database tests for Supabase SQL migrations. |
Agent-shortcut skills
Thin wrappers that invoke the built-in agents — use /agentwright:<name> instead of typing @agent-agentwright:<agent>.
| Skill | Agent | Pattern |
|---|---|---|
| `/agentwright:research <topic>` | deep-research | Forked: self-contained topic |
| `/agentwright:update-docs [scope]` | update-docs | Forked: infers from git diff |
| `/agentwright:critique [focus]` | party-pooper | Forked: reads session transcript |
| `/agentwright:verify [focus]` | verifier | Forked: reads session transcript + git diff |
| `/agentwright:challenge [claim]` | detective (×2) | Inline: dispatches two detectives with opposing hypotheses |
Utilities
| Skill | Focus |
|---|---|
| `/agentwright:config-init` | Write `.claude/agentwright.json` with every default populated. Pass `--force` to overwrite. |
Agents (5)
Invokable as @agent-agentwright:<name> or via the shortcut skills above.
| Agent | Role | Tools |
|---|---|---|
| detective | Investigates a hypothesis: traces logic, reads files, runs tests, reports evidence. Backs `/agentwright:challenge`. | Read-only + research MCPs + Bash for tests |
| verifier | Validates applied fixes: implementations exist, tests pass, no unstated changes. Auto-dispatched after audit fixes. | Read-only + Bash for tests |
| deep-research | Web search and literature review. Uses Exa, Context7, AlphaXiv, Scholar Gateway, Hugging Face, PubMed, bioRxiv in parallel. | Read-only |
| party-pooper | Adversarial critique. Parallel counter-evidence searches across academic, web, and docs sources. | Read-only + research MCPs |
| update-docs | Keeps .md files in sync with code. Scoped by `hooks/md-only-edit.js` to .md files only. | .md files only |
All agents run with permissionMode: dontAsk.
Config
.claude/agentwright.json (all fields optional). See agentwright.example.json.
Run /agentwright:config-init to drop the full default config into your repo — every key populated so you can edit pipelines, custom stages, and retention in place. Add --force to overwrite an existing file; delete the file to fall back to built-in defaults.
{
"pipelines": {
"default": ["correctness", "security", "best-practices"],
"full": ["correctness", "security", ["best-practices", "perf"], ["my-checks", "ui"], "test-coverage"]
},
"customStages": {
"perf": { "type": "skill", "skillId": "performance-investigation" },
"my-checks": { "type": "skill", "skillPath": "skills/my-custom-audit/SKILL.md" }
},
"retention": {
"keepCompletedRuns": 2,
"deleteCompletedLogs": true,
"deleteCompletedFindings": false,
"maxRunAgeDays": 2
}
}
Custom stages are referenced by their key inside pipelines (e.g., "perf" in full above), or run directly with /audit-step perf.
| Key | Default | Description |
|---|---|---|
| `pipelines.default` | `["correctness", "security", "best-practices"]` | Pipeline for `/audit-run` with no argument. |
| `pipelines.<name>` | (none) | Named pipeline. Array of stage names or nested arrays for parallel groups. |
| `customStages.<key>.skillId` | (none) | Reference a builtin skill by ID. |
| `customStages.<key>.skillPath` | (none) | Reference a project-relative SKILL.md. |
| `retention.keepCompletedRuns` | `2` | Completed runs to retain. |
| `retention.deleteCompletedLogs` | `true` | Delete stage log folders for completed runs. |
| `retention.deleteCompletedFindings` | `false` | Delete per-finding JSON for completed runs. |
| `retention.maxRunAgeDays` | `2` | Prune completed runs older than this. |
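One plausible reading of the two pruning keys, sketched in Python. The combination rule is an assumption, since the docs do not say how the keys interact: here a completed run is pruned if it falls outside the newest `keepCompletedRuns` runs or is older than `maxRunAgeDays`. `runs_to_prune` is a hypothetical helper, and every run ID except `2026-04-15-abc123` is made up for the example.

```python
from datetime import datetime, timedelta, timezone


def runs_to_prune(completed, now, keep_completed_runs=2, max_run_age_days=2):
    """Return IDs of completed runs to prune, newest first.

    completed: list of (run_id, completed_at) tuples with aware datetimes.
    Assumption: a run is pruned if it is beyond the newest N runs OR too old.
    """
    newest_first = sorted(completed, key=lambda run: run[1], reverse=True)
    cutoff = now - timedelta(days=max_run_age_days)
    prune = []
    for index, (run_id, completed_at) in enumerate(newest_first):
        over_count = index >= keep_completed_runs
        too_old = completed_at < cutoff
        if over_count or too_old:
            prune.append(run_id)
    return prune


utc = timezone.utc
now = datetime(2026, 4, 16, 12, 0, tzinfo=utc)
completed_runs = [
    ("2026-04-15-abc123", datetime(2026, 4, 15, 9, 0, tzinfo=utc)),
    ("2026-04-16-def456", datetime(2026, 4, 16, 8, 0, tzinfo=utc)),
    ("2026-04-13-aaa111", datetime(2026, 4, 13, 10, 0, tzinfo=utc)),
    ("2026-04-14-bbb222", datetime(2026, 4, 14, 18, 0, tzinfo=utc)),
]
pruned = runs_to_prune(completed_runs, now)
print(pruned)
```

With the defaults, the two newest runs survive and the two older ones are selected for pruning.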
State
.claude/audit-runs/<run-id>/ holds each run:
- `findings/`: per-finding JSON as it streams from the auditor
- `logs/`: per-stage auditor subprocess logs
- `group-<N>-snapshot.json`: path to the frozen snapshot consumed by the verifier
- Run metadata for `/audit-status` and `/audit-resume`
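For a quick look at what a run left behind, the layout above can be walked with a few lines of Python. This is a sketch under assumptions: `summarize_runs` is a hypothetical helper, finding files are assumed to end in `.json`, and the run metadata files are left alone because their names are not documented here.

```python
from pathlib import Path


def summarize_runs(root: Path) -> dict[str, dict[str, int]]:
    """Count finding files, log entries, and snapshot pointers per run directory."""
    summary: dict[str, dict[str, int]] = {}
    if not root.is_dir():
        return summary
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        findings = run_dir / "findings"
        logs = run_dir / "logs"
        summary[run_dir.name] = {
            # Assumes one .json file per finding.
            "findings": len(list(findings.glob("*.json"))) if findings.is_dir() else 0,
            # Counts direct children of logs/ (files or per-stage folders).
            "log_entries": len(list(logs.iterdir())) if logs.is_dir() else 0,
            "snapshots": len(list(run_dir.glob("group-*-snapshot.json"))),
        }
    return summary


if __name__ == "__main__":
    for run_id, counts in summarize_runs(Path(".claude/audit-runs")).items():
        print(run_id, counts)
```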