agentwright¶

Chained audit pipelines with a spawned auditor and in-session verification. Run /audit-run — a headless claude -p subprocess audits a frozen snapshot, the current session independently verifies each finding and applies fixes to the live repo.

Version: 1.8.0 · Source · README

Install¶

/plugin marketplace add Joys-Dawn/toolwright
/plugin install agentwright@Joys-Dawn/toolwright

Requires Node.js ≥ 18 and claude on PATH (the auditor subprocess calls it). Zero config — a .gitignore that excludes large binaries/datasets keeps snapshots fast.

Using it¶

/audit-run                              # default pipeline on git diff (staged + unstaged)
/audit-run src/api/                     # default pipeline on a directory
/audit-run full --diff                  # named pipeline on git diff
/audit-run correctness,security src/    # ad-hoc stage list
/audit-step security src/auth/          # single stage
/audit-resume 2026-04-15-abc123         # resume an interrupted run
/audit-clean --logs-only                # keep findings, drop logs

Default pipeline (no argument): correctness → security → best-practices. Default scope: git diff (staged + unstaged).

How it runs¶

Frozen snapshot of the codebase (.gitignore-aware).
claude -p subprocess audits the snapshot using a vendored or custom skill.
Findings stream back as newline-delimited JSON.
The session verifies each finding against the live repo — auditor claims are never blindly trusted.
Objectively correct fixes apply immediately. Judgment calls are marked valid_needs_approval and presented after the run.
After all stages, the verifier agent validates applied fixes.
A per-finding summary table prints.

Pipeline rules: string entries run sequentially. Nested arrays (["a", "b"]) run as a parallel group on the same snapshot. Each new group gets a fresh snapshot of the fixed repo. Duplicate stage names auto-suffix (correctness → correctness-2).

Commands¶

All seven live under the plugin's / namespace (they're in commands/, not skills).

Command	Args	Purpose
`/audit-run`	`[pipeline\\|stages] [scope]`	Run the default or a named pipeline.
`/audit-step`	`<stage> [scope]`	Run a single stage.
`/audit-resume`	`<run-id>`	Resume from the next incomplete stage.
`/audit-status`	`[run-id]`	Run state — active/completed/pending stages, verification progress.
`/audit-stop`	`[run-id]`	Kill worker/auditor processes and mark cancelled.
`/audit-reset`	`[run-id]`	Guided deletion of a run directory.
`/audit-clean`	`[--logs-only]`	Prune retained artifacts per the retention policy.

Skills (22)¶

Auto-discovered from agentwright/skills/ and invokable as /agentwright:<name> or via the Skill tool.

Audit skills (used by the pipeline)¶

Skill	Focus
`/agentwright:correctness-audit`	Logic errors, null handling, async races, type coercion, resource leaks, N+1 queries.
`/agentwright:security-audit`	OWASP Top 10 2025, OWASP API Security Top 10 2023, CWE, GDPR, PCI-DSS.
`/agentwright:best-practices-audit`	DRY, SOLID, KISS, YAGNI, Clean Code, naming, coupling, anti-patterns.
`/agentwright:migration-audit`	PL/pgSQL: NULL traps, race conditions, missing constraints, JSONB pitfalls. Auto-triggers when a `supabase/migrations/*.sql` file is written.
`/agentwright:implementation-audit`	Roundabout solutions, unnecessary complexity, reinvented wheels, naive designs.
`/agentwright:ui-audit`	WCAG 2.2, WAI-ARIA patterns, touch target sizing, focus management, React/Tailwind anti-patterns.
`/agentwright:test-coverage-audit`	Maps source files against tests, produces a risk-prioritized list of gaps.

Planning¶

Skill	Focus
`/agentwright:feature-planning`	Impact analysis, requirements, design, implementation steps, risk assessment.
`/agentwright:project-planning`	Stack selection, directory structure, tooling, scaffolding for a new project.
`/agentwright:bug-fix-planning`	Root-cause mapping, change impact, minimal fix, regression tests.
`/agentwright:refactor-planning`	Blast radius mapping, safe transformation sequence, behavior-preservation verification.

Debugging¶

Skill	Focus
`/agentwright:systematic-debugging`	Reproduce, isolate, hypothesize, verify.

Test writing¶

Skill	Focus
`/agentwright:write-tests`	General test quality (assertions, isolation, flakiness, over-mocking). Defers to the three below when applicable.
`/agentwright:write-tests-frontend`	React components/hooks with Vitest + RTL.
`/agentwright:write-tests-deno`	Deno integration tests for Supabase Edge Functions.
`/agentwright:write-tests-pgtap`	pgTAP database tests for Supabase SQL migrations.

Agent-shortcut skills¶

Thin wrappers that invoke the built-in agents — use /agentwright:<name> instead of typing @agent-agentwright:<agent>.

Skill	Agent	Pattern
`/agentwright:research <topic>`	deep-research	Forked — self-contained topic
`/agentwright:update-docs [scope]`	update-docs	Forked — infers from git diff
`/agentwright:critique [focus]`	party-pooper	Forked — reads session transcript
`/agentwright:verify [focus]`	verifier	Forked — reads session transcript + git diff
`/agentwright:challenge [claim]`	detective (×2)	Inline — dispatches two detectives with opposing hypotheses

Utilities¶

Skill	Focus
`/agentwright:config-init`	Write `.claude/agentwright.json` with every default populated. Pass `--force` to overwrite.

Agents (5)¶

Invokable as @agent-agentwright:<name> or via the shortcut skills above.

Agent	Role	Tools
detective	Investigates a hypothesis — traces logic, reads files, runs tests, reports evidence. Backs `/agentwright:challenge`.	Read-only + research MCPs + Bash for tests
verifier	Validates applied fixes — implementations exist, tests pass, no unstated changes. Auto-dispatched after audit fixes.	Read-only + Bash for tests
deep-research	Web search and literature review. Uses Exa, Context7, AlphaXiv, Scholar Gateway, Hugging Face, PubMed, bioRxiv in parallel.	Read-only
party-pooper	Adversarial critique. Parallel counter-evidence searches across academic, web, and docs sources.	Read-only + research MCPs
update-docs	Keeps `.md` files in sync with code. Scoped by `hooks/md-only-edit.js` to `.md` files only.	`.md` files only

All agents run with permissionMode: dontAsk.

Config¶

.claude/agentwright.json (all fields optional). See agentwright.example.json.

Run /agentwright:config-init to drop the full default config into your repo — every key populated so you can edit pipelines, custom stages, and retention in place. Add --force to overwrite an existing file; delete the file to fall back to built-in defaults.

{
  "pipelines": {
    "default": ["correctness", "security", "best-practices"],
    "full": ["correctness", "security", ["best-practices", "perf"], ["my-checks", "ui"], "test-coverage"]
  },
  "customStages": {
    "perf": { "type": "skill", "skillId": "performance-investigation" },
    "my-checks": { "type": "skill", "skillPath": "skills/my-custom-audit/SKILL.md" }
  },
  "retention": {
    "keepCompletedRuns": 2,
    "deleteCompletedLogs": true,
    "deleteCompletedFindings": false,
    "maxRunAgeDays": 2
  }
}

Custom stages are referenced by their key inside pipelines (e.g., "perf" in full above), or run directly with /audit-step perf.

Key	Default	Description
`pipelines.default`	`["correctness", "security", "best-practices"]`	Pipeline for `/audit-run` with no argument.
`pipelines.<name>`	—	Named pipeline. Array of stage names or nested arrays for parallel groups.
`customStages.<key>.skillId`	—	Reference a builtin skill by ID.
`customStages.<key>.skillPath`	—	Reference a project-relative `SKILL.md`.
`retention.keepCompletedRuns`	2	Completed runs to retain.
`retention.deleteCompletedLogs`	true	Delete stage log folders for completed runs.
`retention.deleteCompletedFindings`	false	Delete per-finding JSON for completed runs.
`retention.maxRunAgeDays`	2	Prune completed runs older than this.

State¶

.claude/audit-runs/<run-id>/ holds each run:

findings/ — per-finding JSON as it streams from the auditor
logs/ — per-stage auditor subprocess logs
group-<N>-snapshot.json — path to the frozen snapshot consumed by the verifier
Run metadata for /audit-status and /audit-resume