In the wild.

Real captured agent sessions, lifted line-for-line with nothing paraphrased. They show the thing the homepage names: catching slop is the side effect. What slop-mop really does is steer the work, turning CI and scattered review threads into a sequenced path and handing the agent its next step. No single dramatic save. Just the same loop, ridden cleanly, thousands of times, and, at the end, the one time slop-mop got it wrong.

1 · The pile becomes a queue

Ten review comments, scattered across logic, tests, and docs. sm buff status collapses them into one ranked, categorized batch with a single next step — so the agent works a sequence instead of surfing GitHub.

ChronicChronicler — sm buff status · PR #2
$ sm buff status

🪣 sm buff status - CI Status Check
🔀 PR: #2

Buff status blocked: CI checks are clean, but unresolved PR review threads remain.
PR #2: 10 unresolved comment(s)

By category:
  • 🐛 Logic/Correctness: 7
  • 🧪 Testing: 1
  • 📚 Documentation: 1
  • 💭 General: 1

Next step: run 'sm buff inspect' to take the next review batch.

The power isn’t the catching — it’s that the work arrives pre-sorted, every cycle.

2 · It hands over the scaffolding

slop-mop doesn’t just list the threads — it writes a command pack: a scenario menu and a templated sm buff resolve for each thread, ranked by impact, waiting to be filled in.

ChronicChronicler — commands.sh · PR #2
# SCENARIO MENU — choose one per thread after investigating
# fixed_in_code            — Code addresses the feedback. Cite the commit.
# invalid_with_explanation  — Feedback is incorrect. Explain with evidence.
# no_longer_applicable     — Code changed since comment. Note what changed.
# out_of_scope_ticketed    — Valid but not this PR. File issue, link it.
# needs_human_feedback     — Need reviewer input. Uses --no-resolve.

# ── [1] PRRT_kwDOSJOA585-2xyL ──
# Category: 🐛 Logic/Correctness (impact=95)
# Location: client/src/features/history/HistoryTimelineView.tsx:170
# >> Investigate this thread, choose a scenario, replace <SCENARIO> and <YOUR_EVIDENCE>:
sm buff resolve 2 PRRT_kwDOSJOA585-2xyL --scenario <SCENARIO> --message "<YOUR_EVIDENCE>"

The agent isn’t left to invent a response format or a priority order — the rail supplies both.
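
To make the fill-in step concrete, here is a minimal sketch of what completing one template might look like. The helper build_resolve is hypothetical and not part of slop-mop; it only checks that the chosen scenario is one of the five on the menu and that some evidence was supplied, then prints the command for review rather than running it. The scenario names and the command shape are copied from the pack above; everything else is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of slop-mop: fill in one command-pack template.
# Usage: build_resolve PR THREAD_ID SCENARIO EVIDENCE
# Prints the completed command (a dry run); returns 1 on a bad scenario
# or empty evidence. It does not escape quotes inside EVIDENCE, so the
# printed line is meant for reading, not for eval.
build_resolve() {
  pr="$1"; thread="$2"; scenario="$3"; evidence="$4"

  # Scenario names as listed in the scenario menu of the command pack.
  case "$scenario" in
    fixed_in_code|invalid_with_explanation|no_longer_applicable|out_of_scope_ticketed|needs_human_feedback) ;;
    *) echo "unknown scenario: $scenario" >&2; return 1 ;;
  esac

  if [ -z "$evidence" ]; then
    echo "evidence must not be empty" >&2
    return 1
  fi

  # The menu notes that needs_human_feedback uses --no-resolve; how that is
  # applied is not shown in the pack, so this sketch does not model it.
  printf 'sm buff resolve %s %s --scenario %s --message "%s"\n' \
    "$pr" "$thread" "$scenario" "$evidence"
}
```

Called as build_resolve 2 PRRT_kwDOSJOA585-2xyL fixed_in_code "Fixed in a later commit", it prints the first template with both placeholders replaced. Printing instead of executing keeps the choice of scenario a deliberate step taken after investigating the thread.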

3 · The rail does the steering

Four more, from real sessions across three repos — the agent narrating, in its own words, what the rail is doing for it.

4 · The loop closes green

After the fixes land and the threads are answered, the rail confirms both halves are clean — CI and review — before the agent calls it done.

ChronicChronicler — sm buff status && sm buff verify · PR #16
$ sm buff status 16 && sm buff verify 16

🪣 sm buff status - CI Status Check
🔀 PR: #16

✨ CI CLEAN · 3/3 checks passed

   ✅ integration-evidence  (8s)
   ✅ verify  (3m 18s)
   ✅ Cursor Bugbot  (5m 22s)

Buff verify clean: PR #16 has no unresolved review threads.

“Done” isn’t the agent’s opinion — it’s the rail confirming CI green and every thread resolved.
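
The same gate can be sketched as a small wrapper. The function pr_is_done is hypothetical and not part of slop-mop. It assumes that sm buff status and sm buff verify exit non-zero when something is still blocking, which the && chaining in the session above suggests but does not prove.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT part of slop-mop.
# Assumption: both sm subcommands signal "blocked" with a non-zero exit code.
# Usage: pr_is_done PR_NUMBER
pr_is_done() {
  pr="$1"
  # Both halves must pass: CI status first, then unresolved review threads.
  if sm buff status "$pr" && sm buff verify "$pr"; then
    echo "PR #$pr: CI clean and no unresolved threads"
    return 0
  fi
  echo "PR #$pr: not done yet" >&2
  return 1
}
```

Under that assumption, pr_is_done 16 succeeds only when both checks pass, so a script or agent can branch on its exit code rather than on its own judgment.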

5 · When it’s wrong

slop-mop isn’t perfect, and we say so out loud. The honest part isn’t the miss — it’s what happens next.

Every panel here is a real session, captured verbatim by SpecStory and trimmed only for length — never reworded. Provenance (repo, PR, date) is stamped on each so you can check it.

Harm reduction, not a cure: a lower baseline of damage, earned one ridden loop at a time. That’s the whole claim.