Methodology

The full disclosure.

Slopsleuth is positioned as a literary fiction–aware AI auditor. To be a legitimately useful tool, it needs to be honest about what it actually measures, what it has actually been calibrated against, and where its limits are. This page is that disclosure. It will be updated as the calibration set expands and new audits are added.

Last updated: 2026-05-02. Current calibration phase: Phase 1 — Initial public-domain validation.

1. What Slopsleuth actually is

Slopsleuth is not a machine-learning model. It does not use perplexity, embeddings, transformer features, watermark detection, or any opaque statistical model. If a vendor tells you their AI detector uses such things, take them at their word, but also ask them to publish their calibration data; most do not.

Slopsleuth is five hand-tuned pattern audits with named, documented thresholds. Every audit:

Names exactly which pattern it looks for.
Quotes every passage that triggered it.
Reports its threshold ranges so you can sanity-check the verdict.
Is open to public scrutiny — the source code is the calibration evidence.

The reason for this design is straightforward: literary fiction has stylistic conventions that fool generic perplexity detectors (minimalism, declarative fragments, controlled repetition). A black-box ML detector cannot defend a false positive. A regex with a named pattern and a calibration threshold can.

2. Current calibration set (Phase 1)

Slopsleuth's thresholds were initially tuned against the following samples. This is a starting point, not a complete calibration set. The set is being expanded — see Section 6 for the roadmap.

Human samples (4 novels)

Novel	Year	Words	Score
The Red Badge of Courage — Stephen Crane	1895	46,094	0.0 / 100
Winesburg, Ohio — Sherwood Anderson	1919	74,120	0.0 / 100
The Great Gatsby — F. Scott Fitzgerald	1925	48,196	0.0 / 100
The Sun Also Rises — Ernest Hemingway	1926	67,898	0.0 / 100

All four samples are full novels in the public domain in the United States. All are pre-1930 minimalist or declarative prose. The set is drawn from this era because (a) the texts are public domain and freely reproducible for calibration documentation, and (b) the styles represent the type of literary voice most often misclassified by generic AI detectors.

AI samples (1 baseline)

Sample	Source	Words	Score
Cold-prompted thriller excerpt	Claude Opus 4.7, Apr 2026	1,636	23.0 / 100

Honest caveat: a single 1,636-word AI sample is not a sufficient AI baseline. The 23.0 score is the calibration anchor for the AI side, but a different prompt or a different model could produce different numbers. Phase 2 (in progress) is dramatically expanding the AI sample set across multiple models and prompt strategies. Until then, treat the AI baseline as illustrative, not definitive.

3. The five audits

3.1 Not-X fragments

Sentence-fragment hedges of the form "Not because she was brave." — a documented AI-prose tic. We measure occurrences per 1,000 words. Threshold: < 2.00 acceptable, 2.00–4.00 watch, > 4.00 load.

Caveat: this audit was strongly diagnostic against earlier-generation models and is increasingly obsolete against current ones. Modern models prompted to avoid this construction succeed at avoiding it. We treat it as a supporting signal, not a primary one.
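
For illustration, here is a minimal sketch of how a rate-and-band check of this kind could look. The regex, the sentence splitter, and the function names are assumptions made for this example and are not taken from Slopsleuth's source; only the threshold values come from the text above.

```python
import re

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# Hypothetical pattern: a sentence that opens with "Not" plus a connective.
# The real audit may use a different or broader pattern.
NOT_X = re.compile(r"Not\s+(?:because|that|for|from|out of|with)\b")


def find_not_x(text):
    """Return every sentence that opens with a Not-X construction."""
    sentences = SENTENCE_SPLIT.split(text.strip())
    return [s for s in sentences if NOT_X.match(s)]


def rate_per_1000(hits, text):
    """Occurrences per 1,000 words (whitespace-delimited word count)."""
    words = len(text.split())
    return 0.0 if words == 0 else len(hits) * 1000 / words


def not_x_band(rate):
    """Map a rate onto the documented bands for this audit."""
    if rate < 2.00:
        return "acceptable"
    if rate <= 4.00:
        return "watch"
    return "load"
```

Returning the matched sentences rather than a bare count keeps the sketch in line with the rule that every triggering passage is quoted.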

3.2 Body-part clusters

The same body part (chest, jaw, throat, ribs, shoulders, spine, neck, temple, stomach, gut, belly, lungs) appearing in 2+ paragraphs within a 10-paragraph window counts as a cluster. We measure clusters per 10,000 words. Threshold: < 2.00 acceptable, 2.00–5.00 watch, > 5.00 load.

Caveat: action scenes legitimately cluster body parts. The body-cluster audit is excluded from the hybrid-manuscript bimodal check (Audit 5) for this reason. The body-part list is hand-curated and English-only; translated literature may either over- or under-trigger.
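
One possible reading of the windowing rule, sketched below: a cluster is counted each time a paragraph mentions a body part that was last mentioned fewer than 10 paragraphs earlier. That counting rule, the tokenizer, and the function name are assumptions for this example; the word list and the 10-paragraph window come from the text above.

```python
import re

BODY_PARTS = ("chest", "jaw", "throat", "ribs", "shoulders", "spine",
              "neck", "temple", "stomach", "gut", "belly", "lungs")
WINDOW = 10  # paragraphs


def body_clusters(paragraphs):
    """Return (cluster_count, clusters_per_10k_words) for a list of paragraphs."""
    last_seen = {}   # body part -> index of the last paragraph that used it
    clusters = 0
    total_words = 0
    for i, para in enumerate(paragraphs):
        total_words += len(para.split())
        # Exact-form matching only: "rib" or "shoulder" would not count here.
        tokens = set(re.findall(r"[a-z]+", para.lower()))
        for part in BODY_PARTS:
            if part in tokens:
                prev = last_seen.get(part)
                # Both paragraphs fit in one 10-paragraph window
                # when their indices differ by at most 9.
                if prev is not None and i - prev < WINDOW:
                    clusters += 1
                last_seen[part] = i
    rate = 0.0 if total_words == 0 else clusters * 10_000 / total_words
    return clusters, rate
```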

3.3 Dialogue texture

Percentage of dialogue paragraphs containing at least one texture marker (filler, restart, interjection, ellipsis, interrupt, opener, mishearing, hedge). Threshold: > 4.00% acceptable, 1.00–4.00% watch, < 1.00% load. Manuscripts with fewer than 50 dialogue paragraphs are capped at "watch" because the percentage is statistically unstable below that sample size.

Caveat: pre-1980 prose conventions (Hemingway, Fitzgerald) and heavily edited contemporary literary fiction can exhibit zero "um/uh/er" fillers. The audit detects this case and notes it as informational, but cannot reliably distinguish "carefully edited human dialogue" from "AI-sanitized dialogue." This is a known ambiguity.
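
A rough sketch of the percentage and the small-sample cap described above. The marker regex covers only a few fillers and ellipses, and the "contains a double quote" test for dialogue is a simplification; both are assumptions for this example, as are the function names. The thresholds and the 50-paragraph floor come from the text.

```python
import re

MIN_DIALOGUE = 50  # below this many dialogue paragraphs, the band is capped at "watch"

# Illustrative subset of texture markers (fillers, hedges, ellipses).
MARKER = re.compile(r"\b(?:um|uh|er|hmm|I mean|you know)\b|\.\.\.|\u2026",
                    re.IGNORECASE)


def is_dialogue(paragraph):
    """Assumption: a paragraph is dialogue if it contains a double quote."""
    return '"' in paragraph or "\u201c" in paragraph


def texture_pct(paragraphs):
    """Return (percent of dialogue paragraphs with a marker, dialogue count)."""
    dialogue = [p for p in paragraphs if is_dialogue(p)]
    if not dialogue:
        return 0.0, 0  # no dialogue at all; treated as 0% in this sketch
    textured = sum(1 for p in dialogue if MARKER.search(p))
    return 100.0 * textured / len(dialogue), len(dialogue)


def texture_band(pct, n_dialogue):
    """Higher is better for this audit, so the bands run the other way."""
    if pct > 4.00:
        band = "acceptable"
    elif pct >= 1.00:
        band = "watch"
    else:
        band = "load"
    # Small samples cannot be reported as worse than "watch".
    if band == "load" and n_dialogue < MIN_DIALOGUE:
        band = "watch"
    return band
```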

3.4 Stock AI-phrasings

Two constructions that show large separation between AI fiction and human prose in our calibration:

"the way [X] [verb]" — e.g., "the way grief lives in objects"
"a/the [adj?] kind of [noun]" — e.g., "a particular kind of dead"

Combined per-1,000-words rate. Threshold: < 0.50 acceptable, 0.50–2.00 watch, > 2.00 load. In our calibration, the AI thriller sample fired at 4.27/1,000 — Hemingway sat at 0.25/1,000.

Caveat: this is the strongest individual signal in the suite. It is also the easiest for an AI user to evade — once a writer knows we flag these constructions, they can find-and-replace them out. Slopsleuth is a triage tool, not a defense against deliberate adversarial editing.
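
The two constructions can be approximated with plain regexes, as in the sketch below. These patterns are assumptions for this example: without a part-of-speech tagger, "the way [X] [verb]" is approximated as "the way" followed by two words, with a few common prepositions excluded to skip phrases like "the way to the station". The actual patterns may be stricter.

```python
import re

# "the way [X] [verb]", approximated as "the way" + two words.
THE_WAY = re.compile(r"\bthe way (?!(?:to|of|in|out|back|home)\b)\w+ \w+",
                     re.IGNORECASE)

# "a/the [adj?] kind of [noun]"
KIND_OF = re.compile(r"\b(?:a|the) (?:\w+ )?kind of \w+", re.IGNORECASE)


def stock_phrases(text):
    """Return every matched phrase so findings can be quoted back."""
    hits = [m.group(0) for m in THE_WAY.finditer(text)]
    hits += [m.group(0) for m in KIND_OF.finditer(text)]
    return hits


def stock_rate(text):
    """Combined occurrences per 1,000 words."""
    words = len(text.split())
    return 0.0 if words == 0 else len(stock_phrases(text)) * 1000 / words


def stock_band(rate):
    """Map a combined rate onto the documented bands for this audit."""
    if rate < 0.50:
        return "acceptable"
    if rate <= 2.00:
        return "watch"
    return "load"
```

Because the approximation over-matches, a human reading of the quoted hits remains the real check.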

3.5 Voice variance (per chapter)

The hybrid-manuscript detector. Runs the four base audits on each chapter independently. If a manuscript has 2+ chapters firing LOAD and 2+ chapters firing ACCEPTABLE on the same audit (body-part clusters excepted; see 3.2), that is a bimodal signal, consistent with AI-generated chapters interleaved with human-written chapters.

Caveat: deliberately multi-voice novels (Cloud Atlas, A Visit From the Goon Squad, Gone Girl, Lincoln in the Bardo) will trigger this audit by design. The audit cannot distinguish "deliberately multi-POV" from "hybrid AI/human." Use the per-chapter breakdown to make that call yourself.
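
The bimodal rule reduces to a per-audit count over chapters, sketched below. The audit names, the data shape (one dict of audit name to band per chapter), and the function name are hypothetical; the 2+/2+ rule and the body-part exclusion come from Sections 3.5 and 3.2.

```python
from collections import Counter

# Section 3.2: body-part clusters are left out of the bimodal check.
EXCLUDED = {"body_clusters"}


def bimodal_audits(chapter_bands, min_each=2):
    """Return the audits showing both LOAD and ACCEPTABLE in 2+ chapters each.

    chapter_bands: one dict per chapter, mapping audit name -> band string
    ("acceptable", "watch" or "load").
    """
    audits = sorted({name for chapter in chapter_bands for name in chapter})
    flagged = []
    for audit in audits:
        if audit in EXCLUDED:
            continue
        counts = Counter(chapter.get(audit) for chapter in chapter_bands)
        if counts["load"] >= min_each and counts["acceptable"] >= min_each:
            flagged.append(audit)
    return flagged
```

A flagged audit only says where to look; the per-chapter breakdown is still what a reader uses to tell a multi-POV novel from a hybrid manuscript.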

4. What Slopsleuth doesn't do

The honest list of current limitations:

Doesn't render an authorship verdict. Every audit reports patterns and rates. The interpretation is yours. We will never tell you "this is AI-generated."
Doesn't reliably detect AI-edited human prose. If a human writer drafts a manuscript and uses Claude or GPT to rewrite individual paragraphs, the manuscript's overall rate metrics will look human and the bimodal detector won't fire. This is the most common real-world threat in 2026 literary submissions and we don't currently address it.
Is gameable by adversarial editors. Because every audit names its pattern and quotes triggered passages, a savvy AI user can edit those patterns out and re-run until they score 0. We accept this trade-off in exchange for transparency. Slopsleuth is a triage tool, not an arms race.
Is English-only. All patterns are tuned against English-language fiction. Translated literature (where translators normalize stylistic conventions) may behave unpredictably.
Has no demographic representation in the human calibration set. The four current calibration novels are by male American authors writing 1895–1926. Contemporary, women, non-white, and non-Anglo writers are not represented in the baseline. If a manuscript trips the audit and the writer falls outside our calibration cohort, the score is less meaningful. Phase 2 is addressing this.
Doesn't detect watermarks. Some commercial models are starting to embed cryptographic watermarks in their outputs. We don't decode any of them.
Is calibrated against one AI model and one prompt strategy. The 23.0/100 baseline is anchored to a single 1,636-word Claude Opus 4.7 thriller. A GPT-5 sample, a Gemini sample, or an adversarially prompted sample could score differently.

5. How to use Slopsleuth responsibly

Treat it as a conversation-starter, not a verdict. A high score means "look at the flagged passages and decide for yourself." A low score means "the easy AI tics aren't here" — not "this is definitely human."
Read the flagged passages. A finding is real evidence. A score is a summary. Always verify findings against the actual prose before drawing a conclusion.
Compare to known work. If you're an agent evaluating a manuscript that scores 15/100, run another piece by the same author through Slopsleuth. The relative score is more meaningful than the absolute one.
Never use Slopsleuth output to publicly accuse anyone. Our Acceptable Use Policy prohibits this and we mean it. Statistical pattern density is not proof of authorship.

6. Roadmap

The state of Slopsleuth's calibration today is "useful starting point, not finished product." Here is the honest version of where we're going:

Phase 2 — Expand the calibration set

30–50 contemporary novels added to the human sample set, across genres and demographics.
20+ AI samples across multiple models (Claude family, GPT-5, Gemini, Llama, Mistral) and prompt strategies.
Adversarial samples: AI prose where the user has actively tried to evade detection.
Hybrid samples: human-drafted, AI-edited at the paragraph level.
Threshold re-tuning against the expanded set, replacing hand-picked numbers with ROC-curve-derived values.
Public confusion matrix and precision/recall per audit, published here.
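
As a sketch of what ROC-derived re-tuning could mean in practice: given per-sample rates for labeled human and AI texts, one common choice is the cut-off that maximizes true-positive rate minus false-positive rate (Youden's J). The choice of that criterion is an assumption for this example, since no selection rule has been fixed yet, and the function name is hypothetical.

```python
def best_threshold(human_rates, ai_rates):
    """Return (threshold, J) maximizing TPR - FPR for a 'higher is worse' audit.

    A sample is called AI-like when its rate is >= the threshold.
    Both input lists must be non-empty.
    """
    candidates = sorted(set(human_rates) | set(ai_rates))
    best_j, best_t = float("-inf"), None
    for t in candidates:
        tpr = sum(r >= t for r in ai_rates) / len(ai_rates)        # AI samples caught
        fpr = sum(r >= t for r in human_rates) / len(human_rates)  # humans misflagged
        j = tpr - fpr
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

With only four human samples and one AI sample in the Phase 1 set, a procedure like this would not yet be meaningful, which is why it sits behind the Phase 2 expansion.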

Phase 3 — Address the AI-edited-prose gap

Sentence-level voice-flatness audit (low variance in sentence rhythm suggests heavy AI smoothing).
Adversarial-edit detection (overcompensation patterns).

Phase 4 — Operationalize

Calibration as a continuous-integration job that re-runs validation on every new model release.
Versioned thresholds and audit logic, with changelog visible to subscribers.
Multi-language support (translated literature first, native non-English later).

7. Privacy: how an uploaded manuscript is handled

An honest description of what happens to an uploaded file, end to end:

Your browser uploads the file to the Slopsleuth backend over HTTPS.
The backend stages the file in a Cloud Run container's local /tmp directory only long enough for python-docx (or the text reader) to parse it into memory — typically milliseconds.
The temp file is then deleted from disk before any audit code runs. From this point on, all five audits operate on the in-memory parsed manuscript only.
The Cloud Run container is ephemeral and is recycled when idle; nothing persists across requests by design.
Apart from the brief staging step described above, we do not log, store, transmit, retain, or train on any uploaded manuscript text. We don't run any models, so there's nothing to train.
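
The stage, parse, and delete sequence in the steps above can be sketched as follows. This is an illustration under assumptions, not the production handler: it covers only the plain-text reader path (the python-docx path is omitted), and `handle_upload` and `run_audits` are hypothetical names.

```python
import os
import tempfile


def handle_upload(data, run_audits):
    """Stage uploaded bytes on disk, parse to memory, delete, then audit.

    data: raw bytes of an uploaded plain-text manuscript.
    run_audits: hypothetical callable that receives the parsed text.
    """
    # mkstemp creates the file in the system temp directory (usually /tmp).
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as staged:
            staged.write(data)
        with open(path, encoding="utf-8") as staged:
            text = staged.read()  # the manuscript now lives only in memory
    finally:
        os.remove(path)  # delete the temp file even if parsing failed

    # Sanity check: nothing is left on disk before any audit code runs.
    assert not os.path.exists(path)
    return run_audits(text)
```

Putting the delete in a `finally` block means a parse error cannot leave a manuscript behind in the temp directory.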

The audit report you see in your browser is the only persistent artifact, and it lives only in your browser session. We do not retain reports server-side either; billing-side records contain a count of audits run, not the manuscript content. See the Privacy Policy for the full legal framing.

8. Disagree? Find a bug? Want to contribute calibration data?

Slopsleuth is built on the bet that transparency wins. If you've found a false positive on your own work, an obvious tic we're missing, a threshold that doesn't generalize to your genre, or a manuscript you think we should add to the calibration set — we want to hear from you.

Email: ezpandaofficial@gmail.com · Subject line: Methodology / Calibration feedback

Closing principle. Every AI-detection product on the market has a marketing page. Most do not have a methodology page. The reason most don't is that opaque ML models cannot be honestly documented. We can be honestly documented because we are honestly simple — five named patterns, hand-tuned thresholds, transparent findings. The tradeoff is real: we will not catch what an opaque ML detector might catch. The upside is that you can read this page, read the source code, and decide for yourself whether the tool is worth using on your manuscripts.