
Methodology

The full disclosure.

Slopsleuth is positioned as a literary fiction–aware AI auditor. To be a legitimately useful tool, it needs to be honest about what it actually measures, what it has actually been calibrated against, and where its limits are. This page is that disclosure. It will be updated as the calibration set expands and new audits are added.

Last updated: 2026-05-02. Current calibration phase: Phase 1 — Initial public-domain validation.


1.  What Slopsleuth actually is

Slopsleuth is not a machine-learning model. It does not use perplexity, embeddings, transformer features, watermark detection, or any opaque statistical model. A vendor who says their AI detector uses such techniques is probably telling the truth, but ask them to publish their calibration data. Most do not.

Slopsleuth is five hand-tuned pattern audits with named, documented thresholds. Every audit:

The reason for this design is straightforward: literary fiction has stylistic conventions that fool generic perplexity detectors (minimalism, declarative fragments, controlled repetition). A black-box ML detector cannot defend a false positive. A regex with a named pattern and a calibration threshold can.


2.  Current calibration set (Phase 1)

Slopsleuth's thresholds were initially tuned against the following samples. This is a starting point, not a complete calibration set. The set is being expanded — see Section 6 for the roadmap.

Human samples (4 novels)

Novel | Author | Year | Words | Score
The Red Badge of Courage | Stephen Crane | 1895 | 46,094 | 0.0 / 100
Winesburg, Ohio | Sherwood Anderson | 1919 | 74,120 | 0.0 / 100
The Great Gatsby | F. Scott Fitzgerald | 1925 | 48,196 | 0.0 / 100
The Sun Also Rises | Ernest Hemingway | 1926 | 67,898 | 0.0 / 100

All four samples are full novels in the public domain in the United States. All are pre-1930 minimalist or declarative prose. The set is drawn from this era because (a) the texts are public domain and freely reproducible for calibration documentation, and (b) the styles represent the type of literary voice most often misclassified by generic AI detectors.

AI samples (1 baseline)

Sample | Source | Words | Score
Cold-prompted thriller excerpt | Claude Opus 4.7, Apr 2026 | 1,636 | 23.0 / 100

Honest caveat: a single 1,636-word AI sample is not a sufficient AI baseline. The 23.0 score is the calibration anchor for the AI side, but a different prompt or a different model could produce different numbers. Phase 2 (in progress) is dramatically expanding the AI sample set across multiple models and prompt strategies. Until then, treat the AI baseline as illustrative, not definitive.

3.  The five audits

3.1  Not-X fragments

Sentence-fragment hedges of the form "Not because she was brave." — a documented AI-prose tic. We measure occurrences per 1,000 words. Threshold: < 2.00 acceptable, 2.00–4.00 watch, > 4.00 load.

Caveat: this audit was strongly diagnostic against earlier-generation models and is increasingly obsolete against current ones. Modern models prompted to avoid this construction succeed at avoiding it. We treat it as a supporting signal, not a primary one.
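
The counting, rate, and banding described in 3.1 can be sketched as follows. This is a minimal illustration, not Slopsleuth's actual pattern: the sentence splitter, the `Not ...` match, and the 8-word fragment limit are all assumptions made for the example. Only the 2.00 and 4.00 thresholds come from the text above.

```python
import re

# Crude sentence splitter (assumption): break after ., ! or ? plus whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Hypothetical fragment test: a short sentence that opens with "Not".
NOT_X = re.compile(r"Not\s+\w")
MAX_FRAGMENT_WORDS = 8  # assumed cutoff, not a documented Slopsleuth value


def count_not_x(text):
    """Count short sentences that begin with 'Not ...'."""
    sentences = SENTENCE_SPLIT.split(text.strip())
    return sum(
        1
        for s in sentences
        if NOT_X.match(s) and len(s.split()) <= MAX_FRAGMENT_WORDS
    )


def rate_per_1000(hits, word_count):
    """Occurrences per 1,000 words."""
    return hits * 1000 / word_count if word_count else 0.0


def band(rate, watch=2.00, load=4.00):
    """Map a rate to the three bands: < watch, watch..load, > load."""
    if rate < watch:
        return "acceptable"
    if rate <= load:
        return "watch"
    return "load"


sample = "She ran. Not because she was brave. Not out of love. He stayed."
hits = count_not_x(sample)
print(hits, band(rate_per_1000(hits, len(sample.split()))))
```

A tiny sample like this one lands in "load" immediately, which is why the rate only means something on manuscript-length text.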

3.2  Body-part clusters

The same body part (chest, jaw, throat, ribs, shoulders, spine, neck, temple, stomach, gut, belly, lungs) appearing in 2+ paragraphs within a 10-paragraph window. Per-10K-words rate. Threshold: < 2.00 acceptable, 2.00–5.00 watch, > 5.00 load.

Caveat: action scenes legitimately cluster body parts. The body-cluster audit is excluded from the hybrid-manuscript bimodal check (Audit 5) for this reason. The body-part list is hand-curated and English-only; translated literature may either over- or under-trigger.
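
One way to read the windowed rule in 3.2 is sketched below. The text does not say exactly how a cluster is counted, so this example assumes one cluster is counted each time a body part recurs in a paragraph fewer than 10 paragraphs after its previous mention. The function names are hypothetical; the word list and the 10-paragraph window come from the description above.

```python
import re

BODY_PARTS = (
    "chest", "jaw", "throat", "ribs", "shoulders", "spine",
    "neck", "temple", "stomach", "gut", "belly", "lungs",
)
WINDOW = 10  # paragraphs


def body_part_clusters(paragraphs):
    """Count recurrences of a body part within a 10-paragraph window.

    Assumption: a recurrence is a mention whose previous mention, in a
    different paragraph, is fewer than WINDOW paragraphs back.
    """
    last_seen = {}
    clusters = 0
    for i, para in enumerate(paragraphs):
        words = set(re.findall(r"[a-z]+", para.lower()))
        for part in BODY_PARTS:
            if part in words:
                prev = last_seen.get(part)
                if prev is not None and i - prev < WINDOW:
                    clusters += 1
                last_seen[part] = i
    return clusters


def per_10k(count, word_count):
    """Rate per 10,000 words."""
    return count * 10_000 / word_count if word_count else 0.0
```

Matching on whole lowercase words keeps "gut" from firing on "gutter", but it also misses singular forms such as "rib" or "shoulder", which a production list would need to decide on explicitly.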

3.3  Dialogue texture

Percentage of dialogue paragraphs containing at least one texture marker (filler, restart, interjection, ellipsis, interrupt, opener, mishearing, hedge). Threshold: > 4.00% acceptable, 1.00–4.00% watch, < 1.00% load. Manuscripts with fewer than 50 dialogue paragraphs are capped at "watch" because the percentage is statistically unstable below that sample size.

Caveat: pre-1980 prose conventions (Hemingway, Fitzgerald) and heavily edited contemporary literary fiction can exhibit zero "um/uh/er" fillers. The audit detects this case and notes it as informational, but cannot reliably distinguish "carefully edited human dialogue" from "AI-sanitized dialogue." This is a known ambiguity.
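
The percentage, the inverted bands (higher is better here), and the small-sample cap from 3.3 can be sketched like this. The marker regex and the "contains a quotation mark" test for dialogue paragraphs are stand-ins invented for the example. The sketch also assumes that "capped at watch" means a "load" result is downgraded to "watch" when there are fewer than 50 dialogue paragraphs.

```python
import re

# Hypothetical marker set: a few fillers, one hedge, and ellipses.
TEXTURE = re.compile(r"\b(?:um|uh|er|hmm|I mean)\b|\.\.\.|…", re.IGNORECASE)
# Assumption: a paragraph counts as dialogue if it contains a double quote.
DIALOGUE = re.compile(r'["“”]')
MIN_DIALOGUE_PARAGRAPHS = 50


def dialogue_texture(paragraphs):
    """Return (percent of dialogue paragraphs with a marker, band)."""
    dialogue = [p for p in paragraphs if DIALOGUE.search(p)]
    textured = sum(1 for p in dialogue if TEXTURE.search(p))
    pct = 100 * textured / len(dialogue) if dialogue else 0.0

    if pct > 4.00:
        band = "acceptable"
    elif pct >= 1.00:
        band = "watch"
    else:
        band = "load"

    # Small samples cannot be reported as worse than "watch".
    if band == "load" and len(dialogue) < MIN_DIALOGUE_PARAGRAPHS:
        band = "watch"
    return pct, band
```

With 10 dialogue paragraphs and no markers the result is 0% but "watch", while the same 0% over 95 dialogue paragraphs is "load".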

3.4  Stock AI-phrasings

Two constructions that show large separation between AI fiction and human prose in our calibration:

Combined per-1,000-words rate. Threshold: < 0.50 acceptable, 0.50–2.00 watch, > 2.00 load. In our calibration, the AI thriller sample fired at 4.27/1,000, while Hemingway's The Sun Also Rises sat at 0.25/1,000.

Caveat: this is the strongest individual signal in the suite. It is also the easiest for an AI user to evade — once a writer knows we flag these constructions, they can find-and-replace them out. Slopsleuth is a triage tool, not a defense against deliberate adversarial editing.
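
A per-1,000-words rate behaves very differently on a 1,636-word excerpt than on a full novel, which matters when comparing the two calibration figures above. The arithmetic below uses only the word counts and the 2.00 load threshold from this page. The function names are hypothetical, and no hit counts are implied for the actual samples.

```python
import math


def rate_step(word_count):
    """Change in the per-1,000 rate caused by one additional hit."""
    return 1000 / word_count


def hits_to_exceed(threshold, word_count):
    """Smallest whole number of hits whose rate is strictly above threshold."""
    return math.floor(threshold * word_count / 1000) + 1


for label, words in (("AI excerpt", 1_636), ("The Sun Also Rises", 67_898)):
    print(
        f"{label}: one hit moves the rate by {rate_step(words):.3f}; "
        f"{hits_to_exceed(2.00, words)} hits needed to exceed 2.00"
    )
```

On the short excerpt a single occurrence shifts the rate by more than the whole width of the "acceptable" band, so short submissions should be read with more caution than novel-length ones.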

3.5  Voice variance (per chapter)

The hybrid-manuscript detector. Runs the four base audits on each chapter independently. If a manuscript has 2+ chapters firing LOAD and 2+ chapters firing ACCEPTABLE on the same audit, that is a bimodal signal, consistent with AI-generated chapters interleaved with human-written chapters. The body-part cluster audit is excluded from this check (see 3.2).

Caveat: deliberately multi-voice novels (Cloud Atlas, A Visit From the Goon Squad, Gone Girl, Lincoln in the Bardo) will trigger this audit by design. The audit cannot distinguish "deliberately multi-POV" from "hybrid AI/human." Use the per-chapter breakdown to make that call yourself.
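
The bimodal rule in 3.5 reduces to a count over per-chapter bands. A minimal sketch follows, assuming each base audit has already produced one band per chapter. The audit keys and the function name are hypothetical; the 2+ LOAD and 2+ ACCEPTABLE rule and the body-part exclusion come from the text.

```python
from collections import Counter

# Per 3.2, the body-part audit does not take part in the bimodal check.
EXCLUDED = {"body_part_clusters"}


def bimodal_audits(chapter_bands):
    """Return the audits with >= 2 'load' and >= 2 'acceptable' chapters.

    chapter_bands maps an audit name to its list of per-chapter bands.
    """
    flagged = []
    for audit, bands in chapter_bands.items():
        if audit in EXCLUDED:
            continue
        counts = Counter(bands)
        if counts["load"] >= 2 and counts["acceptable"] >= 2:
            flagged.append(audit)
    return flagged


manuscript = {
    "not_x_fragments": ["acceptable", "acceptable", "watch", "acceptable"],
    "stock_phrasings": ["acceptable", "load", "acceptable", "load"],
    "body_part_clusters": ["load", "load", "acceptable", "acceptable"],
}
print(bimodal_audits(manuscript))
```

Here only the stock-phrasings audit is flagged. The body-part audit has the same split but is skipped, and the flag says nothing about why the chapters differ, which is the ambiguity the caveat describes.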


4.  What Slopsleuth doesn't do

The honest list of current limitations:


5.  How to use Slopsleuth responsibly


6.  Roadmap

The state of Slopsleuth's calibration today is "useful starting point, not finished product." Honest version of where we're going:

Phase 2 — Expand the calibration set

Phase 3 — Address the AI-edited-prose gap

Phase 4 — Operationalize


7.  Privacy: how an uploaded manuscript is handled

An honest description of what happens to an uploaded file, end to end:

  1. Your browser uploads the file to the Slopsleuth backend over HTTPS.
  2. The backend stages the file in a Cloud Run container's local /tmp directory only long enough for python-docx (or the text reader) to parse it into memory — typically milliseconds.
  3. The temp file is then deleted from disk before any audit code runs. From this point on, all five audits operate on the in-memory parsed manuscript only.
  4. The Cloud Run container is ephemeral and is recycled when idle; nothing persists across requests by design.
  5. We do not log, store, transmit, retain, or train on any uploaded manuscript text. We don't run any models, so there's nothing to train.
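
Steps 2 and 3 above, staging to a temp file, parsing into memory, and deleting before any audit runs, follow a common pattern that can be sketched with the standard library. This is an illustration for plain-text uploads only, not the backend's code: the function name and the blank-line paragraph split are assumptions, and the returned path is included only so the deletion can be checked.

```python
import os
import tempfile


def parse_then_delete(upload_bytes, suffix=".txt"):
    """Stage bytes in a temp file, parse to paragraphs, then delete the file.

    Returns (paragraphs, path). The path no longer exists on return.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as staged:
            staged.write(upload_bytes)
        with open(path, "r", encoding="utf-8") as staged:
            text = staged.read()
        # Assumption: paragraphs are separated by blank lines.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    finally:
        # Runs even if parsing fails, so the file never outlives the call.
        os.remove(path)
    return paragraphs, path


paragraphs, path = parse_then_delete(b"First paragraph.\n\nSecond paragraph.\n")
print(paragraphs, os.path.exists(path))
```

For a .docx upload the parse step would instead open the staged file with python-docx, but the ordering is the same: the `finally` block removes the file before the in-memory paragraphs are handed to any audit.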

The audit report you see in your browser is the only persistent artifact, and it lives only in your browser session. We do not retain reports server-side either, except for billing-side records (which contain a count of audits run, not the manuscript content). See the Privacy Policy for the full legal framing.


8.  Disagree? Find a bug? Want to contribute calibration data?

Slopsleuth is built on the bet that transparency wins. If you've found a false positive on your own work, an obvious tic we're missing, a threshold that doesn't generalize to your genre, or a manuscript you think we should add to the calibration set — we want to hear from you.

Email: ezpandaofficial@gmail.com  ·  Subject line: Methodology / Calibration feedback

Closing principle. Every AI-detection product on the market has a marketing page. Most do not have a methodology page. The reason most don't is that opaque ML models cannot be honestly documented. We can be honestly documented because we are honestly simple — five named patterns, hand-tuned thresholds, transparent findings. The tradeoff is real: we will not catch what an opaque ML detector might catch. The upside is that you can read this page, read the source code, and decide for yourself whether the tool is worth using on your manuscripts.