
Why Hemingway gets flagged as AI.

And what a literary-specific detector finds instead.

A confession. We almost shipped Slopsleuth without testing it on Hemingway. When we finally did — a routine validation pass on the calibration baselines — the result was nearly fatal to our marketing claim.

The Sun Also Rises scored 23 / 100 on Slopsleuth. That's the same tier as a Claude-generated thriller we wrote cold for benchmarking. If we'd shipped that, the first reviewer to run a Hemingway novel through our tool would have torn us apart on Twitter.

So we dug in. The result is the most useful thing we've learned about building literary-specific AI detection: which signals matter, which lie, and what changes when you calibrate for fiction instead of college essays.

The setup

We ran The Sun Also Rises (Project Gutenberg public-domain text, ~67,000 words) through Slopsleuth's five audits. The verdict came back SIGNIFICANT SIGNALS — 23 / 100. Two audits fired: dialogue texture (LOAD) and voice variance (WATCH).

"It was a good fight. Not bad. Just enough. He had not expected the boy to fight at all."

Sentences like that one. Short. Declarative. Negative-form fragments ("Not bad."). Almost no um, uh, or hedging in the dialogue. To a perplexity-based detector trained on average internet text, this looks exactly like AI prose. Hemingway's signature minimalism is statistically indistinguishable from a chatbot's sanitized output.

The diagnosis

The dialogue-texture audit fired LOAD because of a rule we'd added specifically to catch contemporary AI prose: zero fillers (um/uh/er) across a manuscript with 100+ dialogue paragraphs escalates the verdict. That rule made sense for 2026 fiction — modern dialogue uses fillers naturally, so their total absence is a strong AI-sanitization signal.
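The flawed escalation can be sketched in a few lines. Everything below is illustrative — the function name, the verdict strings, and the 100-paragraph threshold are assumptions for the sketch, not Slopsleuth's actual code:

```python
import re

# Filler words whose total absence used to escalate the verdict.
FILLERS = re.compile(r"\b(um|uh|er)\b", re.IGNORECASE)

def dialogue_texture_verdict(dialogue_paragraphs: list[str]) -> str:
    """Return an audit verdict for dialogue texture (sketch)."""
    filler_hits = sum(1 for p in dialogue_paragraphs if FILLERS.search(p))
    # The bug: zero fillers across 100+ dialogue paragraphs
    # single-handedly escalated the verdict.
    if len(dialogue_paragraphs) >= 100 and filler_hits == 0:
        return "LOAD"
    return "PASS"
```

Run Hemingway through this and the escalation fires every time: 100+ dialogue paragraphs, not a single "um."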

But pre-1980s literary prose simply doesn't use those fillers. It's a stylistic convention that emerged later. Fitzgerald doesn't write "um." Anderson doesn't write "uh." Hemingway certainly doesn't. The rule was correct for our calibration baseline (a 2026 thriller) but produced false positives across an entire era of literature.

The fix

We removed the auto-escalation. Now the zero-filler observation is captured as informational metadata in the report — "common in pre-1980 prose; can also indicate AI sanitization. Treat as informational, not diagnostic." The dialogue audit still fires if texture overall is low. But it no longer escalates based on fillers alone.
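The fixed behavior, again as a hedged sketch (the function name, note text, and the 0.3 texture threshold are invented for illustration):

```python
import re

FILLERS = re.compile(r"\b(um|uh|er)\b", re.IGNORECASE)

def dialogue_texture_audit(dialogue_paragraphs: list[str],
                           texture_score: float) -> dict:
    """texture_score: 0.0 (flat) to 1.0 (rich), computed elsewhere."""
    filler_hits = sum(1 for p in dialogue_paragraphs if FILLERS.search(p))
    notes = []
    # Zero fillers is now informational metadata, never an escalation.
    if len(dialogue_paragraphs) >= 100 and filler_hits == 0:
        notes.append("No fillers: common in pre-1980 prose; can also "
                     "indicate AI sanitization. Informational only.")
    # The audit still fires when texture overall is low.
    verdict = "LOAD" if texture_score < 0.3 else "PASS"
    return {"verdict": verdict, "notes": notes}
```

Same input, but the filler observation now lands in the report's notes instead of driving the verdict.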

The post-fix scores:

Sample                            Score        Verdict
Gatsby (Fitzgerald, 1925)         0.0 / 100    Within human range
Red Badge (Crane, 1895)           0.0 / 100    Within human range
Sun Also Rises (Hemingway, 1926)  7.0 / 100    Light signals
Winesburg, Ohio (Anderson, 1919)  7.0 / 100    Light signals
AI thriller (Claude, cold)        23.0 / 100   Elevated signals

Hemingway moved from SIGNIFICANT SIGNALS to LIGHT SIGNALS. The AI thriller stayed at the same score — the discrimination didn't break, it improved.

What this means for the product

Three things, in order of importance:

  1. Calibration on contemporary AI is necessary but not sufficient. Every audit needs at least one historical baseline (1900–1950 era literary prose) to catch rules that work on modern text but fail on different stylistic conventions.
  2. Zero-evidence escalations are dangerous. A signal should never single-handedly flip a verdict. It should adjust a score that's already weighted by multiple inputs.
  3. Hemingway is the right canonical test. If your literary AI detector flags Hemingway, fix the detector. Don't argue with Hemingway.
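Point 2 can be made concrete with a weighted composite. The signal names, weights, and numbers below are invented for illustration; the point is only that no single signal can move the score past its weight share:

```python
def combined_score(signals: dict[str, float],
                   weights: dict[str, int]) -> float:
    """Each signal is 0.0-1.0; returns a 0-100 composite score."""
    total = sum(weights.values())
    weighted = sum(signals[name] * weights[name] for name in signals)
    return 100.0 * weighted / total

# Hypothetical weights, not Slopsleuth's real internals.
WEIGHTS = {"dialogue_texture": 3, "voice_variance": 3,
           "repetition": 2, "filler_absence": 2}

# Even a maxed-out filler signal only contributes its weight share:
maxed = {"dialogue_texture": 0.0, "voice_variance": 0.0,
         "repetition": 0.0, "filler_absence": 1.0}
print(combined_score(maxed, WEIGHTS))  # prints 20.0
```

Under this scheme a zero-filler manuscript can nudge the score, but it can never flip a "within human range" verdict into "significant signals" on its own.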

Try it yourself

The exact text we used is on Project Gutenberg (ebook #67138). Download it and run it through Slopsleuth for free — no signup required. The Hemingway sample is also pre-loaded in the app as a calibration button.

If you find a published novel that scores above 15/100, please tell us. We'll add it to our calibration set.


Launch Slopsleuth →