What makes AI writing detectable
Every AI model writes with a tell. Not one tell — dozens. This report maps vocabulary, structure, and rhetorical habits across eleven models. It’s the foundation everything else stands on.
How big is the gap between AI and human text
Everyone says AI writing “sounds different.” We measured it. Word-level frequency ratios, sentence-length distributions, lexical diversity, and rhetorical pattern rates — across almost three million words of paired human and AI text.
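The metric families named above can be sketched in a few lines. This is a minimal illustration, not the report's actual measurement code; the function names and the add-one smoothing choice are assumptions.

```python
import re
from collections import Counter

def stylometric_profile(text):
    """Compute simple versions of the metrics above for one text:
    lexical diversity, sentence-length distribution, word frequencies."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # lexical diversity as type-token ratio: unique words / total words
        "type_token_ratio": len(set(words)) / len(words),
        # sentence-length distribution, summarized here as the mean
        "mean_sentence_len": len(words) / len(sentences),
        "word_freq": Counter(words),
    }

def frequency_ratio(word, ai_profile, human_profile):
    """Word-level frequency ratio: how much more often the AI corpus
    uses a word, with add-one smoothing to avoid dividing by zero."""
    ai_rate = (ai_profile["word_freq"][word] + 1) / sum(ai_profile["word_freq"].values())
    human_rate = (human_profile["word_freq"][word] + 1) / sum(human_profile["word_freq"].values())
    return ai_rate / human_rate
```

A ratio well above 1.0 marks a word the AI corpus overuses relative to the paired human text.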
Turning observations into detection rules
Fifty named, implementable detection patterns organized into nine categories. Each with a surface description, a mechanical cause, and a programmatic detectability sketch. No neural network required.
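An entry in that shape might look like the sketch below. The pattern name, regex, and threshold are invented for illustration; they are not one of the fifty published rules.

```python
import re

# One illustrative rule in the library's shape: a name, a category,
# and a programmatic detectability sketch. All values are hypothetical.
RULES = [
    {
        "name": "not-x-but-y contrast",
        "category": "rhetorical habit",
        "regex": re.compile(r"\bnot (?:just |only )?\w+[,;]? but\b", re.I),
        "per_1000_words_threshold": 1.0,  # flag above this rate
    },
]

def score(text):
    """Return rules whose per-1000-word hit rate exceeds their threshold.
    Pure regex counting: no neural network required, as the section says."""
    n_words = max(len(text.split()), 1)
    hits = []
    for rule in RULES:
        rate = len(rule["regex"].findall(text)) * 1000 / n_words
        if rate > rule["per_1000_words_threshold"]:
            hits.append((rule["name"], round(rate, 1)))
    return hits
```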
What existing detection resources miss
Before building a pattern library, we looked at every public resource that catalogs AI writing patterns. Wikipedia, GPTZero, Pangram, academic papers, GitHub, Reddit. Here’s what they cover, where they stop, and what nobody has built yet.
Testing against an outside dataset
Most AI detectors publish accuracy numbers from their own test sets. We grabbed someone else’s AI writing library and scored every sample with the production detector. Category-level results, miss analysis, and a tuning decision validated against a second corpus.
The hardest corpus we could find
4,858 student essays — 694 human, 4,164 AI — scored with the production detector. The dataset that stopped us from cheating on easier benchmarks and forced us to publish the hardest number in the project: a 59.8% human false-positive rate on student writing.
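The headline number is plain confusion-matrix arithmetic. A minimal sketch, assuming the detector returns one boolean verdict per essay:

```python
def false_positive_rate(flags, labels):
    """FPR = human essays flagged as AI / total human essays.
    flags: detector verdicts (True = flagged as AI); labels: ground truth."""
    human_flags = [f for f, l in zip(flags, labels) if l == "human"]
    return sum(human_flags) / len(human_flags)

# At the corpus sizes above, a 59.8% FPR on 694 human essays
# means roughly 0.598 * 694, i.e. about 415 humans flagged as AI.
```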
Does the revision prompt actually work
A frozen 32-text Polygraf slice that measures the prompt, not just the detector. Fresh AI drafts go in, WROITER revision prompts come out, and the revised drafts are frozen and rescored. It shows both the win and the limitation: when WROITER speaks, the edits work; but many AI drafts score clean and never trigger a prompt at all.
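The freeze-and-rescore loop described here can be sketched as a plain pipeline. The callables `detect`, `make_prompt`, and `revise` are hypothetical stand-ins for the production detector, the WROITER prompt generator, and the revising model; none of these names come from the report.

```python
def revision_eval(drafts, detect, make_prompt, revise):
    """For each fresh AI draft: score it, generate a revision prompt
    (which may be None when the draft already scores clean), apply the
    revision if a prompt fired, then rescore the frozen result."""
    results = []
    for draft in drafts:
        before = detect(draft)
        prompt = make_prompt(before)          # None = detector saw nothing
        revised = revise(draft, prompt) if prompt else draft
        results.append((before, detect(revised), prompt is not None))
    return results
```

Comparing the before/after scores only over rows where a prompt fired separates the prompt's effectiveness from the detector's coverage, which is exactly the distinction this page draws.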
How research becomes a detector
Research identifies the patterns. The pattern library catalogs and explains them. The calibration log documents how each was tested against real human writing. The method ties it together. And the evidence pages show you where it fails.
If you came here to check whether the diagnostic can be trusted, that last link is the one that matters most.