How do AI detectors work?
Most AI detectors estimate style likelihood, not origin certainty. They measure patterns in word choice, sentence rhythm, and structural habits, then compress those signals into a score that looks more definitive than it actually is. This page explains the mechanisms, the limits, and why two tools can give you completely different answers on the same text.
Short answer
Detectors measure textual features — word frequency, sentence rhythm, structural templates, phrasing patterns — compare them against trained expectations for human and AI writing, and produce a score or label. The quality of that result depends on four things: what the detector was trained on, which features it prioritizes, where it sets its decision threshold, and what genre of text you feed it. Change any one of those and the result changes too.
The three main approaches
Not all detectors work the same way. Most fall into one of three categories, and some combine them.
Statistical detection (perplexity and burstiness)
This is the approach you will see mentioned most in research. The idea: language models generate text by predicting the next most likely token. That means AI output tends to be low-perplexity — each word is unsurprising given the words before it. Human writing, by contrast, tends to be more variable. We make unexpected word choices, change rhythm mid-paragraph, and introduce ideas that a prediction model would not have ranked as likely.
Perplexity measures how surprising the text is to a language model: formally, the exponential of the average per-token surprise. Low perplexity means the text closely follows what the model would have predicted. High perplexity means the text contains choices the model would rarely have made.
Burstiness measures variation in that surprise across the passage. Human writing tends to be "bursty" — some sentences are highly predictable, others are surprising. AI writing tends to stay at a steady, low level of surprise throughout.
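Both metrics can be sketched in a few lines. Here is a minimal illustration in Python, using an add-one-smoothed unigram frequency table as a stand-in for a real language model (production detectors use neural LM token probabilities; the tiny `corpus`, the whitespace tokenizer, and all names here are illustrative assumptions, not any detector's actual code):

```python
import math
from collections import Counter

def surprisals(tokens, counts, total, alpha=1.0):
    """Per-token surprisal (-log2 p) under an add-alpha smoothed unigram model."""
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    return [-math.log2((counts[t] + alpha) / (total + alpha * vocab))
            for t in tokens]

def perplexity_and_burstiness(text, counts, total):
    s = surprisals(text.lower().split(), counts, total)
    mean = sum(s) / len(s)
    var = sum((x - mean) ** 2 for x in s) / len(s)
    # Perplexity: exponential of the mean surprisal.
    # Burstiness: spread of surprisal across the passage (std deviation).
    return 2 ** mean, math.sqrt(var)

# Toy "language model": word frequencies from a tiny reference corpus.
corpus = "the model predicts the next word the reader expects".split()
counts, total = Counter(corpus), len(corpus)

ppl, burst = perplexity_and_burstiness("the model predicts the next word", counts, total)
```

A passage made of common, expected words scores low on both numbers; a passage full of words the model has never seen scores high. That asymmetry is the entire statistical signal.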
The problem: formal writing, edited prose, and second-language text also tend to be low-perplexity and low-burstiness — because those genres compress variation by design. This is one of the main sources of false positives.
Classifier-based detection
Train a model on a large dataset of confirmed human text and confirmed AI text. Feed it a new sample. Ask it which category the sample resembles more. This is a standard binary classification problem, and it works well when the test text looks like the training data. It degrades when the text comes from a model, genre, or style the classifier was not trained on.
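The mechanics can be shown with a deliberately tiny version: a nearest-centroid classifier over two hand-picked style features. This is a sketch only; real classifiers use far richer features (often transformer embeddings) and far more data, and every name and sample below is a made-up illustration:

```python
import math

def features(text):
    """Two crude style features: mean sentence length and type-token ratio."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    words = text.lower().split()
    return (len(words) / len(sentences), len(set(words)) / len(words))

def centroid(samples):
    """Average feature vector of a set of labeled training samples."""
    feats = [features(s) for s in samples]
    return tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))

def classify(text, human_centroid, ai_centroid):
    """Label a new sample by whichever training centroid it sits closer to."""
    f = features(text)
    d_human = math.dist(f, human_centroid)
    d_ai = math.dist(f, ai_centroid)
    return ("ai", d_ai) if d_ai < d_human else ("human", d_human)
```

Note what this toy makes obvious: the answer depends entirely on where the training centroids sit. Text from a model, genre, or style that is far from both centroids still gets assigned to one of them, confidently.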
Most commercial detectors (GPTZero, ZeroGPT, Originality.ai) use some version of this approach, often combined with statistical features.
Pattern-matching detection
Instead of asking "does this text look like AI in general?", pattern-matching systems ask "does this text contain specific structures that are overrepresented in AI output?" — stock phrases, throat-clearing intros, formulaic conclusions, metronomic rhythm, over-signposting.
This is the approach WROITER uses. It trades breadth for transparency: you can see exactly which patterns triggered the score and verify them in the text. The trade-off is that it catches common AI patterns rather than novel ones.
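In its simplest form, pattern matching is a keyed set of regular expressions with a report of what matched where. The patterns below are generic illustrations of the categories named above, not WROITER's actual pattern set:

```python
import re

# Illustrative patterns only, one per category of AI-typical structure.
STOCK_PATTERNS = {
    "throat-clearing intro": r"\bin today's (fast-paced|ever-changing|digital) world\b",
    "over-signposting": r"\bin this (article|post|section), we will\b",
    "formulaic conclusion": r"\bin conclusion\b",
    "hedged filler": r"\bit is (important|worth noting) that\b",
}

def pattern_report(text):
    """Return which patterns matched and the exact text that triggered each one,
    so a reviewer can verify every hit in the original passage."""
    hits = {}
    for name, pattern in STOCK_PATTERNS.items():
        matches = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
        if matches:
            hits[name] = matches
    return hits
```

The transparency trade-off is visible in the return value: every hit is a verbatim quote you can check, but a text that avoids these exact templates produces an empty report even if it was machine-written.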
What about watermarking?
Some AI providers embed statistical watermarks in their output — subtle patterns in word choice that are invisible to readers but detectable by a verification tool with the right key. OpenAI has discussed this; Google has shipped versions of it for images.
Watermarking is fundamentally different from detection. It requires cooperation from the AI provider at generation time, and it only works on text from that specific provider. It is not a general-purpose detector. It also breaks when the text is paraphrased, edited, or translated. For now, watermarking is a promising research direction but not a practical solution for most real-world review workflows.
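To make the "cooperation at generation time" point concrete, here is a toy version of a green-list watermark in the style of published research schemes: a keyed hash splits the vocabulary into "green" and "red" tokens at each step, the generator prefers green tokens, and the verifier counts how far the green fraction sits above chance. Everything here (the key, the vocabulary, the greedy generator) is an illustrative assumption, not any provider's actual scheme:

```python
import hashlib
import math

GAMMA = 0.5  # expected green-token fraction in unwatermarked text

def is_green(prev_token, token, key):
    """Deterministically assign (prev, token) pairs to a 'green list' via a keyed hash."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def green_fraction(tokens, key):
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(p, t, key) for p, t in pairs) / len(pairs)

def z_score(tokens, key):
    """How far the observed green fraction sits above chance, in standard deviations."""
    n = len(tokens) - 1
    return (green_fraction(tokens, key) - GAMMA) / math.sqrt(GAMMA * (1 - GAMMA) / n)

def watermarked_sequence(start, vocab, length, key):
    """Simulate a cooperating generator that prefers green tokens at each step."""
    tokens = [start]
    for _ in range(length):
        green = [w for w in vocab if is_green(tokens[-1], w, key)]
        tokens.append((green or vocab)[0])
    return tokens
```

The sketch also shows why watermarking is fragile: paraphrase a few words and the (prev, token) pairs change, the green fraction drops back toward GAMMA, and the signal vanishes. And without the key, the verifier has nothing to count.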
Why two detectors disagree
This is not a bug. It is what happens when different tools make different bets.
Imagine a 400-word passage with flat rhythm but varied vocabulary, no stock phrases, and a few structural quirks. A detector that weighs rhythm heavily might score it 75. A detector that weighs phrase templates heavily might score it 20. Both are "correct" given their own feature priorities — they are just measuring different things.
Common reasons for disagreement:
- Different training data. A detector trained mostly on ChatGPT output may not recognize Claude-style prose, and vice versa.
- Different feature emphasis. Rhythm-heavy detectors and vocabulary-heavy detectors will disagree on any text where those two signals point in different directions.
- Different thresholds. One tool calls anything above 50 "likely AI." Another sets the bar at 70. Same underlying score, different label.
- Different update cycles. Models change. A detector trained on GPT-3.5 output may not catch GPT-4o patterns, and a detector updated last month may not catch a model released this week.
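The feature-emphasis and threshold effects are easy to reproduce with arithmetic. The sketch below scores the same passage under two hypothetical detectors with different feature weights (all numbers are illustrative, chosen to mirror the flat-rhythm, no-stock-phrases example above):

```python
def score(signals, weights):
    """Weighted average of 0-1 feature signals, scaled to 0-100."""
    total = sum(weights.values())
    return 100 * sum(signals[k] * w for k, w in weights.items()) / total

# One passage, two measured signals: rhythm is flat, but stock phrases are rare.
passage = {"flat_rhythm": 0.9, "stock_phrases": 0.1}

rhythm_detector = {"flat_rhythm": 3, "stock_phrases": 1}  # weighs rhythm heavily
phrase_detector = {"flat_rhythm": 1, "stock_phrases": 3}  # weighs phrase templates heavily

s_rhythm = score(passage, rhythm_detector)  # about 70: "likely AI" at a 50 threshold
s_phrase = score(passage, phrase_detector)  # about 30: "likely human" at the same threshold
```

Neither detector is miscalibrated. They made different bets about which signal matters, and this passage happens to split those signals.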
The practical implication: if you are running text through multiple detectors and getting different answers, the disagreement itself is information. It means the signal is ambiguous. Treat it that way.
Why essays and edited prose trigger false positives
The features detectors use to catch AI writing — low perplexity, flat rhythm, constrained vocabulary, structural regularity — are the same features that appear naturally in certain genres of human writing.
- Academic essays use formal structure, hedged claims, and conservative vocabulary. Those conventions overlap with AI output conventions.
- Heavily edited text has had its quirks smoothed out. The editing process removes the variation that detectors use to identify human writing.
- Second-language writing tends toward safe, common phrasings — the same phrasings AI models default to.
- Legal and institutional prose is repetitive and formulaic by design. Detectors trained on informal text will over-flag it.
The False Positive Hall of Fame documents real cases. The limitations page breaks down which genres are most at risk and how to adjust your interpretation.
Interpretation
A higher score means stronger overlap with the features a detector associates with AI writing. It does not prove authorship, identify a writer, or settle a misconduct claim. For WROITER's score ranges and the full interpretation boundary, see How It Works. For safe review policy, see Limitations.
The responsible use rule
Treat detector scores as triage, not conclusions. Confirm them with pattern-level evidence, revision history, and genre context before acting. The higher the stakes, the less acceptable a detector-only workflow becomes. The reliability brief in Do AI Detectors Work? explains why.
If you want to run a check now with the patterns visible: