Why we removed the score

The score never deserved your trust anyway

Run the same paragraph through GPTZero, Originality, and Pangram. You will get three different numbers. We’ve seen the same text scored 50, 79, and 20 across three tools that all call themselves accurate. None of them are measuring the same underlying quantity. Each one made up its own formula. The numbers can’t be reconciled because they aren’t comparable.

WROITER had the same problem. Our 98 wasn’t a measurement of anything that exists in the world. It was the output of a scoring formula we wrote — severity weights, isolation adjustments, co-occurrence bonuses, all decisions made by us, all defensible only by appeal to themselves. The underlying detection was real. The number on top of it was opinion.

The detector level is where WROITER is strong. The flags are verifiable — you can read the text and check whether each flagged thing is actually there. The score level is where WROITER inherits the same weakness as every other vendor on the market. So we removed it.

What replaces it

The diagnostic now leads with what we call a profile. It looks like this:

14 instances of AI slop across 4 of 6 signal families.

Lexical    ██████ 3 found
Meta       ██ 1 found
Structure ██████████ 8 found
Rhetoric   ····· none
Persona    ····· none
Style      ██ 2 found

You get a count, six family rows, and a footer line that names how many families lit up. Underneath, the same per-flag detail you had before — what was found, where, how often, what each pattern means.

What you don’t get is a single number pretending to be a verdict.

What you lose

Sortability. If you were triaging 30 essays in a queue, a 0–100 score let you sort the list and start at the top. The profile doesn’t do that — not as one number, anyway.

But the count summary at the top of the profile (“14 instances across 4 families”) gives you a triage signal. You can sort by instance count if you want a fast queue. The difference is that “14 instances” is a count of facts — you can look at the text and check it. A 0–100 score is a count of opinion that nobody can audit.

What you gain

You gain the part of the diagnostic that always was defensible: every flag points at specific text, names the pattern in plain language, and says why it matters. A writer can read it. An editor can act on it. The findings stand on their own without a number to apologize for.

You also gain the ability to compare across tools honestly. The next time another detector tells you your text is 87% AI, you can ask: which specific patterns did it find, and where are they? If it can’t answer, the 87% wasn’t a measurement. It was a vibe in a lab coat.

What hasn’t changed

The detector engine catches the same patterns it always did. The 28 detectors, the calibration corpus, the specification, the engine audit — all unchanged. What went away is the lossy compression layer that turned the findings into a number. The audit stays. The number goes.

The broader pattern

Most of the AI detector industry is running a confidence game. The accuracy claims don’t survive independent benchmarking. Originality catches 31.7% of GPT-5 content per Fritz.ai 2026. Vanderbilt and Curtin both disabled Turnitin’s AI detection after the false positive rates ruined real students’ lives. None of these vendors will stop showing you a number, because the number is the product they sell.

The score isn’t broken because we did it wrong. It’s broken because the entire format is unfalsifiable. We can’t fix that by having a more careful number. We can only fix it by leaving the score game and shipping what was actually working underneath.

That’s what this change is. Findings, not verdicts. We tell you what’s there. What you do about it is your call.

See it for yourself

The change is live on the diagnostic. Run anything — an old draft you wrote, a piece you suspect of being AI-generated, this article itself. You’ll get a profile. If you want the old score back during the deprecation window, it’s still in a collapsed details panel below the profile. Expand it once and you’ll see why we moved it.

For the engineering side: the engine audit documents what we found when we audited our own detector last week. The calibration log publishes the corpus we tune against. The open specification describes every detector and trigger condition. None of that changes — it just no longer has a number stapled to the front of it.

Run the diagnostic Read the method