Method Changelog | WROITER Detection Updates

Standing interpretation rule

Across all versions: the score measures overlap with AI-typical surface patterns. It does not prove authorship. That boundary does not change between releases. See How It Works for score ranges and the full interpretation framework.

Pattern Profile Specification v2.0 — calibration harness section — 2026-05-27

What changed: New §5 Calibration Harness in the canonical spec documents how the detector thresholds were tuned. The section covers the four properties confirmed structurally sound by audit Pass 2 finding fr009: deterministic FNV-1a hashed sample ranking, per-group stratification (every source group in train and holdout proportional to its size), 25% holdout allocation with small-group rules (0 holdout below 3 samples, 1 below 4, then round(N×0.25) clamped above), and layered quality filters (English long-form check, Gutenberg lead-matter stripping, length-windowed paragraph trimming). Also documents the corpus sources by name (Project Gutenberg literature, Wikinews, HuggingFace Reddit TIFU for human; ShareGPT-GPT4 / Gryphe-Claude-3.5 / WildChat-Gemini-2.0-Flash for AI), the three output artifacts in shared/diagnostic/calibration/, and the network dependency / cold-start cost (audit fr008). Existing sections renumbered: Output Format §5 → §6, Known Failure Modes §6 → §7, What the Profile Does Not Establish §7 → §8, Versioning §8 → §9. TOC updated.

Why it matters: Calibration was the v032 follow-on writing project the audit deferred. The harness has been in the repo since v1.3.x but wasn’t formally part of the canonical spec, which meant the audit had nothing to verify the calibration code against on Pass 1. v2.0 now documents the harness; future audit passes can verify properties (determinism, stratification, holdout sizing, quality filters) against this section. The section is tight (one page) rather than full reproduction instructions, on the principle that the canonical artifact lives in code — the spec describes the contract, not the implementation. This closes the substantive writing scope of workstream E. The detector inventory expansion (the previous changelog entry) and this calibration section together complete the v2.0 spec rewrite as planned in the post-audit roadmap.

Pattern Profile Specification v2.0 — detector inventory expansion — 2026-05-27

What changed: Eleven detector entries in /method/spec/ are expanded from their v1.0 example sets to enumerate the actual pattern coverage of the engine. Each was found by the v1.3.4 audit to be a strict superset of the published spec — the engine had grown through v1.1, v1.2, and v1.3 detector refinements, and the spec text had been treating each pattern class as illustrative. v2.0 inverts that: every entry now reads as "including:" with the full set, drawn directly from the regex in diagnostic-core.mjs. Detectors with expanded entries: BANNED_PHRASES (27 → 30 patterns), HEDGING (8 → 13 groups), META_INTRO (broader article/verb sets + 40-char prefix tolerance documented), META_OUTRO (11 stock markers, including "In closing," "To summarize," "To conclude," "The main takeaway is"), ESSAY_THESIS_ANNOUNCEMENT (9 announcement verbs, 5 document types), MICRO_SUMMARY (split into three pattern classes; added "All in all" opener and broader takeaway/shows verbs), ANSWER_SCAFFOLDING (full internal-cue list documented; "let me break this down" moved to ASSISTANT_PERSONA; count cap at 3 documented as intentional), COLON_LIST (two-item lists and or connector documented), PIVOT_CRUTCH (only/merely/about modifiers documented; cross-family overlap with BANNED_PHRASES documented as intentional), WEASEL_ATTRIBUTION (six pattern classes → nine), FALSE_BALANCE (three patterns → five), ASSISTANT_PERSONA (eight patterns → 20-plus, including "let's unpack/explore this," "let me explain/walk you through"), AI_DISCLAIMER (six patterns → seven; full "I'm/I am [just/only] [an] AI/artificial intelligence/language model/chatbot/virtual assistant" class added). RULE_OF_THREE gets a note that the 1–2 word per-item cap is intentional false-positive suppression. COPULA_AVOIDANCE gets a note that \s+ matching catches line-wrapped occurrences. New §4.7 Implementation notes documents engine details: case-sensitivity intent on three detectors, underscore-prefixed flag-internal properties, shared splitter utilities, English-only scope.

Why it matters: The 11 audit fix_in_spec items are now folded into the canonical spec. The doc no longer under-describes the engine. Anyone implementing the Pattern Profile Specification independently has the full pattern set, the cross-family overlap rationale, and the intentional-constraint rationale (case-sensitivity, count cap, 1–2 word triad cap). Calibration harness section is still a follow-on writing project; this PR is the detector-inventory half.

Pattern Profile Specification v2.0 — rename + score removal — 2026-05-27

What changed: The canonical specification document at /method/spec/ is renamed from "Slop Score Specification" to "Pattern Profile Specification" and bumped to v2.0. The scoring algorithm section (v1.0 §5) and interpretation bands (v1.0 §5.6) are removed entirely. The output format section now documents slopScore as a deprecated field, alongside the previously-undocumented metrics and disclaimer fields the engine has been emitting since v1.3.x. The detector inventory header count is corrected from 22 to 28 (the v1.0 doc enumerated all 28 detectors but the introductory text read 22). The definitions, input requirements, failure modes, and "What the Profile Does Not Establish" sections are reframed around the profile rather than the score. The v1.0 spec is archived at /method/spec/v1.0/ for citation stability and marked noindex.

Why it matters: The spec is the canonical engineering reference. After the v1.4.0 hard-launch removed the score from the UI, the spec doc still called itself "Slop Score Specification" and devoted §5 to a scoring algorithm the product no longer surfaces. That mismatch was awkward and load-bearing — downstream tools (the revision engine, planned MCP server, planned browser extension) all consume this spec. v2.0 makes the spec match the live product. The 11 audit fix_in_spec items (detector-coverage enumerations) and the calibration harness section are not in this rewrite — those land in a follow-on spec PR. This is the rename + score-removal half; the substantive detector inventory work comes next.

v1.6.5 — 2026-06-02

What changed: A regex-hardening fix, with no change to what the engine detects. Two detector patterns — SECTION_LABEL_SCAFFOLDING and the list-format prompt matcher — had a leading whitespace class that also matched newlines. On a pathological input of tens of thousands of blank lines with no real text, the matcher re-scanned the whole document from every line break (O(n²)): a ~92k-character input took about 9.6 seconds of CPU instead of milliseconds. The leading class is now restricted to horizontal whitespace, so the scan is linear again. Every real section label and list still flags exactly as before.

Why it matters: the diagnostic runs synchronously behind the public /revise API. A single crafted request under the size limit could tie up far more compute than a normal document, and an input-length cap can't prevent that on its own. This closes it — purely a speed-of-rejection fix; detection and scoring are unchanged.

v1.6.4 — 2026-06-02

What changed: The minimum input length drops from 50 words to 15. Short copy — SEO meta descriptions (120–155 chars / ~20–30 words), titles, ad lines — can now be checked instead of bouncing off a 400. Samples of 15–99 words get partial coverage: the density and structure detectors (rhythm, uniformity, repetition, triads) need ~100+ words to fire, so they stay quiet on short text and a short-sample profile reflects the lexical and phrase-level detectors only.

Why it matters: the floor existed largely so the (now-removed) 0–100 score had enough signal to be stable. With the profile-of-flags output there is no number left to go degenerate on a short sample. And the density detectors self-gate on short text through their raw-count thresholds — you cannot hit ≥4 transitions or ≥4 paragraphs in 25 words — so they do not misfire; the lexical and persona detectors (banned phrases, AI-scented words, assistant persona, self-congratulation) are exactly what you want to catch in a meta description, and their count thresholds stop one stray word from false-flagging. Verified end to end: a clean 29-word meta description returns nothing, a slop-laden one flags the phrases, and a sub-15-word title still rejects.

v1.6.3 — 2026-06-02

What changed: New detector SELF_CONGRATULATION (persona family, severity medium). It flags praise pointed at the product or its maker instead of value delivered to the reader — naming a value as a Principle (“the transparency principle”), superiority-by-strawman (“just another opinion engine”), self-applied virtue adjectives (“an honest account”), deserve/owe framing (“readers deserve”). A we/you sentence-opener skew is also tracked but only counts alongside a lexical tell. Also in this release: confidence for HEDGING and EMPTY_INTENSIFIERS is now instance-aware rather than a per-detector constant — HEDGING emits medium by default and low for the vaguest soft qualifiers (“to some extent,” “in some ways”), and EMPTY_INTENSIFIERS drops from high to medium. Spec doc bumped to v2.2 (/method/spec/); inventory header count 32 → 33.

Why it matters: Self-congratulation is sycophancy redirected — the same people-pleasing impulse the engine already chases (CREDIBILITY_THEATER, ASSISTANT_PERSONA), but pointed at the product instead of the user. The subject-ratio is gated behind a lexical tell because first-person memoirs are legitimately we-heavy; calibrated to zero false positives on a 168-document human-prose corpus. The confidence change fixes a real calibration gap: a context-dependent hedge like “to some extent” used to return high confidence identical to an unambiguous template phrase. Confidence now reflects certainty about the specific match, which is what consumers triaging revisions actually need.

v1.6.2 — 2026-06-02

What changed: New detector ABSOLUTISM_OVERUSE (lexical family, severity low). It flags repetition of a single universal quantifier or totality word (every, each, all, always, never, none, completely, fully, entirely, totally). Repetition-based, not presence-based: it fires when one such word occurs ≥8 times with a per-word density floor (>2 per 1000 words) that spares very long documents. Inventory header count 31 → 32.

Why it matters: The obvious design — flag when total absolute density crosses a threshold — does not work. Measured against a 168-document human-prose corpus, good writing runs denser in absolutes than the AI document that motivated the detector (up to 19 per 1000 words in ordinary personal essays, against 9.5 in the slop), so a density gate would flag good prose first. The separator that holds is single-word repetition: no document in the corpus repeats one absolute more than 7 times. Calibrated to zero false positives on that corpus.

v1.6.1 — 2026-06-01

What changed: Lexical gap-closure batch from live roast testing. BANNED_PHRASES gains template-grade hype and empty closing flourishes that fire on first hit: “paradigm shift,” “game-changer / game-changing,” “redefine the (very) fabric of,” “results speak for themselves,” “the rest is history,” “only time will tell” (30 → 37 patterns). EMPTY_INTENSIFIERS gains staggering, tirelessly, seamlessly, effortlessly, relentlessly as density-only signals.

Why it matters: These were verified misses — phrases a live diagnostic let through that a reader flagged immediately. The hype phrases are near-always slop, so they fire on the first occurrence. The vibe words ride the existing intensifier density gate instead: one “staggering” stays clean, a pile-up fires. The split keeps the additions from over-flagging good prose that uses one of these words legitimately.

v1.6.0 — 2026-05-28

What changed: Phase 3 of the revision engine closes. The 16 medium-tier detectors from Pattern Profile Specification §4.2 now have revision generators, bringing the revision engine to 28 of 31 total inventory detectors. The three remaining detectors (UNIFORM_RHYTHM, PARA_UNIFORMITY, SEGMENTED_EXPOSITORY_BLOCKS) are the §4.3 hard tier — they need LLM enrichment to produce useful per-instance revisions and ship in a later phase.

The 16 detectors landing in this release, grouped by family:

Lexical (5): BANNED_WORDS, BANNED_PHRASES, HEDGING, EMPTY_INTENSIFIERS, TRANSITION_OVERUSE
Meta (2): OVER_SIGNPOST, MICRO_SUMMARY
Structure (7): OPENER_REPETITION, PASSIVE_OVERUSE, COLON_LIST, LABELED_LIST_FORMAT, OUTLINE_LIST_FORMAT, SUBORDINATE_REPETITION, RULE_OF_THREE
Rhetoric (2): WEASEL_ATTRIBUTION, FALSE_BALANCE

Most ship at confidence: high — the instruction is either a pure deletion or a formulaic substitution the AI applying it can do without surrounding context. Three are confidence: medium per spec §6, where the instruction is mechanical but the replacement requires reading context: BANNED_WORDS (picking a concrete substitute), PASSIVE_OVERUSE (identifying the agent), RULE_OF_THREE (deciding which triads to break and how). OUTLINE_LIST_FORMAT is the only document-level revision in the inventory — it anchors to the whole text and ships with the spec §4.2 caveat against stripping substantive multi-paragraph lists.

Why it matters: Phase 2 (engine 1.4.0) shipped the easy mechanical detectors — nine modules that just delete. Phase 3 covers everything that needs an anchor list. Together with the v1.5.0 trio from the anti-AI-writing pack, the revision engine now produces actionable instructions for every §4.1 and §4.2 entry. A user pasting a flagged article into Claude or ChatGPT with the revision prompt now gets per-instance edits for 28 distinct patterns. The next surface bump is the hard tier — the uniformity detectors — which can't be addressed mechanically and need a different shape of generator.

Internal: PHASE_3_DETECTORS in site/assets/js/revision-engine/detectors/index.mjs reaches 16. registerAllBuiltinDetectors() registers 28 (9 phase 2 + 3 V150 + 16 phase 3). Each generator ships with positive-trigger fixtures and a missing-underscore-prop short-circuit assertion. No detection-engine behavior change — phase 3 is purely additive on the revision-engine side.

v1.5.2 — 2026-05-28

What changed: Two detector emissions now attach the underscore-prefixed match record the revision engine reads. EMPTY_INTENSIFIERS flags carry _intensifiers — the unique lowercased adverbs that fired the rule. OPENER_REPETITION flags carry _repeatedOpeners — up to five {opener, count} entries describing the repeated sentence-opening patterns. No behavior change to detection itself; both detectors fire on exactly the same input as before and report the same count, severity, and detail.

Why it matters: The phase 3 revision generators land next and they need per-instance anchors to write useful revision instructions. Without these props the generators would have had to parse the human-readable detail string, which is fragile. This is the small engine-side surface change that unblocks the phase 3 revision-engine batch. Spec entries for both detectors get a flag fields line in their respective §4 entries when phase 3 ships.

v1.5.0 — 2026-05-28

What changed: Three new detectors added from the anti-AI-writing working notes — rules distilled from months of editing AI-drafted articles into publishable prose. The detectors target patterns that the existing inventory didn’t catch:

CREDIBILITY_THEATER (persona family, severity high, direct-leak pattern). Catches permission-asking framings and credibility-asserting labels — “let me be direct,” “real talk,” “the honest truth,” “I’ll be straight with you,” “to be candid,” “if I’m being honest,” “let’s get real.” Real honest writing doesn’t announce itself; the label only advertises the gap.
TELEGRAPHED_REVEAL (meta family, severity medium, fragile pattern). Catches label-as-frame constructions and pre-digestion prefaces — “The distinction:” “The takeaway:” “Here’s the thing,” “Here’s what no one’s saying,” “this matters because,” “the contradiction is.” Pre-chewing the insight patronizes the reader; the facts should carry meaning on their own.
MANUFACTURED_DRAMA (rhetoric family, severity medium, fragile pattern). Catches discovery/burial framing verbs — “buried in the changelog,” “quietly slipped,” “snuck in,” “hidden in the docs,” “under the radar.” Staging a routine fact as a dramatic reveal signals that the writer doesn’t trust the underlying material.

Plus a PIVOT_CRUTCH expansion: a third alternation catches the escalation form — “[X] doesn’t just Y, it Z” — where an abstract subject’s lesser consequence is followed by its bigger one. v1.4.1 caught the bare-pivot form (period-separated); v1.5.0 catches the connected-with-comma form. The regex requires the second-clause pronoun to be it / they / that / this, so legitimate parallel constructions about specific people (“she doesn’t just walk, she runs”) intentionally don’t fire.

Spec doc bumped to v2.1 (/method/spec/). Detector inventory header count updated 28 → 31. Each new detector documented in its respective family section. PIVOT_CRUTCH entry expanded from two structural forms to three. Revision engine gets three new template generators in site/assets/js/revision-engine/detectors/, registered via the new registerAllV150Detectors() helper (or the combined registerAllBuiltinDetectors()).

Why it matters: The existing 28-detector inventory was built with strong intuitions about lexical and structural AI tells. The anti-AI-writing notes — built from real editorial work on real drafts — surface a complementary set of patterns about writer’s posture: credibility theater, pre-digested insight, manufactured drama, and escalation rhetoric. These are the moves an AI reaches for when it wants to sound substantive without committing to substance. Each new detector has a high-precision regex (probed against 46 cases, 100% pass) and a fragile-or-direct-leak classification that matches its signal specificity. The full inventory is now 31 detectors across 6 signal families.

Revision-prompt anchors: the revision-prompt builder is also fixed in this release — previously, four detectors (BANNED_PHRASES, BANNED_WORDS, OVER_SIGNPOST, LABELED_LIST_FORMAT) referenced their paragraphs as bare “Paragraph 26” without an opener snippet. LLMs can’t reliably count paragraphs in long text and humans don’t count them at all, so the number was theater. The fix drops paragraph numbers from all references and uses the freed character budget to take an 8-word opener phrase — long enough to be unique in any real article, short enough not to bloat the prompt. Every revision-prompt line now anchors via a string the LLM can search for directly.

v1.4.1 — 2026-05-27

What changed: PIVOT_CRUTCH regex extended to catch the bare-pivot form — “This/It isn’t [X]. It’s/It is/This is [Y]” — where the rhetorical inversion is split across two sentences instead of joined by but. v1.4.0 only matched the single-sentence connected form (“not just X but Y”). The detector now covers both shapes. Spec entry at PIVOT_CRUTCH in §4.4 updated to describe the two structural forms. New regression fixture in diagnostic-core.test.mjs verifies the bare-pivot trigger.

Why it matters: The gap surfaced during voice review on the score-removal blog post: a draft paragraph used “This isn’t WROITER getting weaker. It’s WROITER getting more honest…”, the editorial reader flagged it as AI-shaped, and the engine missed it because the inversion was split by a period. The bare-pivot form is the more common written variant; the connected “X but Y” form is the more common spoken variant. v1.4.1 closes the gap on the more common written shape.

v1.4.0 — 2026-05-27

What changed: The 0–100 slop score is removed from the diagnostic UI. The profile of detected patterns (instance count, six-family breakdown, severity counts) is now the canonical output. The JSON API still emits slopScore as a deprecated field during the transition window — its schema entry is marked "deprecated": true. The field will be removed in a future major release.

Why it matters: The flag-level detection is auditable — you can read the text and check whether each flagged pattern is actually there. The score-level number was opinion stacked on top of measurement: severity weights, isolation adjustments, co-occurrence bonuses, all decisions made by us and defensible only by appeal to themselves. The score was also undefendable against the cross-vendor reality: GPTZero, Originality, and Pangram score the same text 50, 79, and 20 respectively, all calling themselves accurate. We were running the same play with our own internal weights. Removing the score takes out the layer that wasn't carrying its weight. The full reasoning is in Why we removed the score.

v1.3.6 — 2026-05-27

What changed: Two follow-on fixes from the queued items in the v1.3.4 engine audit. PASSIVE_OVERUSE irregular participle list expanded from 26 to ~58 entries — the regex now catches broken, eaten, fallen, frozen, stolen, ridden, drunk, hidden, mistaken, and most other common irregular participles that were silently missed before. splitSentences is now abbreviation-aware: text like “Dr. Smith left. He returned.” correctly parses as two sentences instead of dropping the “Dr.” fragment. The abbreviation list covers titles, degrees, country codes, Latin abbreviations, and common citation forms.

Why it matters: The participle expansion closes a false-negative gap on real passive constructions in everyday prose. The abbreviation fix corrects sentence-count and length statistics that every sentence-based detector consumes — UNIFORM_RHYTHM, OPENER_REPETITION, PASSIVE_OVERUSE, SUBORDINATE_REPETITION, and RULE_OF_THREE all benefit. Both fixes ship with new regression fixtures. Full audit deliverables: audit/v1.3.4/ on GitHub.

v1.3.5 — 2026-05-27

What changed: Three regex bug fixes surfaced by the v1.3.4 engine audit. SECTION_LABEL_SCAFFOLDING regex is now anchored to paragraph starts, so mid-sentence usages like “they reached a similar conclusion: patients...” no longer fire as false positives. OUTLINE_LIST_FORMAT and LABELED_LIST_FORMAT bullet detectors now match Unicode bullet characters (•, ●, ◦, ∙, ·, en-dash, em-dash) in addition to ASCII * and -, so paste sources like Word, Google Docs, Slack, and the ChatGPT/Claude web UIs are detected correctly. COPULA_AVOIDANCE now allows line breaks between the verb and “as” via \s+ instead of a literal space.

Why it matters: These are the three real code bugs the audit identified across 28 detectors. Each fix verified by a dedicated regression fixture: the SECTION_LABEL fix passes 11 of 11 mixed true-positive and false-positive cases, the Unicode bullet fix detects 25 Unicode bullets in a sample that previously reported zero, and the COPULA fix matches line-wrapped “serves\nas” constructions. Full audit deliverables: audit/v1.3.4/ on GitHub.

v1.3.3 — 2026-04-07

What changed: Added SECTION_LABEL_SCAFFOLDING, a cautious detector for assignment-style inline headings such as Introduction:, Body:, Conclusion:, and Paragraph 1:. The engine now tracks 28 detectors instead of 27.

Why it matters: This pass improved the essay stress suite again without disturbing the main guardrails. Internal holdout metrics stayed unchanged at 100.0% precision, 77.8% recall, and 0.0% human false-positive rate. Good AI capture stayed at 89.3%, but the average benchmark score rose from 66.6 to 67.9 because section-labeled essays are now explained more explicitly. On Ghostbuster, recall moved from 91.1% to 91.7% while the essay-domain human false-positive rate stayed flat at 59.8%. One human essay matched the new rule, but only at score 4, so it did not create a new threshold-crossing false positive.

v1.3.2 — 2026-04-07

What changed: Added SEGMENTED_EXPOSITORY_BLOCKS, a low-severity structural detector for essays built from three or more medium-length exposition paragraphs that read like pre-segmented assignment blocks. The engine now tracks 27 detectors instead of 26.

Why it matters: This pass was accepted only because it cleared every guardrail that blocked the tempting META_OUTRO shortcut. Internal holdout metrics stayed unchanged at 100.0% precision, 77.8% recall, and 0.0% human false-positive rate. Good AI capture moved from 83.9% to 89.3%, and Ghostbuster essay recall moved from 87.2% to 91.1% while the essay-domain human false-positive rate stayed flat at 59.8%. The blunt isolated-META_OUTRO uplift is still rejected because it would push Ghostbuster human false positives to 64.4%.

v1.3.1 — 2026-04-07

What changed: Added ESSAY_THESIS_ANNOUNCEMENT, a narrow detector for the rigid Introduction: ... This essay will ... school-essay template. The engine now tracks 26 detectors instead of 25.

Why it matters: This was the first post-benchmark tuning pass accepted after the Good AI and Ghostbuster challenge sets were frozen. It leaves the internal holdout unchanged at 100.0% precision, 77.8% recall, and 0.0% human false-positive rate. On the Ghostbuster essay stress test it moves recall from 83.7% to 87.2% while shifting the essay-domain human false-positive rate only from 59.5% to 59.8%. Good AI capture remains 83.9%, which is exactly the point: the patch targets a specific external seam without pretending to solve everything.

Benchmark Freeze — 2026-04-07

What changed: No engine change. Added two frozen external benchmark harnesses: a full pull of The Good AI examples library and a matched essay-domain stress test from polsci/ghostbuster-essay-cleaned. Added baseline snapshot support so future tuning can be compared against fixed benchmark files instead of the most recent rerun.

Why it matters: The Good AI benchmark exposed a tempting seam: 22 misses sitting at score 7 with only META_OUTRO. The Ghostbuster stress test showed why we did not overfit that seam. In that essay domain, threshold 8 already carries a 59.5% human false-positive rate, and promoting isolated META_OUTRO would push it to 64.1%. The benchmark update made the tuning decision stricter without changing production behavior. Public case study: We Ran 168 Good AI Examples Through WROITER.

v1.3.0 — 2026-04-06

What changed: Added deterministic train/holdout calibration to the test harness and published the Calibration Log. Added three new detectors aimed at clean assistant prose: ANSWER_SCAFFOLDING, OUTLINE_LIST_FORMAT, and LABELED_LIST_FORMAT. The engine now tracks 25 detectors instead of 22.

Why it matters: Clean assistant answers were the main source of false negatives after the first calibration pass. On the current holdout slice, the production threshold of 8 now evaluates at 100.0% precision, 77.8% recall, and 0.0% human false-positive rate. We kept the public medium threshold at 8 because it matched the train-tuned threshold on holdout while remaining the more conservative cutoff.

v1.2.1 — 2026-04-05

What changed: First data-driven calibration pass over a 76-text corpus spanning literature, essays, journalism, Reddit, GPT-4, Claude, and Gemini. Softened BANNED_WORDS, tightened OPENER_REPETITION, tightened and made UNIFORM_RHYTHM dialogue-aware, tightened PASSIVE_OVERUSE, and added co-occurrence-aware scoring so isolated weak flags carry less weight.

Why it matters: The old default threshold of 45 had extremely low recall. After this pass, the operating threshold moved to 8 and the full-corpus snapshot improved to 91.7% precision, 30.6% recall, and a 2.5% human false-positive rate. That was the version where the method stopped treating single isolated flags as decisive evidence.

v1.1.0 — 2026-04-04

What changed: Published the method hub with dedicated pages for scoring logic, pattern definitions with examples, and limitations. Added score-range calibration guidelines (0–15 / 16–40 / 41–70 / 71–100). Standardized the public framing: the diagnostic is a quality signal, not an authorship detector.

Why it matters: Reviewers now have a documented path from the score to the explanation behind it. Teams building internal review processes can point to the method pages instead of guessing at what the tool does.

v1.0.0 — 2026-04-03

What changed: Initial public release. Client-side diagnostic with five pattern families (vocabulary, phrases, intros/conclusions, rhythm, signposting), rhythm metrics (word count, sentence count, average length, standard deviation, burstiness), and flag-level output (patternId, label, severity, count, detectorNote). Score range 0–100. Minimum sample size: 50 words.

Why it matters: Established the public output contract. Evidence pages on detector reliability and false positives shipped alongside the tool so the diagnostic launched with its limitations visible from day one.

When to check this page

Come back here when comparing historical runs against current ones, updating an internal review policy, or checking whether the interpretation guidance changed since you last looked. For the full method, go to the method hub.

Run Diagnostic Review limitations