1. Introduction
This document specifies the Slop Diagnostic—a heuristic method for detecting surface-level patterns associated with AI-generated text. The specification defines the input requirements, detector inventory, scoring algorithm, output format, and known failure modes for version 1.0 of the method.
The diagnostic does not determine authorship. It measures overlap between a text sample and a documented set of structural, lexical, rhythmic, and rhetorical patterns that appear disproportionately in large-language-model output.
WROITER is the reference implementation of this specification. The specification is published under CC BY 4.0—anyone may implement, fork, or build on this method with attribution.
2. Definitions
| Term | Definition |
|---|---|
| Sample | The text submitted for analysis (minimum 50 words). |
| Flag | A single triggered detector, with a severity level, a count, and a human-readable detail string. |
| Score | A normalized integer 0–100 representing aggregate pattern density. |
| Signal family | A category grouping detectors by the type of evidence they collect. |
| Severity | One of high, medium, or low, assigned per detector based on signal specificity. |
| Fragile pattern | A detector whose signal is unreliable in isolation; subject to score dampening when no corroborating detectors fire. |
| Direct leak pattern | A detector whose signal is highly specific to AI output and triggers a score bonus independent of co-occurrence. |
3. Input Requirements
- Type: plain text string
- Minimum length: 50 words
- No preprocessing required; the diagnostic handles sentence splitting and normalization internally
- Optimal sample length for stable results: 150 words or more
- Score reliability degrades for samples under 100 words
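The requirements above can be expressed as a minimal validation pass. This is an illustrative sketch, not part of the specification; the function name and error messages are assumptions:

```python
def validate_sample(sample: str) -> str:
    """Check a sample against the input requirements in Section 3."""
    if not isinstance(sample, str):
        raise TypeError("sample must be a plain text string")
    if len(sample.split()) < 50:
        raise ValueError("sample is below the 50-word minimum")
    # No preprocessing is required: sentence splitting and normalization
    # happen inside the diagnostic itself.
    return sample
```

A caller may additionally warn when the sample is under 100 words, since score reliability degrades below that length.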
4. Detector Inventory
Twenty-two detectors are defined across six signal families. Each entry specifies a detector ID, label, signal family, detection logic, activation threshold, and severity. Fragile and direct leak patterns are marked as such in the inventory: fragile patterns receive isolation dampening when no corroborating signals are present, while direct leak patterns activate a score bonus regardless of co-occurrence.
4.1 Lexical Family
Checks for the presence of 38 vocabulary items disproportionately common in AI output.
A subset of 12 terms is designated soft-banned (reduced signal weight, requiring higher density thresholds): landscape, vibrant, foster, navigate, crucial, leverage, utilize, robust, seamless, comprehensive, myriad, plethora.
Checks for 27 stock phrase templates, including: “in today’s rapidly evolving landscape,” “it’s important to note,” “it’s worth noting,” “plays a key/crucial/vital/pivotal role,” “let’s dive/delve/explore,” “at the end of the day,” “it goes without saying,” “whether you’re [X] or [Y],” “there are [N] key reasons/ways/steps/benefits.”
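A few of these templates can be expressed as regexes; this is an illustrative sketch, not the reference implementation's pattern set. Slash-separated variants become alternation, the [N] slot becomes \d+, and curly apostrophes are assumed to be normalized to straight ones upstream:

```python
import re

# Illustrative regexes for four of the stock phrase templates above.
PHRASE_TEMPLATES = [
    re.compile(r"in today's rapidly evolving landscape", re.I),
    re.compile(r"it's (important to note|worth noting)", re.I),
    re.compile(r"plays a (key|crucial|vital|pivotal) role", re.I),
    re.compile(r"there are \d+ key (reasons|ways|steps|benefits)", re.I),
]

def count_stock_phrases(sample: str) -> int:
    """Total stock-phrase hits across all templates."""
    return sum(len(p.findall(sample)) for p in PHRASE_TEMPLATES)
```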
Detects epistemic hedge constructions: “it could be argued,” “one might say/argue/suggest,” “it seems that,” “arguably,” “it’s possible that,” “to some extent,” “this suggests that,” “this could indicate.”
Detects overuse of adverbial intensifiers that add emphasis without meaning.
Counts formal transition words.
4.2 Meta Family
Scans the first paragraph (or first 300 characters) for openings that announce the text’s own intent rather than making a concrete point: “In this article/guide/post,” “We will explore/discuss/examine,” “Let’s dive in,” “This guide will cover.”
Scans the final third of the text for stock summary markers: “In conclusion,” “In summary,” “To sum up,” “To wrap up,” “The key takeaway is.”
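The final-third scan can be sketched as follows. The character-based split into thirds and case-insensitive substring matching are assumptions; the specification does not fix either choice:

```python
SUMMARY_MARKERS = [
    "in conclusion", "in summary", "to sum up", "to wrap up",
    "the key takeaway is",
]

def has_stock_conclusion(sample: str) -> bool:
    """Scan the final third of the text for stock summary markers."""
    final_third = sample[2 * len(sample) // 3 :].lower()
    return any(marker in final_third for marker in SUMMARY_MARKERS)
```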
Detects the school-essay template: an explicit Introduction: heading paired with a thesis-announcement sentence (“This essay will explore…,” “The purpose of this paper is…,” “This paper argues…”).
Counts explicit sequence markers: First/Firstly, Second/Secondly, Third/Thirdly, Finally, Additionally, Furthermore, Moreover.
Detects mid-text restatement markers: “Overall,” “Taken together,” “In essence,” “At its core,” “Put simply,” “Simply put,” “The key takeaway here is,” “What this shows is.”
Detects helper-style response framing. Opening cues: “Certainly!” “Sure!” “Absolutely!” “Here’s a breakdown,” “Below is.” Internal cues: “let me break this down,” “based on a few criteria,” “the following criteria,” “to help you decide.”
4.3 Structure Family
Computes word counts per sentence, then counts uniform windows—consecutive runs of 4 sentences where the longest is within 3 words of the shortest. Also computes burstiness (sentence-length standard deviation ÷ mean). Text with >35% dialogue-like sentences (opening with a quotation mark) is partially exempted.
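The rhythm computation above can be sketched as follows. The specification does not say whether windows overlap or whether the standard deviation is population or sample; this sketch assumes overlapping windows and the population standard deviation:

```python
def rhythm_stats(sentences: list[str]) -> tuple[int, float]:
    """Uniform-window count and burstiness per Section 4.3.

    A uniform window is a run of 4 consecutive sentences whose longest
    member is within 3 words of its shortest. Burstiness is the standard
    deviation of sentence length divided by the mean.
    """
    lengths = [len(s.split()) for s in sentences]
    uniform = 0
    for i in range(len(lengths) - 3):
        window = lengths[i : i + 4]
        if max(window) - min(window) <= 3:
            uniform += 1
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    burstiness = (variance ** 0.5) / mean
    return uniform, burstiness
```

The partial exemption for dialogue-heavy text (>35% sentences opening with a quotation mark) would be applied by the caller before scoring these stats.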
Counts sentences per paragraph; computes mean and standard deviation across all paragraphs. Requires ≥4 paragraphs to activate.
Detects neatly chunked exposition: multiple medium-length paragraphs, no dialogue, no academic citation tail (APA-style inline citations or a References section).
Detects explicit structural labels left in the prose: Introduction:, Conclusion:, Body Paragraph 1:, Abstract:, Section 2:.
Extracts the first two words of each sentence and counts repetitions. A set of common structural openers (“it is,” “in the,” “we are,” “there are,” etc.) is exempted and requires ≥7 repetitions rather than ≥5 before triggering.
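A sketch of the opener-repetition logic, using the thresholds stated above. The exempted-opener set here is the partial list from the text, not the full set:

```python
from collections import Counter

# Exempted common structural openers (partial list from the text).
COMMON_OPENERS = {"it is", "in the", "we are", "there are"}

def repeated_openers(sentences: list[str]) -> dict[str, int]:
    """Two-word sentence openers that repeat often enough to trigger.

    Exempted common openers require >=7 repetitions; all others >=5.
    """
    openers = Counter(
        " ".join(s.lower().split()[:2]) for s in sentences if len(s.split()) >= 2
    )
    return {
        opener: n for opener, n in openers.items()
        if n >= (7 if opener in COMMON_OPENERS else 5)
    }
```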
Counts passive constructions: auxiliary verbs (is/are/was/were/been/being/gets/got) followed by a past participle.
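The auxiliary-plus-participle heuristic can be sketched with a single regex. Participle detection below is a crude suffix check (-ed/-en plus a few irregulars); the specification does not define the participle list, so this is illustrative only:

```python
import re

# Auxiliary verb followed by a likely past participle (Section 4.3 heuristic).
PASSIVE_RE = re.compile(
    r"\b(is|are|was|were|been|being|gets|got)\s+(\w+(ed|en)|done|made|seen|known)\b",
    re.IGNORECASE,
)

def count_passives(sample: str) -> int:
    return len(PASSIVE_RE.findall(sample))
```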
Detects the enumeration pattern X: A, B, and C.
Counts numbered list markers (1.) and unordered list markers (* or -).
Detects list items that begin with a bolded mini-heading (**Term**) or a title-case label (Term:).
Detects three-item comma-separated enumerations of the form A, B, and C.
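One way to express the triad pattern as a regex. The item shape (one or two words per item) is an assumption; the specification does not define item boundaries:

```python
import re

# "A, B, and C": three comma-separated items with a serial "and".
TRIAD_RE = re.compile(r"\b\w+(?:\s\w+)?, \w+(?:\s\w+)?, and \w+(?:\s\w+)?\b")

def count_triads(sample: str) -> int:
    return len(TRIAD_RE.findall(sample))
```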
Detects sentences that open with subordinate clause starters: while, although, despite, even though, given that, considering that, whereas, notwithstanding.
4.4 Rhetoric Family
Detects the rhetorical inversion template: “it’s not just [X] but [Y],” “isn’t just about,” “this isn’t just about.”
Detects unsourced expert and study attributions: “experts say,” “many researchers believe,” “studies show,” “it is widely accepted,” “observers note,” “critics argue.”
Detects reflexive balance framing: “on one hand…on the other hand,” “while some argue…others believe,” “proponents…while critics/opponents/skeptics.”
4.5 Persona Family
Detects chatbot voice artifacts: “Great question!” “I’d be happy to help,” “I’m happy to help,” “Let me break this down,” “Hope this helps,” “Feel free to ask,” “Don’t hesitate to reach out,” “Here’s a breakdown.”
Detects AI identity language: “as an AI,” “as a language model,” “as of my knowledge cutoff,” “I cannot access real-time information,” “my training data,” “I was trained.”
4.6 Style Family
Detects substitution of basic copulas with over-elevated alternatives where "is" or "are" would be natural.
5. Scoring Algorithm
5.1 Severity Weights
| Severity | Base weight |
|---|---|
| high | 22 |
| medium | 12 |
| low | 6 |
5.2 Per-Flag Contribution
The count cap of 4 prevents any single high-frequency pattern from dominating the score.
5.3 Isolation Adjustments
Applied only when no direct leak pattern is present in the flag set:
| Condition | Multiplier |
|---|---|
| Single flag, fragile pattern | 0.35 |
| Single flag, non-fragile pattern | 0.55 |
| Single signal family, fragile pattern | 0.70 |
5.4 Co-Occurrence Bonus
The family-diversity and flag-count terms reward corroboration of evidence across independent signal types.
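Sections 5.1 through 5.4 can be sketched end to end. The exact per-flag formula, the bonus magnitudes, and the final normalization are not given above, so the values marked as assumed below are illustrative only:

```python
from dataclasses import dataclass

# 5.1: severity base weights
WEIGHTS = {"high": 22, "medium": 12, "low": 6}

@dataclass
class Flag:
    severity: str            # "high" | "medium" | "low"
    count: int               # raw occurrence count
    family: str              # signal family
    fragile: bool = False
    direct_leak: bool = False

def score(flags: list[Flag]) -> int:
    if not flags:
        return 0
    families = {f.family for f in flags}
    # 5.2: per-flag contribution, occurrence count capped at 4
    # (assumed formula: base weight x capped count)
    raw = sum(WEIGHTS[f.severity] * min(f.count, 4) for f in flags)
    # 5.3: isolation dampening, applied only when no direct leak pattern fired
    if not any(f.direct_leak for f in flags):
        if len(flags) == 1:
            raw *= 0.35 if flags[0].fragile else 0.55
        elif len(families) == 1 and all(f.fragile for f in flags):
            raw *= 0.70
    # 5.4: co-occurrence bonus (assumed: +5 per extra family, +2 per extra flag)
    raw += 5 * (len(families) - 1) + 2 * (len(flags) - 1)
    # 5.5: clamp into the 0-100 band
    return min(100, max(0, round(raw)))
```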
5.5 Final Score
5.6 Interpretation Bands
| Score range | Interpretation |
|---|---|
| 0–7 | No significant AI-typical pattern density |
| 8–29 | Low — some patterns present |
| 30–100 | High — substantial AI-typical pattern density |
Score bands should be interpreted in the context of sample length, genre, and known false-positive risk factors. See Section 7.
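The band lookup follows directly from the table above (the hyphenated label strings are stand-ins for whatever labels an implementation chooses):

```python
def interpret(score: int) -> str:
    """Map a 0-100 score onto the interpretation bands in Section 5.6."""
    if score <= 7:
        return "No significant AI-typical pattern density"
    if score <= 29:
        return "Low - some patterns present"
    return "High - substantial AI-typical pattern density"
```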
6. Output Format
A diagnostic result consists of the overall score, the list of triggered flags, and a version field. Flags are returned sorted by severity (high → medium → low). Each flag's detail field contains human-readable evidence grounded in the source text, and its detectorNote field explains the diagnostic relevance of the pattern. The version field reflects the reference implementation version, not the specification version.
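The result shape can be sketched as a pair of record types. Field names beyond those named in this section (for example score) are assumptions about the reference implementation, not requirements of the specification:

```python
from dataclasses import dataclass

@dataclass
class DiagnosticFlag:
    severity: str      # "high" | "medium" | "low"
    count: int         # occurrence count for the triggered detector
    detail: str        # human-readable evidence grounded in the source text
    detectorNote: str  # why the pattern is diagnostically relevant

@dataclass
class DiagnosticResult:
    score: int                   # normalized 0-100
    flags: list[DiagnosticFlag]  # sorted high -> medium -> low
    version: str                 # reference implementation version
```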
7. Known Failure Modes
7.1 False Positives — Human Text Incorrectly Scored High
- Academic and institutional prose — constrained vocabulary, formal structure, and low rhythm variation overlap with detector signals. Risk is highest for UNIFORM_RHYTHM, BANNED_WORDS, TRANSITION_OVERUSE, and PASSIVE_OVERUSE.
- Second-language writing — non-native writers tend toward safe, common phrasings that share surface features with AI output.
- Heavily edited text — multiple rounds of editing flatten stylistic variation and can raise rhythm and vocabulary scores above the baseline of the original draft.
- Canonical and historical texts — older formal registers occasionally match detector patterns by coincidence. Documented instances are collected in the False Positive Hall of Fame.
- Short samples (<100 words) — a single triggered pattern can dominate the score disproportionately. FRAGILE_ISOLATED_PATTERNS dampening partially mitigates this, but short-sample scores should be treated as preliminary.
7.2 False Negatives — AI Text Incorrectly Scored Low
- Selectively edited AI drafts — targeted editing of the specific patterns tracked here produces lower scores without changing underlying authorship.
- Style-transfer prompting — models prompted to write in a specific human voice suppress many surface patterns this method detects.
- Hybrid authorship — human-outlined, AI-drafted text (or vice versa) may not trigger enough patterns to reach a meaningful score threshold.
7.3 Genre-Specific Unreliability
| Genre | FP risk | Most affected detectors |
|---|---|---|
| Legal prose | High | UNIFORM_RHYTHM, PASSIVE_OVERUSE, BANNED_WORDS |
| Product copy | High | BANNED_PHRASES, OVER_SIGNPOST |
| Academic abstracts | High | UNIFORM_RHYTHM, TRANSITION_OVERUSE, PASSIVE_OVERUSE |
| Personal essays | Low | None in particular; detectors are most reliable in this genre |
8. What the Score Does Not Establish
- The score does not identify the author of a text.
- A high score does not prove AI generation.
- A low score does not prove human authorship.
- The diagnostic should not be used as the sole basis for disciplinary action, public accusation, or irreversible decisions affecting individuals.
For safe review policy guidance, see Limitations and False Positives.
9. Versioning
The specification version is independent of the reference implementation version. At the time of this specification's publication, the reference implementation is at v1.3.4.
When detection logic, signal thresholds, or scoring weights change materially, the specification version increments. Previous version documents remain accessible at their original URLs.
| Spec version | Date | Implementation | Notes |
|---|---|---|---|
| 1.0 | 2026-04-13 | 1.3.4 | Initial publication. 22 detectors, 6 signal families. |
This specification is published under CC BY 4.0. Reference implementation copyright WROITER / 3AM Energy.