If you're a content manager trying to understand what exactly makes AI writing detectable — not just "it sounds off" but the specific, nameable patterns — this is the reference. Each card below describes one pattern: what it looks like in the wild, why AI produces it, and how it can be caught. You don't need a technical background. If you want the hard numbers behind these patterns, see Quantitative Signals. For which patterns belong to which AI model, see AI Writing Fingerprints.
Lexical & Vocabulary Patterns
Clusters of elevated buzzwords tied to specific model eras: 2023 GPT-4 text with "delve," "tapestry," "pivotal," "landscape"; mid-2024 GPT-4o era with "align with," "fostering," "showcasing."
Models overfit to high-frequency, high-status collocations in training data and re-use them as safe, high-probability descriptors, especially when prompted for "professional" tone.
Wordlists segmented by era; compute density per 1k tokens and co-occurrence of multiple items from the same cluster; flag above-threshold densities.
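A minimal sketch of this check in Python; the wordlists and thresholds below are illustrative placeholders, not production values:

```python
import re

# Illustrative era clusters from the card above; real lists would be longer.
ERA_WORDLISTS = {
    "gpt4_2023": ["delve", "tapestry", "pivotal", "landscape"],
    "gpt4o_2024": ["align with", "fostering", "showcasing"],
}

def era_density(text, wordlist):
    """Return (hits per 1k tokens, number of distinct cluster items seen)."""
    lowered = text.lower()
    n_tokens = max(len(re.findall(r"\w+", lowered)), 1)
    hits = distinct = 0
    for item in wordlist:
        n = len(re.findall(rf"\b{re.escape(item)}\b", lowered))
        hits += n
        distinct += n > 0
    return 1000 * hits / n_tokens, distinct

def flag_eras(text, min_density=2.0, min_distinct=2):
    """Flag eras where density and co-occurrence both clear (assumed) thresholds."""
    flagged = []
    for era, words in ERA_WORDLISTS.items():
        density, distinct = era_density(text, words)
        if density >= min_density and distinct >= min_distinct:
            flagged.append(era)
    return flagged
```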
Overuse of evaluative adjectives in neutral exposition: "rich cultural heritage," "vibrant hub," "pivotal role," "iconic landmark," even for mundane topics.
Training data overrepresents marketing and PR prose; when asked for descriptive narrative, the model collapses toward a mean promotional tone.
Lexicon of promo adjectives; measure ratio of subjective/promotional terms to content nouns; flag where density exceeds genre norms.
Paragraphs densely seeded with discourse markers: "Moreover," "Furthermore," "Additionally," "However," at rates far above human usage for the genre.
Models learn that explicit connectors increase coherence scores, so they treat transition adverbs as low-risk glue between almost every sentence.
Transition wordlist; compute markers per sentence and per paragraph; flag when >X% of sentences begin with a marker.
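A sketch of the sentence-initial marker rate; the naive sentence splitter and the 40% threshold are assumptions to tune per genre:

```python
import re

# Illustrative transition list; extend per genre.
TRANSITIONS = ("moreover", "furthermore", "additionally", "however",
               "consequently", "nevertheless", "in addition")

def transition_start_rate(text):
    """Fraction of sentences opening with a transition marker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return 0.0
    starts = sum(1 for s in sentences if s.lower().startswith(TRANSITIONS))
    return starts / len(sentences)

def flag_transition_overuse(text, threshold=0.4):
    return transition_start_rate(text) > threshold
```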
The main entity repeatedly referred to with rotating nominal variants: "the institution," "the organization," "the entity," "the renowned center" instead of repeating the name or pronoun.
Instruction-tuned models were rewarded for avoiding repetition, leading them to cycle through near-synonyms to keep local n-gram novelty high.
Noun phrases in apposition whose semantic heads are near-synonyms of the subject, each used once in close succession and never again.
Abundant hedges ("may," "might," "appears to"), boosters ("clearly," "undeniably"), and hype words ("groundbreaking," "revolutionary") in close proximity, especially in scientific-style text.
Scientific abstracts in training corpora have recognizable rhetorical stance markers; models reproduce the mix but overuse them because stance words are high-probability anchors.
Three wordlists (hedges, boosters, hype); compute normalized counts and co-occurrence per 200 tokens; AI exhibits distinctive ratios vs. human baselines.
Text that almost never uses strongly negative or positive sentiment words, hugging a mild, cautious positivity. Complaint-type content reads like a balanced FAQ.
RLHF and safety tuning penalize extreme affect; models revert to low-variance sentiment as the safest global policy.
Sentiment lexicon (VADER, NRC) to score polarity distribution; look for compressed variance and scarcity of extreme valence terms in contexts where humans are usually emotional.
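A sketch using the vaderSentiment package; the 0.6 "extreme" cutoff is an assumption:

```python
import re
import statistics
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # pip install vaderSentiment

def sentiment_profile(text):
    """Variance of per-sentence polarity plus the share of extreme sentences."""
    analyzer = SentimentIntensityAnalyzer()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = [analyzer.polarity_scores(s)["compound"] for s in sentences]
    if len(scores) < 2:
        return {"variance": 0.0, "extreme_share": 0.0}
    extreme = sum(1 for c in scores if abs(c) > 0.6)
    return {"variance": statistics.pvariance(scores),
            "extreme_share": extreme / len(scores)}

# Low variance plus a near-zero extreme_share, in a genre where humans vent
# (reviews, complaints), is the compressed-affect signal described above.
```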
Long passages with almost no concrete named entities beyond the prompt: few real people, places, or products; generic placeholders like "a major retailer."
Models are conservative about hallucinating specific entities; safety filters encourage generic placeholders over risky specifics.
NER to count named entities per 1k tokens; flag low-entity-density texts in domains where humans normally name many concrete things.
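A sketch with spaCy's small English model; the baseline of 8 entities per 1k tokens is a placeholder to calibrate per domain:

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
CONCRETE_LABELS = {"PERSON", "GPE", "LOC", "ORG", "PRODUCT", "EVENT", "WORK_OF_ART"}

def entity_density(text):
    """Concrete named entities per 1k tokens."""
    doc = nlp(text)
    n_entities = sum(1 for ent in doc.ents if ent.label_ in CONCRETE_LABELS)
    return 1000 * n_entities / max(len(doc), 1)

def low_entity_flag(text, baseline_per_1k=8.0):
    return entity_density(text) < baseline_per_1k
```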
Repeated bland fictive examples: "John and Sarah," "Company X vs. Company Y," "a small e-commerce store."
Templates widely used in training data; safe defaults that avoid PII.
Regex/wordlists for common dummy names and example frames; co-occurrence of several such patterns is highly indicative.
Syntactic & Sentence-Level Patterns
Systematic replacement of "is/are/was" with "serves as," "stands as," "acts as," "features," "offers" in otherwise plain sentences.
Copyediting prompts and style guides reward "stronger verbs"; models over-correct by suppressing basic copulas.
Ratio of copular constructions to predicate verbs; over-representation of a small verb set (serves, stands, acts, offers, features) vs. human baselines.
Sentences of similar length and complexity, giving a "drum machine" rhythm. Few very short or very long sentences.
Next-token sampling with moderate temperature plus training on edited prose encourages medium-length, well-formed sentences.
Sentence length variance and coefficient of variation; GPTZero-style burstiness combining sentence-level perplexity variance. See Quantitative Signals, Section 02.
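A sketch of the length-uniformity half of this signal; the CV cutoff is a placeholder, and the calibrated thresholds live in Quantitative Signals:

```python
import re
import statistics

def sentence_length_stats(text):
    """Mean, SD, and coefficient of variation of sentence lengths in words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return {"mean": 0.0, "sd": 0.0, "cv": 0.0}
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)
    return {"mean": mean, "sd": sd, "cv": sd / mean if mean else 0.0}

def too_uniform(text, cv_cutoff=0.35):
    """Low CV = 'drum machine' rhythm; human prose usually varies more."""
    return sentence_length_stats(text)["cv"] < cv_cutoff
```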
Heavy reliance on a small set of clause patterns: "While X, Y," "Although X, Y," "As a result, X" with similar syntactic shapes across paragraphs.
High-probability generic structures for expressing nuance and contrast; reused as safe rhetorical macros.
N-gram patterns over POS tags; count repetitions of identical clause frames.
"Not only X, but also Y," "it's not just X — it's Y," "not a X but a Y," often stacked across sentences.
Classic essayistic moves that reliably score as "insightful" in training data; models overuse them reflexively.
Regex for "not only," "not just," "not a .* but" etc.; count per 1k tokens.
Repeated triads: "history, culture, and economy"; 3-item adjective strings; 3 parallel clauses, especially multiple triads per paragraph.
The rule of three is overrepresented in essays and speeches; models internalize it as a default emphasis structure.
Regex for comma-comma-"and" lists and three parallel verb phrases; count frequency relative to longer/shorter lists.
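A sketch of the triad counter; the item pattern only covers single-word list items and would need extending for noun phrases:

```python
import re

# Coordinated lists like "history, culture, and economy".
LIST_RE = re.compile(r"(?:[\w'-]+,\s+){1,6}[\w'-]+,?\s+(?:and|or)\s+[\w'-]+", re.I)

def list_length_profile(text):
    """Share of coordinated lists that are exactly three items long."""
    lengths = []
    for match in LIST_RE.findall(text):
        # With an Oxford comma, items = commas + 1; without it, + 2.
        oxford = re.search(r",\s+(?:and|or)\b", match, re.I) is not None
        lengths.append(match.count(",") + (1 if oxford else 2))
    triads = sum(1 for n in lengths if n == 3)
    return {"lists": len(lengths), "triads": triads,
            "triad_share": triads / max(len(lengths), 1)}
```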
Reddit posts or student work with near-zero typos, perfect clauses, textbook commas, and fully formed paragraphs where peers use fragments and emojis.
Training data dominated by edited prose; generation objectives reward fluency and correctness, not idiosyncratic errors.
Grammar-checker error rate near zero + platform heuristics (no slang in subreddits where slang is normative).
Personal essays written almost entirely in impersonal third person or generic "you," with minimal "I" and "we."
Models default to neutral, impersonal exposition; safety alignment dampens self-disclosure.
Ratio of first-person pronouns to total pronouns; compare against genre baselines.
Long text with almost no "?" or "!", even when the subject is emotional or rhetorical questions would be natural.
Safety constraints discourage overdramatic punctuation; generic helpful style is measured and declarative.
Punctuation counts; flag unusually low rates of questions/exclamations given genre.
Structural & Document-Level Patterns
Stock section sequence: Background → Features → Challenges → Future Prospects/Conclusion, with formulaic opening lines in each.
Memorized high-level discourse structures from training data, activated by "write an article" type prompts.
Heading text patterns + lexical cues ("Despite its X, faces challenges"; "Looking ahead, the future of X").
"The 'List of X' is a curated compilation of..." instead of simply defining X.
Misinterprets the page title string as the subject entity, copying definition patterns from listicle intros.
Regex for "The [quoted title] is a [curated/comprehensive/detailed] [compilation/collection/list]".
Bullets like "1. Historical context – explanation" using bolded pseudo-headings inside paragraphs, often with emojis.
Mimics Markdown listicle patterns from training data, even when not requested.
Regex for bold-label-dash patterns; unnatural frequency of intra-paragraph bolded labels.
Paragraphs of nearly identical sentence counts throughout, without natural variation (no one-line or very dense paragraphs).
Optimizes for readability templates; reproduces "SEO-optimized" regular paragraph lengths from training data.
Paragraph sentence-count variance; skew/kurtosis checks. See Quantitative Signals, Section 03 (SD 0.54 AI vs 1.06 human).
Short sections ending with "Overall, this highlights...", "In summary," "Taken together" that merely rephrase the previous sentences.
Instruction datasets reward adding summaries; models generalize this to all scales, even when redundant.
Phrase list; detect "overall/in summary" followed by restatement of existing terms without new information.
"This section will explore...," "The next part discusses..." in contexts where such meta-commentary is unusual (short posts, answers).
Learned from textbooks where structure previews are explicit; side effect of "step-by-step" prompting.
Regex for "In this section," "the following section"; threshold by frequency relative to document length.
Paragraphs following: (1) generic background, (2) "is important/crucial/has significant impact," (3) vague reference to broader trends.
Reuses a generic expository macro: frame, inflate significance, connect to larger system.
Patterns combining vague nouns ("legacy," "impact," "role") with adjectives like "lasting," "enduring," "ongoing."
Dedicated "Media coverage" or "Recognition" sections that are little more than bullet lists of outlet names.
Literalizes notability guidelines: proving importance by enumerating coverage.
Heading patterns ("Media coverage," "Recognition") + bullet lists whose items are mostly publication names.
Rhetorical & Pragmatic Patterns
"Experts say," "observers note," "researchers are increasingly concerned," "many believe" with no concrete source.
Mimics journalistic hedging but cannot reliably ground claims, so defaults to vague collectives.
Phrase list for weasel terms + count of named entities in the same span; flag vague plurals without supporting named sources.
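A sketch combining the phrase list with spaCy NER; the 200-character window and the phrase list itself are assumptions:

```python
import re
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
WEASELS = re.compile(
    r"\b(?:experts say|observers note|many believe"
    r"|researchers are increasingly concerned)\b", re.I)

def unsourced_weasels(text, window=200):
    """Weasel phrases with no person or organization named nearby."""
    doc = nlp(text)
    ent_spans = [(e.start_char, e.end_char) for e in doc.ents
                 if e.label_ in {"PERSON", "ORG"}]
    hits = []
    for m in WEASELS.finditer(text):
        lo, hi = m.start() - window, m.end() + window
        if not any(lo <= start <= hi for start, _ in ent_spans):
            hits.append(m.group(0))
    return hits
```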
Topics with clear asymmetry treated as evenly split: "On one hand... on the other hand..." with every critique offset by a compliment.
Safety and helpfulness training strongly encourage balanced responses; the model reflexively adds counterpoints.
Paired contrastive markers used repeatedly in short text; combined with sentiment symmetry analysis.
Modest topics getting "has had a lasting impact," "continues to inspire," "plays an important role in discussions of identity, culture, and community."
Reuses high-level impact templates from cultural criticism; safe fillers when specific impacts are unknown.
Patterns combining vague nouns ("legacy," "impact," "conversations") with inflating adjectives.
"Great question!", "I'm glad you asked," effusive politeness and service phrases in Reddit threads or student essays.
Base assistant trained as a chatbot persona; unless suppressed, it leaks into all outputs.
Phrase list ("I'm happy to," "let's dive in," "let's break it down") + second-person imperatives in expository text.
"Ultimately, the best choice depends on your goals, budget, and comfort level" appearing verbatim across topics.
Template responses from instruction-following datasets; hedging resolves conflicting objectives and is rewarded in RLHF.
Phrase list of stock disclaimers and "it depends" templates; frequency across documents is especially telling.
"As of my last knowledge update in 2021" or "I cannot access real-time data" appearing in articles or essays.
System prompts train models to self-disclose limitations; users paste responses verbatim into other contexts.
Regex for "as of my last knowledge update," "as an AI language model," "I do not have access."
Statistical & Stylometric Patterns
Text whose per-token perplexity under a reference LM is significantly lower than human baselines.
Generative training maximizes log-likelihood, pushing outputs toward high-probability regions.
Compute perplexity and compare to empirically chosen thresholds per genre. See Quantitative Signals, Section 06.
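A sketch using GPT-2 via Hugging Face transformers as the reference LM; the choice of reference model shifts the numbers, so thresholds must be recalibrated per genre and per model:

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast  # pip install transformers torch

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text):
    """Per-token perplexity of `text` under GPT-2 as the reference LM."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return math.exp(loss.item())
```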
Both sentence lengths and per-sentence perplexities cluster tightly around a mean, lacking human-like spikes.
Homogeneous decoding policy; no genuine cognitive fatigue, mood shifts, or topic digressions.
Variance and higher moments for sentence lengths and sentence-level perplexity.
Characteristic frequency profiles of function words ("the," "of," "in") and POS bigrams that differ from human corpora.
LLMs slightly smooth human syntactic habits; small consistent deviations show up in high-frequency tokens.
Normalized function word frequencies and POS n-grams; cosine distance to human baselines.
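A sketch of the function-word profile; the word list is illustrative, and the human baseline vector is the piece you must estimate from a reference corpus:

```python
import math
import re
from collections import Counter

FUNCTION_WORDS = ["the", "of", "in", "and", "to", "a", "that", "is",
                  "for", "it", "with", "as", "on", "but", "by"]  # illustrative

def function_word_vector(text):
    """Normalized frequency vector over a fixed function-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm if norm else 1.0
```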
Type-token ratio (TTR) and hapax legomenon rate systematically lower than in human text at matched chunk sizes.
Samples synonyms to avoid repetition but doesn't reuse them later; humans re-use key terms systematically.
TTR, MTLD, HD-D, plus hapax counts. See Quantitative Signals, Section 04 (20-36% gap).
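A sketch of chunked TTR and hapax rate; chunking matters because TTR falls with text length, so only matched chunk sizes are comparable:

```python
import re
from collections import Counter

def lexical_diversity(text, chunk=200):
    """Mean type-token ratio and hapax rate over fixed-size token chunks."""
    tokens = re.findall(r"[a-z']+", text.lower())
    ttrs, hapax_rates = [], []
    for i in range(0, len(tokens) - chunk + 1, chunk):
        counts = Counter(tokens[i : i + chunk])
        ttrs.append(len(counts) / chunk)
        hapax_rates.append(sum(1 for c in counts.values() if c == 1) / chunk)
    n = max(len(ttrs), 1)
    return {"ttr": sum(ttrs) / n, "hapax_rate": sum(hapax_rates) / n}
```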
High sentence-sentence cosine similarity throughout, with fewer "off-topic" or exploratory sentences than human writing.
Optimizes for coherence and avoids tangents; internal representation stays close to central topic embeddings.
Sentence embeddings; compute average and variance of adjacent-sentence similarity.
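A sketch with sentence-transformers; the model name is one common default, not a requirement:

```python
import re
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

def adjacent_similarity(text):
    """Mean and variance of cosine similarity between consecutive sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 3:
        return {"mean": 0.0, "variance": 0.0}
    emb = model.encode(sentences, normalize_embeddings=True)
    sims = np.array([float(emb[i] @ emb[i + 1]) for i in range(len(emb) - 1)])
    return {"mean": float(sims.mean()), "variance": float(sims.var())}

# High mean with low variance = every sentence hugs its neighbor; human text
# shows occasional dips where the writer digresses or changes register.
```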
Body sentences unusually semantically similar to the title, reusing key phrases almost verbatim.
Conditions strongly on the input prompt and mirrors its language, especially for abstracts and introductions.
Embedding similarity between title and each sentence.
Citation & Markup Artifacts
These artifacts are most powerful when the pipeline involves copy-pasting raw AI output into other environments.
DOIs that fail to resolve, citations that resolve to unrelated articles, books with correct-looking but nonexistent publication details.
DOI/ISBN validation, link-checking against known resolvers.
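A sketch of both checks. Note the asymmetry: DOIs carry no check digit, so the test is resolution via doi.org, while ISBN-13 can be validated offline:

```python
import requests  # pip install requests

def doi_resolves(doi):
    """doi.org answers a redirect for registered DOIs, 404 for fabricated ones."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return resp.status_code in (301, 302, 303)

def isbn13_valid(isbn):
    """ISBN-13 check digit: the 1,3,1,3... weighted digit sum must be 0 mod 10."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits)) % 10 == 0
```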
Nonexistent wiki shortcuts (e.g., WP:NOTELOCAL), template names, or pseudo-JSON artifacts like "attributableIndex."
Static lists of valid templates/shortcuts; regex for capitalized shortcut-like tokens not in registry.
Links containing utm_source=chatgpt.com, utm_source=openai, or referrer=grok.com.
URL parsing; substring search over utm_source and referrer parameters.
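A sketch with the standard library's URL parser; the marker list mirrors the card above:

```python
from urllib.parse import urlparse, parse_qs

AI_REFERRAL_MARKERS = {"chatgpt.com", "openai", "grok.com"}

def has_ai_referral_params(url):
    """True if utm_source or referrer parameters point at an AI chat tool."""
    params = parse_qs(urlparse(url).query)
    values = params.get("utm_source", []) + params.get("referrer", [])
    return any(marker in v for v in values for marker in AI_REFERRAL_MARKERS)

assert has_ai_referral_params("https://example.com/page?utm_source=chatgpt.com")
```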
Essays with impeccable paragraph spacing, no orphan lines, and alignment matching the default export from web chat tools.
Spacing and indentation patterns (double newlines between every paragraph, consistent whitespace).
Model-Specific & Temporal Patterns
Distinct vocabulary shifts across GPT-4, GPT-4o, GPT-5: early GPT-4 loved "delve" and "tapestry"; later models lean on "enhance" and "showcasing."
Changes in training data and RLHF objectives alter preferred lexical items over time.
Separate wordlists per era. See AI Writing Fingerprints for the full per-model breakdown.
Most frontier LLMs cluster together and away from human text in stylometric space, with some models forming distinct sub-clusters.
Shared architectures and training regimes induce similar function-word and phrase-pattern fingerprints.
Function-word and phrase-pattern differences relative to human baselines; model-specific identification via stylometry.
Evasion & Second-Generation Patterns
AI text that, after "humanizers," shows wild, noisy sentence-length variation but still retains AI-like vocabulary and rhetorical macros.
Evasion tools randomize sentence length without altering deeper lexical or semantic patterns.
Combine lexical/rhetorical detectors with checks that burstiness is unusually high relative to vocabulary variety.
Many content words swapped for synonyms but argument structure, sentence ordering, and examples remain identical to an AI template.
"Rewrite to sound more human" prompts instruct paraphrasing at the token level, not reorganizing content.
Heavy synonym changes + same discourse markers and example structures = rewrite artifact.
Mostly formal prose with occasional oddly placed typos or slang ("lol," "bruh"), producing a mask effect over polished structure.
Users or tools crudely "dirty" text at the surface level without disturbing deeper structures.
High lexical sophistication coexisting with a very small number of casual markers; real casual writing has broader stylistic drift.
When AI rewrites AI, fabricated references or inflated significance claims are preserved or embellished while surface diction changes.
Paraphrasers treat input as authoritative and focus on style, not factual grounding.
Hallucination patterns (invalid DOIs, non-existent entities) surviving across multiple revisions.
Long-Form & Discourse-Level Patterns
In 2000+ word pieces, the model redefines core concepts multiple times and reintroduces the topic in each section.
When unsure how to proceed, falls back to definitional exposition — high-probability and safe.
Repeated definitional patterns ("X is," "refers to," "can be defined as") for the same head term across sections.
Complex topics get evenly sized paragraphs for each subtopic, with little depth, few citations, and no focus — like an outline turned directly into prose.
Distributes attention evenly across headings, lacking strong internal preferences for depth.
Section-length variance (too low), citation counts (too low), and lack of concrete examples per subtopic.
Narrative-sounding content with archetypal scenarios ("I woke up, made coffee, and reflected on my goals") but no specific names, dates, or sensory details.
Has no experiences; mimics the pattern of anecdotes without grounded particulars.
NER and concreteness lexicon: stories with narrative markers but very low concreteness scores.
Long pieces that never truly digress: no side stories, jokes, or asides, even in informal genres. The text stays eerily on the rails of its initial outline.
RLHF rewards staying on prompt and being "helpful"; off-topic digressions are penalized.
Semantic similarity between early and late sections; very high stability in creative or conversational genres is suspicious.
Fifty patterns. Nearly all of them catchable with wordlists, regex, and basic statistics; even the perplexity and embedding signals lean on small open reference models rather than a black-box classifier. That's the point. Every signal here is explainable: you can look at a flagged sentence and understand exactly which pattern triggered and why. That transparency is what separates WROITER from tools that just output a confidence score and leave you guessing.
Keep reading
The Catalogs Audit asks: who else has cataloged these patterns, what do they cover, and what has nobody built yet? The Quantitative Signals report has the hard numbers behind these patterns. And AI Writing Fingerprints maps which patterns belong to which model.