If you're a content manager trying to understand what exactly makes AI writing detectable — not just "it sounds off" but the specific, nameable patterns — this is the reference. Each card below describes one pattern: what it looks like in the wild, why AI produces it, and how it can be caught. You don't need a technical background. If you want the hard numbers behind these patterns, see Quantitative Signals. For which patterns belong to which AI model, see AI Writing Fingerprints.
Lexical & Vocabulary Patterns
Clusters of elevated buzzwords tied to specific model eras: 2023 GPT-4 text with "delve," "tapestry," "pivotal," "landscape"; mid-2024 GPT-4o era with "align with," "fostering," "showcasing."
Models overfit to high-frequency, high-status collocations in training data and re-use them as safe, high-probability descriptors, especially when prompted for "professional" tone.
Wordlists segmented by era; compute density per 1k tokens and co-occurrence of multiple items from the same cluster; flag above-threshold densities.
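A minimal sketch of this check in Python; the wordlists and thresholds below are illustrative placeholders, not production values:

```python
import re

# Illustrative era clusters from the card above; real lists would be longer.
ERA_WORDLISTS = {
    "gpt4_2023": ["delve", "tapestry", "pivotal", "landscape"],
    "gpt4o_2024": ["align with", "fostering", "showcasing"],
}

def era_density(text, wordlist):
    """Return (hits per 1k tokens, number of distinct cluster items seen)."""
    lowered = text.lower()
    n_tokens = max(len(re.findall(r"\w+", lowered)), 1)
    hits = distinct = 0
    for item in wordlist:
        n = len(re.findall(rf"\b{re.escape(item)}\b", lowered))
        hits += n
        distinct += n > 0
    return 1000 * hits / n_tokens, distinct

def flag_eras(text, min_density=2.0, min_distinct=2):
    """Flag eras where density and co-occurrence both clear (assumed) thresholds."""
    flagged = []
    for era, words in ERA_WORDLISTS.items():
        density, distinct = era_density(text, words)
        if density >= min_density and distinct >= min_distinct:
            flagged.append(era)
    return flagged
```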
Overuse of evaluative adjectives in neutral exposition: "rich cultural heritage," "vibrant hub," "pivotal role," "iconic landmark," even for mundane topics.
Training data overrepresents marketing and PR prose; when asked for descriptive narrative, the model collapses toward a mean promotional tone.
Lexicon of promo adjectives; measure ratio of subjective/promotional terms to content nouns; flag where density exceeds genre norms.
Paragraphs densely seeded with discourse markers: "Moreover," "Furthermore," "Additionally," "However," at rates far above human usage for the genre.
Models learn that explicit connectors increase coherence scores, so they treat transition adverbs as low-risk glue between almost every sentence.
Transition wordlist; compute markers per sentence and per paragraph; flag when >X% of sentences begin with a marker.
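A sketch of the sentence-initial marker rate; the naive sentence splitter and the 40% threshold are assumptions to tune per genre:

```python
import re

# Illustrative transition list; extend per genre.
TRANSITIONS = ("moreover", "furthermore", "additionally", "however",
               "consequently", "nevertheless", "in addition")

def transition_start_rate(text):
    """Fraction of sentences opening with a transition marker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return 0.0
    starts = sum(1 for s in sentences if s.lower().startswith(TRANSITIONS))
    return starts / len(sentences)

def flag_transition_overuse(text, threshold=0.4):
    return transition_start_rate(text) > threshold
```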
The main entity repeatedly referred to with rotating nominal variants: "the institution," "the organization," "the entity," "the renowned center" instead of repeating the name or pronoun.
Instruction-tuned models were rewarded for avoiding repetition, leading them to cycle through near-synonyms to keep local n-gram novelty high.
Noun phrases in apposition whose semantic heads are near-synonyms of the subject, each used once in close succession and never again.
Abundant hedges ("may," "might," "appears to"), boosters ("clearly," "undeniably"), and hype words ("groundbreaking," "revolutionary") in close proximity, especially in scientific-style text.
Scientific abstracts in training corpora have recognizable rhetorical stance markers; models reproduce the mix but overuse them because stance words are high-probability anchors.
Three wordlists (hedges, boosters, hype); compute normalized counts and co-occurrence per 200 tokens; AI exhibits distinctive ratios vs. human baselines.
Text that almost never uses strongly negative or positive sentiment words, hugging a mild, cautious positivity. Complaint-type content reads like a balanced FAQ.
RLHF and safety tuning penalize extreme affect; models revert to low-variance sentiment as the safest global policy.
Sentiment lexicon (VADER, NRC) to score polarity distribution; look for compressed variance and scarcity of extreme valence terms in contexts where humans are usually emotional.
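A sketch using the vaderSentiment package; the 0.6 "extreme" cutoff is an assumption:

```python
import re
import statistics
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # pip install vaderSentiment

def sentiment_profile(text):
    """Variance of per-sentence polarity plus the share of extreme sentences."""
    analyzer = SentimentIntensityAnalyzer()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = [analyzer.polarity_scores(s)["compound"] for s in sentences]
    if len(scores) < 2:
        return {"variance": 0.0, "extreme_share": 0.0}
    extreme = sum(1 for c in scores if abs(c) > 0.6)
    return {"variance": statistics.pvariance(scores),
            "extreme_share": extreme / len(scores)}

# Low variance plus a near-zero extreme_share, in a genre where humans vent
# (reviews, complaints), is the compressed-affect signal described above.
```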
Long passages with almost no concrete named entities beyond the prompt: few real people, places, or products; generic placeholders like "a major retailer."
Models are conservative about hallucinating specific entities; safety filters encourage generic placeholders over risky specifics.
NER to count named entities per 1k tokens; flag low-entity-density texts in domains where humans normally name many concrete things.
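A sketch with spaCy's small English model; the baseline of 8 entities per 1k tokens is a placeholder to calibrate per domain:

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
CONCRETE_LABELS = {"PERSON", "GPE", "LOC", "ORG", "PRODUCT", "EVENT", "WORK_OF_ART"}

def entity_density(text):
    """Concrete named entities per 1k tokens."""
    doc = nlp(text)
    n_entities = sum(1 for ent in doc.ents if ent.label_ in CONCRETE_LABELS)
    return 1000 * n_entities / max(len(doc), 1)

def low_entity_flag(text, baseline_per_1k=8.0):
    return entity_density(text) < baseline_per_1k
```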
Repeated bland fictive examples: "John and Sarah," "Company X vs. Company Y," "a small e-commerce store."
Templates widely used in training data; safe defaults that avoid PII.
Regex/wordlists for common dummy names and example frames; co-occurrence of several such patterns is highly indicative.
Syntactic & Sentence-Level Patterns
Systematic replacement of "is/are/was" with "serves as," "stands as," "acts as," "features," "offers" in otherwise plain sentences.
Copyediting prompts and style guides reward "stronger verbs"; models over-correct by suppressing basic copulas.
Ratio of copular constructions to predicate verbs; over-representation of a small verb set (serves, stands, acts, offers, features) vs. human baselines.
Sentences of similar length and complexity, giving a "drum machine" rhythm. Few very short or very long sentences.
Next-token sampling with moderate temperature plus training on edited prose encourages medium-length, well-formed sentences.
Sentence length variance and coefficient of variation; GPTZero-style burstiness combining sentence-level perplexity variance. See Quantitative Signals, Section 02.
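A sketch of the length-uniformity half of this signal; the CV cutoff is a placeholder, and the calibrated thresholds live in Quantitative Signals:

```python
import re
import statistics

def sentence_length_stats(text):
    """Mean, SD, and coefficient of variation of sentence lengths in words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return {"mean": 0.0, "sd": 0.0, "cv": 0.0}
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)
    return {"mean": mean, "sd": sd, "cv": sd / mean if mean else 0.0}

def too_uniform(text, cv_cutoff=0.35):
    """Low CV = 'drum machine' rhythm; human prose usually varies more."""
    return sentence_length_stats(text)["cv"] < cv_cutoff
```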
Heavy reliance on a small set of clause patterns: "While X, Y," "Although X, Y," "As a result, X" with similar syntactic shapes across paragraphs.
High-probability generic structures for expressing nuance and contrast; reused as safe rhetorical macros.
N-gram patterns over POS tags; count repetitions of identical clause frames.
"Not only X, but also Y," "it's not just X — it's Y," "not a X but a Y," often stacked across sentences.
Classic essayistic moves that reliably score as "insightful" in training data; models overuse them reflexively.
Regex for "not only," "not just," "not a .* but" etc.; count per 1k tokens.
Repeated triads: "history, culture, and economy"; 3-item adjective strings; 3 parallel clauses, especially multiple triads per paragraph.
The rule of three is overrepresented in essays and speeches; models internalize it as a default emphasis structure.
Regex for comma-comma-"and" lists and three parallel verb phrases; count frequency relative to longer/shorter lists.
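A sketch of the triad counter; the item pattern only covers single-word list items and would need extending for noun phrases:

```python
import re

# Coordinated lists like "history, culture, and economy".
LIST_RE = re.compile(r"(?:[\w'-]+,\s+){1,6}[\w'-]+,?\s+(?:and|or)\s+[\w'-]+", re.I)

def list_length_profile(text):
    """Share of coordinated lists that are exactly three items long."""
    lengths = []
    for match in LIST_RE.findall(text):
        # With an Oxford comma, items = commas + 1; without it, + 2.
        oxford = re.search(r",\s+(?:and|or)\b", match, re.I) is not None
        lengths.append(match.count(",") + (1 if oxford else 2))
    triads = sum(1 for n in lengths if n == 3)
    return {"lists": len(lengths), "triads": triads,
            "triad_share": triads / max(len(lengths), 1)}
```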
Reddit posts or student work with near-zero typos, perfect clauses, textbook commas, and fully formed paragraphs where peers use fragments and emojis.
Training data dominated by edited prose; generation objectives reward fluency and correctness, not idiosyncratic errors.
Grammar-checker error rate near zero + platform heuristics (no slang in subreddits where slang is normative).
Personal essays written almost entirely in impersonal third person or generic "you," with minimal "I" and "we."
Models default to neutral, impersonal exposition; safety alignment dampens self-disclosure.
Ratio of first-person pronouns to total pronouns; compare against genre baselines.
Long text with almost no "?" or "!", even when the subject is emotional or rhetorical questions would be natural.
Safety constraints discourage overdramatic punctuation; generic helpful style is measured and declarative.
Punctuation counts; flag unusually low rates of questions/exclamations given genre.
Structural & Document-Level Patterns
Stock section sequence: Background → Features → Challenges → Future Prospects/Conclusion, with formulaic opening lines in each.
Memorized high-level discourse structures from training data, activated by "write an article" type prompts.
Heading text patterns + lexical cues ("Despite its X, faces challenges"; "Looking ahead, the future of X").
"The 'List of X' is a curated compilation of..." instead of simply defining X.
Misinterprets the page title string as the subject entity, copying definition patterns from listicle intros.
Regex for "The [quoted title] is a [curated/comprehensive/detailed] [compilation/collection/list]".
Bullets like "1. Historical context – explanation" using bolded pseudo-headings inside paragraphs, often with emojis.
Mimics Markdown listicle patterns from training data, even when not requested.
Regex for bold-label-dash patterns; unnatural frequency of intra-paragraph bolded labels.
Paragraphs of nearly identical sentence counts throughout, without natural variation (no one-line or very dense paragraphs).
Optimizes for readability templates; reproduces "SEO-optimized" regular paragraph lengths from training data.
Paragraph sentence-count variance; skew/kurtosis checks. See Quantitative Signals, Section 03 (SD 0.54 AI vs 1.06 human).
Short sections ending with "Overall, this highlights...", "In summary," "Taken together" that merely rephrase the previous sentences.
Instruction datasets reward adding summaries; models generalize this to all scales, even when redundant.
Phrase list; detect "overall/in summary" followed by restatement of existing terms without new information.
"This section will explore...," "The next part discusses..." in contexts where such meta-commentary is unusual (short posts, answers).
Learned from textbooks where structure previews are explicit; side effect of "step-by-step" prompting.
Regex for "In this section," "the following section"; threshold by frequency relative to document length.
Paragraphs following: (1) generic background, (2) "is important/crucial/has significant impact," (3) vague reference to broader trends.
Reuses a generic expository macro: frame, inflate significance, connect to larger system.
Patterns combining vague nouns ("legacy," "impact," "role") with adjectives like "lasting," "enduring," "ongoing."
Dedicated "Media coverage" or "Recognition" sections that are little more than bullet lists of outlet names.
Literalizes notability guidelines: proving importance by enumerating coverage.
Heading patterns ("Media coverage," "Recognition") + bullet lists whose items are mostly publication names.
Rhetorical & Pragmatic Patterns
"Experts say," "observers note," "researchers are increasingly concerned," "many believe" with no concrete source.
Mimics journalistic hedging but cannot reliably ground claims, so defaults to vague collectives.
Phrase list for weasel terms + count of named entities in the same span; flag vague plurals without supporting named sources.
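A sketch combining the phrase list with spaCy NER; the 200-character window and the phrase list itself are assumptions:

```python
import re
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
WEASELS = re.compile(
    r"\b(?:experts say|observers note|many believe"
    r"|researchers are increasingly concerned)\b", re.I)

def unsourced_weasels(text, window=200):
    """Weasel phrases with no person or organization named nearby."""
    doc = nlp(text)
    ent_spans = [(e.start_char, e.end_char) for e in doc.ents
                 if e.label_ in {"PERSON", "ORG"}]
    hits = []
    for m in WEASELS.finditer(text):
        lo, hi = m.start() - window, m.end() + window
        if not any(lo <= start <= hi for start, _ in ent_spans):
            hits.append(m.group(0))
    return hits
```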
Topics with clear asymmetry treated as evenly split: "On one hand... on the other hand..." with every critique offset by a compliment.
Safety and helpfulness training strongly encourage balanced responses; the model reflexively adds counterpoints.
Paired contrastive markers used repeatedly in short text; combined with sentiment symmetry analysis.
Modest topics getting "has had a lasting impact," "continues to inspire," "plays an important role in discussions of identity, culture, and community."
Reuses high-level impact templates from cultural criticism; safe fillers when specific impacts are unknown.
Patterns combining vague nouns ("legacy," "impact," "conversations") with inflating adjectives.
"Great question!", "I'm glad you asked," effusive politeness and service phrases in Reddit threads or student essays.
Base assistant trained as a chatbot persona; unless suppressed, it leaks into all outputs.
Phrase list ("I'm happy to," "let's dive in," "let's break it down") + second-person imperatives in expository text.
"Ultimately, the best choice depends on your goals, budget, and comfort level" appearing verbatim across topics.
Template responses from instruction-following datasets; hedging resolves conflicting objectives and is rewarded in RLHF.
Phrase list of stock disclaimers and "it depends" templates; frequency across documents is especially telling.
"As of my last knowledge update in 2021" or "I cannot access real-time data" appearing in articles or essays.
System prompts train models to self-disclose limitations; users paste responses verbatim into other contexts.
Regex for "as of my last knowledge update," "as an AI language model," "I do not have access."
Statistical & Stylometric Patterns
Text whose per-token perplexity under a reference LM is significantly lower than human baselines.
Generative training maximizes log-likelihood, pushing outputs toward high-probability regions.
Compute perplexity and compare to empirically chosen thresholds per genre. See Quantitative Signals, Section 06.
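A sketch using GPT-2 via Hugging Face transformers as the reference LM; the choice of reference model shifts the numbers, so thresholds must be recalibrated per genre and per model:

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast  # pip install transformers torch

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text):
    """Per-token perplexity of `text` under GPT-2 as the reference LM."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return math.exp(loss.item())
```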
Both sentence lengths and per-sentence perplexities cluster tightly around a mean, lacking human-like spikes.
Homogeneous decoding policy; no genuine cognitive fatigue, mood shifts, or topic digressions.
Variance and higher moments for sentence lengths and sentence-level perplexity.
Characteristic frequency profiles of function words ("the," "of," "in") and POS bigrams that differ from human corpora.
LLMs slightly smooth human syntactic habits; small consistent deviations show up in high-frequency tokens.
Normalized function word frequencies and POS n-grams; cosine distance to human baselines.
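A sketch of the function-word profile; the word list is illustrative, and the human baseline vector is the piece you must estimate from a reference corpus:

```python
import math
import re
from collections import Counter

FUNCTION_WORDS = ["the", "of", "in", "and", "to", "a", "that", "is",
                  "for", "it", "with", "as", "on", "but", "by"]  # illustrative

def function_word_vector(text):
    """Normalized frequency vector over a fixed function-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm if norm else 1.0
```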
Type-token ratio (TTR) and hapax legomenon rate systematically lower than in human text at matched chunk sizes.
Samples synonyms to avoid repetition but doesn't reuse them later; humans re-use key terms systematically.
TTR, MTLD, HD-D, plus hapax counts. See Quantitative Signals, Section 04 (20-36% gap).
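A sketch of chunked TTR and hapax rate; chunking matters because TTR falls with text length, so only matched chunk sizes are comparable:

```python
import re
from collections import Counter

def lexical_diversity(text, chunk=200):
    """Mean type-token ratio and hapax rate over fixed-size token chunks."""
    tokens = re.findall(r"[a-z']+", text.lower())
    ttrs, hapax_rates = [], []
    for i in range(0, len(tokens) - chunk + 1, chunk):
        counts = Counter(tokens[i : i + chunk])
        ttrs.append(len(counts) / chunk)
        hapax_rates.append(sum(1 for c in counts.values() if c == 1) / chunk)
    n = max(len(ttrs), 1)
    return {"ttr": sum(ttrs) / n, "hapax_rate": sum(hapax_rates) / n}
```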
High sentence-sentence cosine similarity throughout, with fewer "off-topic" or exploratory sentences than human writing.
Optimizes for coherence and avoids tangents; internal representation stays close to central topic embeddings.
Sentence embeddings; compute average and variance of adjacent-sentence similarity.
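A sketch with sentence-transformers; the model name is one common default, not a requirement:

```python
import re
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

def adjacent_similarity(text):
    """Mean and variance of cosine similarity between consecutive sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 3:
        return {"mean": 0.0, "variance": 0.0}
    emb = model.encode(sentences, normalize_embeddings=True)
    sims = np.array([float(emb[i] @ emb[i + 1]) for i in range(len(emb) - 1)])
    return {"mean": float(sims.mean()), "variance": float(sims.var())}

# High mean with low variance = every sentence hugs its neighbor; human text
# shows occasional dips where the writer digresses or changes register.
```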
Body sentences unusually semantically similar to the title, reusing key phrases almost verbatim.
Conditions strongly on the input prompt and mirrors its language, especially for abstracts and introductions.
Embedding similarity between title and each sentence.
Citation & Markup Artifacts
These artifacts are most powerful when the pipeline involves copy-pasting raw AI output into other environments.
DOIs that fail to resolve, citations that resolve to unrelated articles, books with correct-looking but nonexistent publication details.
DOI/ISBN validation, link-checking against known resolvers.
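A sketch of both checks. Note the asymmetry: DOIs carry no check digit, so the test is resolution via doi.org, while ISBN-13 can be validated offline:

```python
import requests  # pip install requests

def doi_resolves(doi):
    """doi.org answers a redirect for registered DOIs, 404 for fabricated ones."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return resp.status_code in (301, 302, 303)

def isbn13_valid(isbn):
    """ISBN-13 check digit: the 1,3,1,3... weighted digit sum must be 0 mod 10."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits)) % 10 == 0
```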
Nonexistent wiki shortcuts (e.g., WP:NOTELOCAL), template names, or pseudo-JSON artifacts like "attributableIndex."
Static lists of valid templates/shortcuts; regex for capitalized shortcut-like tokens not in registry.
Links containing utm_source=chatgpt.com, utm_source=openai, or referrer=grok.com.
URL parsing; substring search over utm_source and referrer parameters.
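A sketch with the standard library's URL parser; the marker list mirrors the card above:

```python
from urllib.parse import urlparse, parse_qs

AI_REFERRAL_MARKERS = {"chatgpt.com", "openai", "grok.com"}

def has_ai_referral_params(url):
    """True if utm_source or referrer parameters point at an AI chat tool."""
    params = parse_qs(urlparse(url).query)
    values = params.get("utm_source", []) + params.get("referrer", [])
    return any(marker in v for v in values for marker in AI_REFERRAL_MARKERS)

assert has_ai_referral_params("https://example.com/page?utm_source=chatgpt.com")
```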
Essays with impeccable paragraph spacing, no orphan lines, and alignment matching the default export from web chat tools.
Spacing and indentation patterns (double newlines between every paragraph, consistent whitespace).
Model-Specific & Temporal Patterns
Distinct vocabulary shifts across GPT-4, GPT-4o, GPT-5: early GPT-4 loved "delve" and "tapestry"; later models lean on "enhance" and "showcasing."
Changes in training data and RLHF objectives alter preferred lexical items over time.
Separate wordlists per era. See AI Writing Fingerprints for the full per-model breakdown.
Most frontier LLMs cluster together and away from human text in stylometric space, with some models forming distinct sub-clusters.
Shared architectures and training regimes induce similar function-word and phrase-pattern fingerprints.
Function-word and phrase-pattern differences relative to human baselines; model-specific identification via stylometry.
Evasion & Second-Generation Patterns
AI text that, after "humanizers," shows wild, noisy sentence-length variation but still retains AI-like vocabulary and rhetorical macros.
Evasion tools randomize sentence length without altering deeper lexical or semantic patterns.
Combine lexical/rhetorical detectors with checks that burstiness is unusually high relative to vocabulary variety.
Many content words swapped for synonyms but argument structure, sentence ordering, and examples remain identical to an AI template.
"Rewrite to sound more human" prompts instruct paraphrasing at the token level, not reorganizing content.
Heavy synonym changes + same discourse markers and example structures = rewrite artifact.
Mostly formal prose with occasional oddly placed typos or slang ("lol," "bruh"), producing a mask effect over polished structure.
Users or tools crudely "dirty" text at the surface level without disturbing deeper structures.
High lexical sophistication coexisting with a very small number of casual markers; real casual writing has broader stylistic drift.
When AI rewrites AI, fabricated references or inflated significance claims are preserved or embellished while surface diction changes.
Paraphrasers treat input as authoritative and focus on style, not factual grounding.
Hallucination patterns (invalid DOIs, non-existent entities) surviving across multiple revisions.
Long-Form & Discourse-Level Patterns
In 2000+ word pieces, the model redefines core concepts multiple times and reintroduces the topic in each section.
When unsure how to proceed, falls back to definitional exposition — high-probability and safe.
Repeated definitional patterns ("X is," "refers to," "can be defined as") for the same head term across sections.
Complex topics get evenly sized paragraphs for each subtopic, with little depth, few citations, and no focus — like an outline turned directly into prose.
Distributes attention evenly across headings, lacking strong internal preferences for depth.
Section-length variance (too low), citation counts (too low), and lack of concrete examples per subtopic.
Narrative-sounding content with archetypal scenarios ("I woke up, made coffee, and reflected on my goals") but no specific names, dates, or sensory details.
Has no experiences; mimics the pattern of anecdotes without grounded particulars.
NER and concreteness lexicon: stories with narrative markers but very low concreteness scores.
Long pieces that never truly digress: no side stories, jokes, or asides, even in informal genres. The text stays eerily on the rails of its initial outline.
RLHF rewards staying on prompt and being "helpful"; off-topic digressions are penalized.
Semantic similarity between early and late sections; very high stability in creative or conversational genres is suspicious.
Fifty patterns. Nearly all of them catchable with wordlists, regex, and basic statistics; even the perplexity and embedding signals lean on small open reference models rather than a black-box classifier. That's the point. Every signal here is explainable: you can look at a flagged sentence and understand exactly which pattern triggered and why. That transparency is what separates WROITER from tools that just output a confidence score and leave you guessing.
Keep reading
The Catalogs Audit asks: who else has cataloged these patterns, what do they cover, and what has nobody built yet? The Quantitative Signals report has the hard numbers behind these patterns. And AI Writing Fingerprints maps which patterns belong to which model.