Why we did this
There’s no shortage of people documenting AI writing patterns. Wikipedia has a dedicated page. GPTZero publishes vocabulary lists. Pangram wrote a comprehensive blog post. Reddit threads compile hundreds of overused words. Academic papers measure everything from sentence length to metadiscourse markers.
So before we started building WROITER’s pattern library, we needed to answer a basic question: is somebody already doing this?
We went through eight major public resources, mapped what each one covers, and checked them against ten dimensions of AI writing patterns — from vocabulary tells all the way to model-specific quirks. The result is a landscape map that shows, very clearly, where the coverage is dense and where nobody is looking.
What exists
Eight resources, three types. Wikipedia-style field guides describe patterns qualitatively. Commercial detectors publish word lists. Academic papers measure features quantitatively. For each one, here is what it covers and where it stops.
Wikipedia's guide
The richest single qualitative catalog. Uses real article diffs as evidence. Covers rhetorical habits (significance puffery, promotional language), vocabulary eras (GPT-4 vs GPT-4o words), syntactic patterns (copula avoidance, rule of three), formatting quirks (Markdown in wikitext, curly quotes), and citation artifacts (broken DOIs, fabricated sources).
Platform-specific to Wikipedia. No rhythm metrics, no model-by-model breakdowns beyond brief lexical notes, no quantified prevalence. It’s a field guide, not a detection system.
GPTZero's vocabulary list
A ranked list of words 10×–200× more frequent in AI text than in human writing, drawn from ~3.3 million documents and updated monthly. The public top 10 includes "delve," "pivotal," and "tapestry" with exact ratios.
It’s a word list. No sentence patterns, no structural analysis, no genre or model labels. The full list is account-gated — there’s no stable open archive.
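To make those ratios concrete, here is a minimal sketch of how an over-representation score can be computed from two corpora. It illustrates the general idea only; it is not GPTZero's methodology, and the whitespace tokenization, add-one smoothing, and threshold are our own assumptions.

```python
from collections import Counter

def overrepresentation(ai_texts, human_texts, min_ratio=10.0, smoothing=1.0):
    """Rank words by how much more frequent they are in AI text than in human text.

    Frequencies are normalized per corpus, and the human count is smoothed so
    that words absent from the human corpus don't divide by zero. Illustrative
    only; whitespace tokenization and add-one smoothing are simplifications.
    """
    ai_counts = Counter(w.lower() for t in ai_texts for w in t.split())
    human_counts = Counter(w.lower() for t in human_texts for w in t.split())
    ai_total = sum(ai_counts.values()) or 1
    human_total = sum(human_counts.values()) or 1

    ranked = []
    for word, count in ai_counts.items():
        ai_rate = count / ai_total
        human_rate = (human_counts[word] + smoothing) / (human_total + smoothing)
        ratio = ai_rate / human_rate
        if ratio >= min_ratio:
            ranked.append((word, ratio))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Calling `overrepresentation(ai_docs, human_docs)[:10]` would return the ten most AI-skewed words with their ratios, which is roughly the shape of GPTZero's public top 10.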
The Phrase Finder
A searchable lookup tool for overused AI words. The public-facing examples are just three words: "tapestry," "delve," "elevate." A larger database is implied but only reachable through the lookup itself.
No structure, no rhythm, no rhetoric, no methodology. It’s a find-and-replace tool with opaque coverage.
Pangram's blog post
The most holistic public write-up. Covers vocabulary (by part of speech), sentence patterns (monotony, em-dash overuse), paragraph structure (uniform length, formulaic intro/body/conclusion), tone (formal, positive, conflict-avoidant), creativity (aggregation without point of view), and metacognition (inability to tie content to personal experience).
No model-specific breakdowns. No citation artifacts. Narrative-form rather than structured taxonomy. The vocabulary list isn’t tagged by rhetorical function — you can’t look up "which of these are hedges?"
The Reinhart literature map
A literature map linking to empirical studies. Strong on lexical diversity, syntactic complexity, information density vs involvement, metadiscourse markers, and cross-model comparisons. Uses Biber's multidimensional analysis framework, the gold standard for stylistic comparison.
It’s a collection of academic summaries and links, not a browsable catalog. There are no named patterns, no practical definitions, and no citation/markup analysis.
The Humanizer skill
A Claude Code skill that turns AI writing patterns into before/after edit rules. Based directly on Wikipedia's guide. Covers AI vocabulary, copula avoidance, rule of three, synonym cycling, promotional language, em-dash overuse, passive voice.
Mirrors Wikipedia rather than discovering new patterns. It’s an editing tool, not a catalog. You can’t browse or search it.
Community word lists
Reddit threads and blog posts compiling overused ChatGPT words. Twixify published 124+ words. r/ChatGPT and r/SEO have running threads. Some lists group words by function: connective words, summarizers, hedges, intensifiers.
Rarely tied to actual data. No structural analysis, no shared format, no model or era labels. Valuable as crowdsourced observation, not as a systematic resource.
Academic papers
Peer-reviewed studies measuring linguistic signals of AI authorship. Key findings: AI text is more formal and impersonal, with higher noun/determiner rates, lower lexical diversity, and more repetition. Multi-model comparisons show consistent high information density across all major LLMs.
Written for researchers, not practitioners. Feature-centric rather than pattern-named. You’d need to read five papers to assemble what could be one browsable page.
The coverage map
We checked each resource against ten dimensions of AI writing patterns. The Y cells show where real coverage exists; the dashes show the holes.
Y = substantial coverage. (Y) = partial or implicit. — = absent.
| Pattern type | Wikipedia | GPTZero | Phrase Finder | Pangram | Reinhart | Humanizer | Community | Academic |
|---|---|---|---|---|---|---|---|---|
| 1. Vocabulary / words | Y | Y | (Y) | Y | (Y) | Y | Y | (Y) |
| 2. Phrases / expressions | (Y) | Y | (Y) | Y | (Y) | (Y) | Y | (Y) |
| 3. Sentence structure | Y | — | — | Y | Y | Y | — | Y |
| 4. Paragraph / document shape | Y | — | — | Y | (Y) | (Y) | — | (Y) |
| 5. Rhythm / cadence | (Y) | (Y) | — | Y | (Y) | (Y) | — | Y |
| 6. Rhetoric (tone, stance) | Y | — | — | Y | Y | Y | (Y) | Y |
| 7. Persona / voice | (Y) | — | — | (Y) | (Y) | (Y) | — | (Y) |
| 8. Meta-commentary | Y | — | — | (Y) | Y | (Y) | (Y) | (Y) |
| 9. Citation / attribution | Y | — | — | — | — | — | — | — |
| 10. Model-specific / era | (Y) | (Y) | — | — | Y | — | — | Y |
Two things jump out of this table immediately. The top two rows, vocabulary and phrases, have at least partial coverage from every resource; almost everyone covers the words. But look at row 9: citation and attribution patterns are documented by exactly one resource. And row 7, persona and voice markers, never rises above partial coverage. Nobody covers it well.
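If you want to poke at the matrix yourself, here is a small sketch that encodes the table above and ranks the dimensions by total coverage. The values are transcribed directly from the table; scoring partial coverage as half a point is our own convention, not something any of the resources defines.

```python
# Coverage matrix transcribed from the table above.
# 1.0 = substantial (Y), 0.5 = partial ((Y)), 0.0 = absent.
RESOURCES = ["Wikipedia", "GPTZero", "Phrase Finder", "Pangram",
             "Reinhart", "Humanizer", "Community", "Academic"]

COVERAGE = {
    "Vocabulary / words":         [1, 1, .5, 1, .5, 1, 1, .5],
    "Phrases / expressions":      [.5, 1, .5, 1, .5, .5, 1, .5],
    "Sentence structure":         [1, 0, 0, 1, 1, 1, 0, 1],
    "Paragraph / document shape": [1, 0, 0, 1, .5, .5, 0, .5],
    "Rhythm / cadence":           [.5, .5, 0, 1, .5, .5, 0, 1],
    "Rhetoric (tone, stance)":    [1, 0, 0, 1, 1, 1, .5, 1],
    "Persona / voice":            [.5, 0, 0, .5, .5, .5, 0, .5],
    "Meta-commentary":            [1, 0, 0, .5, 1, .5, .5, .5],
    "Citation / attribution":     [1, 0, 0, 0, 0, 0, 0, 0],
    "Model-specific / era":       [.5, .5, 0, 0, 1, 0, 0, 1],
}

# Rank dimensions from best-covered to least-covered.
for name, row in sorted(COVERAGE.items(), key=lambda kv: -sum(kv[1])):
    print(f"{name:28s} {sum(row):>4.1f} / {len(row)}")
```

Running it puts vocabulary and phrases at the top and citation/attribution and persona/voice at the bottom, the same picture the prose below walks through.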
What we found
Everyone knows the words. Almost nobody tracks the shape.
"Delve" and "tapestry" are well-documented everywhere. But sentence-level patterns, paragraph uniformity, and document-level templates? Only Wikipedia, Pangram, and academic papers touch those — and they do it in very different ways for very different audiences. Commercial tools and community lists are purely lexical.
Rhythm has numbers but no names.
Academic papers quantify burstiness and sentence-length distributions in detail. But they don’t translate those measurements into named patterns you could explain to an editor. Pangram mentions monotony but doesn’t provide metrics. The two sides never meet.
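For a sense of what those measurements look like in practice, here is a minimal sketch of one common rhythm proxy: the spread of sentence lengths. It is a generic illustration, not the specific metric from any of the papers above, and the naive regex sentence splitter is an assumption.

```python
import re
from statistics import mean, stdev

def sentence_length_profile(text):
    """Mean, standard deviation, and coefficient of variation of sentence lengths (in words).

    A low coefficient of variation means the sentences are all roughly the
    same length: the "monotony" that prose guides describe qualitatively.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return {"mean": float(lengths[0]) if lengths else 0.0, "stdev": 0.0, "cv": 0.0}
    m, sd = mean(lengths), stdev(lengths)
    return {"mean": m, "stdev": sd, "cv": sd / m if m else 0.0}
```

Naming the number, say "flat cadence" below some threshold, is exactly the translation step that the current literature leaves to the reader.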
Citation artifacts live on a single Wikipedia page.
Fabricated DOIs, hallucinated references, tracking parameters like utm_source=chatgpt.com — these are documented almost exclusively by Wikipedia editors. No commercial detector catalogs them. No academic paper treats them as a separate signal class. If that Wikipedia page disappeared, so would the documentation.
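These artifacts are mechanical enough to check for automatically. Below is a minimal sketch of two such checks: flagging the referral parameter from the example above, and rejecting strings that don't match the basic DOI shape. Both are illustrative assumptions on our part rather than something any of the eight resources ships, and neither verifies that a DOI actually resolves.

```python
import re
from urllib.parse import urlparse, parse_qs

# utm_source values that point back at an LLM chat interface; extend as needed.
LLM_REFERRAL_SOURCES = {"chatgpt.com"}

# Basic 10.<registrant>/<suffix> shape; a syntax check only, not an existence check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def has_llm_referral(url: str) -> bool:
    """True if the URL carries a utm_source value that names an LLM interface."""
    query = parse_qs(urlparse(url).query)
    return any(value in LLM_REFERRAL_SOURCES for value in query.get("utm_source", []))

def doi_looks_malformed(doi: str) -> bool:
    """True if a string presented as a DOI fails the basic syntactic pattern."""
    return not DOI_PATTERN.match(doi.strip())
```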
Model-specific differences exist in research but not for users.
Academic work compares GPT, Claude, and Gemini writing styles. GPTZero updates its vocabulary lists monthly. But none of this is surfaced to end users as something you can browse: "show me how GPT-4o writes differently from Claude 3.5." The data exists. The interface doesn’t.
Voice and persona are the biggest blind spot.
Academic work systematically measures engagement markers and metadiscourse. But public guides barely touch persona — the specific ways AI voice differs from human voice. How it hedges, how it fails to commit, how it never sounds genuinely uncertain. These are among the strongest tells, and almost nobody writes about them in a way practitioners can use.
What nobody has built
The audit surfaced five gaps — not minor holes, but structural absences in the landscape.
A single, multi-level taxonomy
No public resource spans vocabulary, syntax, paragraph structure, rhythm, rhetoric, persona, citation artifacts, and model-specific patterns in one navigable place. Wikipedia is qualitative and platform-specific. GPTZero is lexical only. Academic papers are feature-centric. You’d need to read all eight resources to assemble a complete picture — and even then, they use different frameworks and different terminology.
A cross-model, cross-era view
Nobody lets you see how a specific pattern behaves across models and time. Does "delve" decay over GPT versions? Is copula avoidance universal or GPT-specific? Which structural habits are GPT-4o-only? The fragments exist in papers and monthly vocabulary updates, but they’re scattered across sources and not normalized to anything queryable.
A catalog of human countersigns
Every resource focuses on AI tells. None of them systematically catalog what makes writing distinctively human: idiosyncratic metaphors, genuine self-doubt, unpolished narrative structures, localized slang, register shifts that happen mid-paragraph. Knowing what AI doesn’t do is just as valuable as knowing what it does — but nobody has built the paired catalog.
Pattern bundles that travel together
Current catalogs treat every pattern as an independent feature. But in practice, AI patterns travel in packs. An AI Wikipedia stub about a small town combines promotional tone, significance puffery, rule-of-three descriptions, and copula avoidance — all at once. Nobody models these co-occurring bundles with examples and counterexamples.
Something you can actually query
None of these resources let you filter by pattern type, model, year, genre, or language. None of them link patterns to specific transformations ("here’s how to fix this"). None expose an API. The Humanizer skill gets closest — it embodies patterns as edit rules — but it’s anchored to Wikipedia-derived patterns only and isn’t structured for browsing.
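To make "queryable" concrete, here is a sketch of what one record in such a library might look like, with a trivial filter over it. Every field name and the example entry are hypothetical illustrations, not the schema of any existing resource.

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    """One named AI writing pattern in a hypothetical queryable library."""
    name: str
    level: str         # "vocabulary", "sentence", "paragraph", "rhythm", ...
    models: list[str]  # models the pattern has been observed in
    era: str           # rough period, e.g. "2024-2025"
    example: str
    fix: str           # the transformation an editor would apply

def search(patterns, level=None, model=None):
    """Filter a pattern list by level and/or model."""
    return [p for p in patterns
            if (level is None or p.level == level)
            and (model is None or model in p.models)]

# Hypothetical entry, for illustration only.
LIBRARY = [Pattern(
    name="rule of three",
    level="sentence",
    models=["GPT-4o", "Claude"],
    era="2023-2025",
    example="fast, flexible, and reliable",
    fix="keep the one item that actually matters",
)]
```

Linking each record's `fix` field to an edit rule is essentially what the Humanizer skill already does for Wikipedia-derived patterns; the missing piece is doing it across levels, models, and eras in one browsable place.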
The short version
The landscape has plenty of word lists and plenty of academic features. What it doesn’t have is a public, model-aware, multi-level pattern library that spans vocabulary through document structure, comes with numbers, and is built for the people who actually need it — writers and editors, not researchers.
That’s the gap WROITER is built to fill.