Why we did this
There’s no shortage of people documenting AI writing patterns. Wikipedia has a dedicated page. GPTZero publishes vocabulary lists. Pangram wrote a comprehensive blog post. Reddit threads compile hundreds of overused words. Academic papers measure everything from sentence length to metadiscourse markers.
So before we started building WROITER’s pattern library, we needed to answer a basic question: is somebody already doing this?
We went through eight major public resources, mapped what each one covers, and checked them against ten dimensions of AI writing patterns — from vocabulary tells all the way to model-specific quirks. The result is a landscape map that shows, very clearly, where the coverage is dense and where nobody is looking.
What exists
Eight resources, three types. Wikipedia-style field guides describe patterns qualitatively. Commercial detectors publish word lists. Academic papers measure features quantitatively. For each one, here is what it covers and where it stops.
Wikipedia's guide
The richest single qualitative catalog. Uses real article diffs as evidence. Covers rhetorical habits (significance puffery, promotional language), vocabulary eras (GPT-4 vs GPT-4o words), syntactic patterns (copula avoidance, rule of three), formatting quirks (Markdown in wikitext, curly quotes), and citation artifacts (broken DOIs, fabricated sources).
Platform-specific to Wikipedia. No rhythm metrics, no model-by-model breakdowns beyond brief lexical notes, no quantified prevalence. It’s a field guide, not a detection system.
GPTZero's vocabulary list
A ranked list of words 10×–200× more frequent in AI text than in human writing, drawn from ~3.3 million documents and updated monthly. The public top 10 includes "delve," "pivotal," and "tapestry" with exact ratios.
It’s a word list. No sentence patterns, no structural analysis, no genre or model labels. The full list is account-gated — there’s no stable open archive.
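To make those ratios concrete, here is a minimal sketch of how an over-representation score can be computed from two corpora. It illustrates the general idea only; it is not GPTZero's methodology, and the whitespace tokenization, add-one smoothing, and threshold are our own assumptions.

```python
from collections import Counter

def overrepresentation(ai_texts, human_texts, min_ratio=10.0, smoothing=1.0):
    """Rank words by how much more frequent they are in AI text than in human text.

    Frequencies are normalized per corpus, and the human count is smoothed so
    that words absent from the human corpus don't divide by zero. Illustrative
    only; whitespace tokenization and add-one smoothing are simplifications.
    """
    ai_counts = Counter(w.lower() for t in ai_texts for w in t.split())
    human_counts = Counter(w.lower() for t in human_texts for w in t.split())
    ai_total = sum(ai_counts.values()) or 1
    human_total = sum(human_counts.values()) or 1

    ranked = []
    for word, count in ai_counts.items():
        ai_rate = count / ai_total
        human_rate = (human_counts[word] + smoothing) / (human_total + smoothing)
        ratio = ai_rate / human_rate
        if ratio >= min_ratio:
            ranked.append((word, ratio))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Calling `overrepresentation(ai_docs, human_docs)[:10]` would return the ten most AI-skewed words with their ratios, which is roughly the shape of GPTZero's public top 10.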
The Phrase Finder
A searchable lookup tool for overused AI words. The public-facing examples are just three words: "tapestry," "delve," "elevate." A larger database is implied but only reachable through the lookup itself.
No structure, no rhythm, no rhetoric, no methodology. It’s a find-and-replace tool with opaque coverage.
Pangram's blog post
The most holistic public write-up. Covers vocabulary (by part of speech), sentence patterns (monotony, em-dash overuse), paragraph structure (uniform length, formulaic intro/body/conclusion), tone (formal, positive, conflict-avoidant), creativity (aggregation without point of view), and metacognition (inability to tie content to personal experience).
No model-specific breakdowns. No citation artifacts. Narrative-form rather than structured taxonomy. The vocabulary list isn’t tagged by rhetorical function — you can’t look up "which of these are hedges?"
The Reinhart literature map
A literature map linking to empirical studies. Strong on lexical diversity, syntactic complexity, information density vs involvement, metadiscourse markers, and cross-model comparisons. Uses Biber's multidimensional analysis framework, the gold standard for stylistic comparison.
It’s a collection of academic summaries and links, not a browsable catalog. There are no named patterns, no practical definitions, and no citation/markup analysis.
The Humanizer skill
A Claude Code skill that turns AI writing patterns into before/after edit rules. Based directly on Wikipedia's guide. Covers AI vocabulary, copula avoidance, rule of three, synonym cycling, promotional language, em-dash overuse, passive voice.
Mirrors Wikipedia rather than discovering new patterns. It’s an editing tool, not a catalog. You can’t browse or search it.
Community word lists
Reddit threads and blog posts compiling overused ChatGPT words. Twixify published 124+ words. r/ChatGPT and r/SEO have running threads. Some lists group words by function: connective words, summarizers, hedges, intensifiers.
Rarely tied to actual data. No structural analysis, no shared format, no model or era labels. Valuable as crowdsourced observation, not as a systematic resource.
Academic papers
Peer-reviewed studies measuring linguistic signals of AI authorship. Key findings: AI text is more formal and impersonal, with higher noun/determiner rates, lower lexical diversity, and more repetition. Multi-model comparisons show consistent high information density across all major LLMs.
Written for researchers, not practitioners. Feature-centric rather than pattern-named. You’d need to read five papers to assemble what could be one browsable page.
The coverage map
We checked each resource against ten dimensions of AI writing patterns. The Y cells show where real coverage exists; the dashes show the holes.
Y = substantial coverage. (Y) = partial or implicit. — = absent.
| Pattern type | Wikipedia | GPTZero | Phrase Finder | Pangram | Reinhart | Humanizer | Community | Academic |
|---|---|---|---|---|---|---|---|---|
| 1. Vocabulary / words | Y | Y | (Y) | Y | (Y) | Y | Y | (Y) |
| 2. Phrases / expressions | (Y) | Y | (Y) | Y | (Y) | (Y) | Y | (Y) |
| 3. Sentence structure | Y | — | — | Y | Y | Y | — | Y |
| 4. Paragraph / document shape | Y | — | — | Y | (Y) | (Y) | — | (Y) |
| 5. Rhythm / cadence | (Y) | (Y) | — | Y | (Y) | (Y) | — | Y |
| 6. Rhetoric (tone, stance) | Y | — | — | Y | Y | Y | (Y) | Y |
| 7. Persona / voice | (Y) | — | — | (Y) | (Y) | (Y) | — | (Y) |
| 8. Meta-commentary | Y | — | — | (Y) | Y | (Y) | (Y) | (Y) |
| 9. Citation / attribution | Y | — | — | — | — | — | — | — |
| 10. Model-specific / era | (Y) | (Y) | — | — | Y | — | — | Y |
Two things jump out of this table immediately. The top two rows, vocabulary and phrases, have at least partial coverage from every resource; almost everyone covers the words. But look at row 9: citation and attribution patterns are documented by exactly one resource. And row 7, persona and voice markers, never rises above partial coverage. Nobody covers it well.
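If you want to poke at the matrix yourself, here is a small sketch that encodes the table above and ranks the dimensions by total coverage. The values are transcribed directly from the table; scoring partial coverage as half a point is our own convention, not something any of the resources defines.

```python
# Coverage matrix transcribed from the table above.
# 1.0 = substantial (Y), 0.5 = partial ((Y)), 0.0 = absent.
RESOURCES = ["Wikipedia", "GPTZero", "Phrase Finder", "Pangram",
             "Reinhart", "Humanizer", "Community", "Academic"]

COVERAGE = {
    "Vocabulary / words":         [1, 1, .5, 1, .5, 1, 1, .5],
    "Phrases / expressions":      [.5, 1, .5, 1, .5, .5, 1, .5],
    "Sentence structure":         [1, 0, 0, 1, 1, 1, 0, 1],
    "Paragraph / document shape": [1, 0, 0, 1, .5, .5, 0, .5],
    "Rhythm / cadence":           [.5, .5, 0, 1, .5, .5, 0, 1],
    "Rhetoric (tone, stance)":    [1, 0, 0, 1, 1, 1, .5, 1],
    "Persona / voice":            [.5, 0, 0, .5, .5, .5, 0, .5],
    "Meta-commentary":            [1, 0, 0, .5, 1, .5, .5, .5],
    "Citation / attribution":     [1, 0, 0, 0, 0, 0, 0, 0],
    "Model-specific / era":       [.5, .5, 0, 0, 1, 0, 0, 1],
}

# Rank dimensions from best-covered to least-covered.
for name, row in sorted(COVERAGE.items(), key=lambda kv: -sum(kv[1])):
    print(f"{name:28s} {sum(row):>4.1f} / {len(row)}")
```

Running it puts vocabulary and phrases at the top and citation/attribution and persona/voice at the bottom, the same picture the prose below walks through.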
What we found
Everyone knows the words. Almost nobody tracks the shape.
"Delve" and "tapestry" are well-documented everywhere. But sentence-level patterns, paragraph uniformity, and document-level templates? Only Wikipedia, Pangram, and academic papers touch those — and they do it in very different ways for very different audiences. Commercial tools and community lists are purely lexical.
Rhythm has numbers but no names.
Academic papers quantify burstiness and sentence-length distributions in detail. But they don’t translate those measurements into named patterns you could explain to an editor. Pangram mentions monotony but doesn’t provide metrics. The two sides never meet.
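For a sense of what those measurements look like in practice, here is a minimal sketch of one common rhythm proxy: the spread of sentence lengths. It is a generic illustration, not the specific metric from any of the papers above, and the naive regex sentence splitter is an assumption.

```python
import re
from statistics import mean, stdev

def sentence_length_profile(text):
    """Mean, standard deviation, and coefficient of variation of sentence lengths (in words).

    A low coefficient of variation means the sentences are all roughly the
    same length: the "monotony" that prose guides describe qualitatively.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return {"mean": float(lengths[0]) if lengths else 0.0, "stdev": 0.0, "cv": 0.0}
    m, sd = mean(lengths), stdev(lengths)
    return {"mean": m, "stdev": sd, "cv": sd / m if m else 0.0}
```

Naming the number, say "flat cadence" below some threshold, is exactly the translation step that the current literature leaves to the reader.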
Citation artifacts live on a single Wikipedia page.
Fabricated DOIs, hallucinated references, tracking parameters like utm_source=chatgpt.com — these are documented almost exclusively by Wikipedia editors. No commercial detector catalogs them. No academic paper treats them as a separate signal class. If that Wikipedia page disappeared, so would the documentation.
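These artifacts are mechanical enough to check for automatically. Below is a minimal sketch of two such checks: flagging the referral parameter from the example above, and rejecting strings that don't match the basic DOI shape. Both are illustrative assumptions on our part rather than something any of the eight resources ships, and neither verifies that a DOI actually resolves.

```python
import re
from urllib.parse import urlparse, parse_qs

# utm_source values that point back at an LLM chat interface; extend as needed.
LLM_REFERRAL_SOURCES = {"chatgpt.com"}

# Basic 10.<registrant>/<suffix> shape; a syntax check only, not an existence check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def has_llm_referral(url: str) -> bool:
    """True if the URL carries a utm_source value that names an LLM interface."""
    query = parse_qs(urlparse(url).query)
    return any(value in LLM_REFERRAL_SOURCES for value in query.get("utm_source", []))

def doi_looks_malformed(doi: str) -> bool:
    """True if a string presented as a DOI fails the basic syntactic pattern."""
    return not DOI_PATTERN.match(doi.strip())
```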
Model-specific differences exist in research but not for users.
Academic work compares GPT, Claude, and Gemini writing styles. GPTZero updates its vocabulary lists monthly. But none of this is surfaced to end users as something you can browse: "show me how GPT-4o writes differently from Claude 3.5." The data exists. The interface doesn’t.
Voice and persona are the biggest blind spot.
Academic work systematically measures engagement markers and metadiscourse. But public guides barely touch persona — the specific ways AI voice differs from human voice. How it hedges, how it fails to commit, how it never sounds genuinely uncertain. These are among the strongest tells, and almost nobody writes about them in a way practitioners can use.
What nobody has built
The audit surfaced five gaps — not minor holes, but structural absences in the landscape.
A single, multi-level taxonomy
No public resource spans vocabulary, syntax, paragraph structure, rhythm, rhetoric, persona, citation artifacts, and model-specific patterns in one navigable place. Wikipedia is qualitative and platform-specific. GPTZero is lexical only. Academic papers are feature-centric. You’d need to read all eight resources to assemble a complete picture — and even then, they use different frameworks and different terminology.
A cross-model, cross-era view
Nobody lets you see how a specific pattern behaves across models and time. Does "delve" decay over GPT versions? Is copula avoidance universal or GPT-specific? Which structural habits are GPT-4o-only? The fragments exist in papers and monthly vocabulary updates, but they’re scattered across sources and not normalized to anything queryable.
A catalog of human countersigns
Every resource focuses on AI tells. None of them systematically catalog what makes writing distinctively human: idiosyncratic metaphors, genuine self-doubt, unpolished narrative structures, localized slang, register shifts that happen mid-paragraph. Knowing what AI doesn’t do is just as valuable as knowing what it does — but nobody has built the paired catalog.
Pattern bundles that travel together
Current catalogs treat every pattern as an independent feature. But in practice, AI patterns travel in packs. An AI Wikipedia stub about a small town combines promotional tone, significance puffery, rule-of-three descriptions, and copula avoidance — all at once. Nobody models these co-occurring bundles with examples and counterexamples.
Something you can actually query
None of these resources let you filter by pattern type, model, year, genre, or language. None of them link patterns to specific transformations ("here’s how to fix this"). None expose an API. The Humanizer skill gets closest — it embodies patterns as edit rules — but it’s anchored to Wikipedia-derived patterns only and isn’t structured for browsing.
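To make "queryable" concrete, here is a sketch of what one record in such a library might look like, with a trivial filter over it. Every field name and the example entry are hypothetical illustrations, not the schema of any existing resource.

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    """One named AI writing pattern in a hypothetical queryable library."""
    name: str
    level: str         # "vocabulary", "sentence", "paragraph", "rhythm", ...
    models: list[str]  # models the pattern has been observed in
    era: str           # rough period, e.g. "2024-2025"
    example: str
    fix: str           # the transformation an editor would apply

def search(patterns, level=None, model=None):
    """Filter a pattern list by level and/or model."""
    return [p for p in patterns
            if (level is None or p.level == level)
            and (model is None or model in p.models)]

# Hypothetical entry, for illustration only.
LIBRARY = [Pattern(
    name="rule of three",
    level="sentence",
    models=["GPT-4o", "Claude"],
    era="2023-2025",
    example="fast, flexible, and reliable",
    fix="keep the one item that actually matters",
)]
```

Linking each record's `fix` field to an edit rule is essentially what the Humanizer skill already does for Wikipedia-derived patterns; the missing piece is doing it across levels, models, and eras in one browsable place.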
The short version
The landscape has plenty of word lists and plenty of academic features. What it doesn’t have is a public, model-aware, multi-level pattern library that spans vocabulary through document structure, comes with numbers, and is built for the people who actually need it — writers and editors, not researchers.
That’s the gap WROITER is built to fill.