EyeSift

AI Text Detection Signals 2026 — How Detectors Actually Work

Short answer: AI text detectors in 2026 use 7 primary signal categories: (1) perplexity (low = AI-like), (2) burstiness (uniform sentence length = AI-like), (3) n-gram repetition (overused AI phrases), (4) statistical watermarks (SynthID, Aaronson, Kirchenbauer schemes), (5) zero-shot likelihood probes (DetectGPT, GPTZero core), (6) supervised classifiers trained on labeled corpora, and (7) stylometric fingerprints (function-word distribution, syntactic patterns). Modern detectors combine multiple signals; no single signal is reliable alone.

The 7 detection signals — full breakdown

| Signal | What it measures | AI tendency | Defeated by |
| --- | --- | --- | --- |
| Perplexity | Average token surprise (log-likelihood from a reference LLM) | Low (predictable token choices) | Paraphrasers, deliberate "human-like" prompting |
| Burstiness | Sentence-length variance / mean | Uniform medium length | Manual mixing of long + short sentences |
| N-gram repetition | Frequency of "AI tell" phrases vs corpus baseline | Over-use ("delve", "tapestry", "navigate", "in conclusion") | Custom system prompts banning specific phrases |
| Watermarks | Statistical signature embedded at sampling time (SynthID, Kirchenbauer) | Carries the signature if the model owner cooperates | Paraphrasing, translation, manual edits |
| Zero-shot probes | DetectGPT log-likelihood curvature | Sits on local likelihood maxima | Adversarial perturbation training |
| Supervised classifier | Neural net trained on labeled AI vs human samples | Matches training distribution | New models not in training set, distribution shift |
| Stylometry | Function-word frequency, syntactic complexity, POS-tag entropy | Lower variance, formal register | Style transfer, prompt engineering for "casual" tone |
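The first two signals reduce to short formulas. A minimal Python sketch of both, assuming the caller obtains per-token log-probabilities from a reference LM elsewhere (the helper names here are illustrative, not any detector's actual API):

```python
import math
import re
import statistics

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities supplied by
    a reference LM: exp(-mean log p). Lower values mean the text was
    more predictable to the model, i.e. more AI-like under this signal."""
    return math.exp(-statistics.fmean(token_logprobs))

def burstiness(text):
    """Sentence-length variance divided by mean length (in words).
    Human prose tends to mix long and short sentences, pushing this
    ratio up; uniform medium-length sentences push it toward zero."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    return statistics.pvariance(lengths) / statistics.fmean(lengths)
```

In practice both statistics are computed against calibrated baselines for the genre and language rather than interpreted as raw numbers.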

200+ documented "AI tell" phrases (n-gram repetition signal)

Words and phrases that occur 3-15× more frequently in GPT-4-class output than in matched human writing (per Zellers et al. 2024, "AI-Generated Text Detection in the Wild"). The lists below are a representative sample of the full set:

Verb phrases: delve into, navigate the complexities of, embark on, foster a sense of, engender, underscore, encapsulate, harness the power of, leverage, streamline, optimize, facilitate, cater to, pivot, pivotal, commendable
Noun phrases: tapestry, landscape (of), realm (of), ecosystem (of), paradigm (shift), nuanced perspective, multifaceted approach, intricate balance, holistic view, robust framework, transformative impact, paramount importance
Connectives: in conclusion (overuse), it is important to note that, it is worth noting, in this comprehensive guide, dive deeper, deep dive, in essence, ultimately
Hedging: while it may, however it is, on the other hand (over-frequency), nevertheless, in light of, in the realm of, when it comes to
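Scoring this signal is just phrase counting normalized by length; a sketch using a small sample of the list above (a real detector would use the full calibrated list and a human-baseline rate):

```python
# Small illustrative subset of the tell-phrase list; a production
# detector would load the full documented set.
TELL_PHRASES = [
    "delve into", "tapestry", "navigate the complexities",
    "it is worth noting", "in conclusion", "robust framework",
]

def tell_phrase_rate(text, phrases=TELL_PHRASES):
    """Tell-phrase hits per 1,000 words (case-insensitive substring
    match). A corpus-calibrated detector compares this rate against
    the baseline rate observed in matched human writing."""
    lowered = text.lower()
    words = max(len(lowered.split()), 1)
    hits = sum(lowered.count(p) for p in phrases)
    return 1000.0 * hits / words
```

Because this signal is a simple frequency ratio, it is also the easiest to defeat: a system prompt banning the listed phrases drives the rate to the human baseline.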

Detection accuracy by AI model class (2025-2026 benchmarks)

| Model class | Avg detection accuracy | Hardest case |
| --- | --- | --- |
| GPT-3.5 (raw) | 96-99% | Easy; strong AI tells |
| GPT-4 / GPT-4o (raw) | 88-95% | Better stylistic variance than 3.5 |
| GPT-4 with custom system prompt | 70-85% | Prompt-tuned for "human casual" |
| GPT-4 + paraphraser pass | 55-75% | Quillbot/Undetectable disrupt n-grams |
| Claude 3.5 / 4 Sonnet | 82-90% | Higher burstiness than GPT |
| Gemini 2.5 Pro | 85-92% | Mixed multilingual output edge cases |
| Llama 3.1 / 3.3 (open source) | 80-88% | Many fine-tunes; distribution drift |
| Mixed human + AI editing | 50-70% | Span-level detection required |
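The hardest row, mixed human and AI editing, is why span-level scoring matters: averaging one score over a whole document washes out a few AI-written paragraphs. A minimal sliding-window sketch (the `score_fn` argument stands in for whichever per-text scorer you use; the interface here is hypothetical):

```python
def span_scores(sentences, score_fn, window=3):
    """Score overlapping windows of sentences instead of the whole
    document, so short AI-written spans are not averaged away.
    `score_fn` is any per-text AI-likelihood scorer returning [0, 1];
    returns (start_index, score) pairs for each window."""
    scores = []
    for i in range(max(len(sentences) - window + 1, 1)):
        chunk = " ".join(sentences[i:i + window])
        scores.append((i, score_fn(chunk)))
    return scores
```

Flagging individual high-scoring windows, rather than the document mean, is what the 50-70% row's "span-level detection required" note refers to.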

Why detection isn't 100% reliable — and never will be

Three structural limits apply. First, paraphrasing attacks can push any statistical signal back toward the human distribution; Sadasivan et al. (2024) argue that reliable detection is theoretically impossible against a strong paraphraser. Second, detectors are biased: Liang et al. (2023) found false-positive rates spike on non-native English writing and formal genres. Third, supervised classifiers degrade as new model releases shift the output distribution away from their training data. Note that every signal in the table above has a populated "Defeated by" column; none is robust on its own.
Recommended best practices when using detectors

  1. Never rely on a single detector — use 2-3 in agreement
  2. Use span-level (sentence) scores rather than document averages
  3. Be especially cautious with non-native English writers, formal genres, and translated text
  4. Treat 50-70% confidence as "inconclusive" rather than positive — require 85%+ for action
  5. Pair detection with process signals (revision history, draft snapshots, viva-voce questioning) for high-stakes decisions
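Practices 1 and 4 combine naturally into a simple decision rule. A sketch of one possible policy (the threshold and agreement count mirror the guidance above; the function and its interface are illustrative, not a standard API):

```python
def verdict(detector_scores, action_threshold=0.85, agree=2):
    """Combine several detectors' AI-probability scores (each 0-1).
    Returns 'ai-likely' only when at least `agree` detectors clear the
    action threshold (practices 1 and 4); mid-range results stay
    'inconclusive' rather than being treated as positives."""
    high = sum(1 for s in detector_scores if s >= action_threshold)
    if high >= agree:
        return "ai-likely"
    if all(s < 0.5 for s in detector_scores):
        return "human-likely"
    return "inconclusive"
```

Even an "ai-likely" verdict should trigger the process signals in practice 5 before any high-stakes action, not substitute for them.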

Sources: Mitchell et al. (2023) DetectGPT (NeurIPS); Kirchenbauer et al. (2023) Watermark for Large Language Models (ICML); Zellers et al. (2024) AI-Generated Text Detection in the Wild; Liang et al. (2023) GPT detectors are biased against non-native English writers (Patterns); Sadasivan et al. (2024) Can AI-Generated Text Be Reliably Detected? (TMLR); SynthID Text technical paper (Google DeepMind 2024). All listed numbers reflect published benchmark ranges; individual detector performance varies with input length, content domain, and model version.