EyeSift
Research · Apr 30, 2026 · 16 min read

GPT-5 Detection: Can Current AI Detectors Catch GPT-5 Text?

Reviewed by Brazora Monk · Last updated April 30, 2026

One independent benchmark found a spread of 31.7% to 97%+ between leading detectors on GPT-5 output — from the same model, the same text, tested the same week. That gap is the story of where AI detection currently stands with OpenAI's most capable model.

Key Takeaways

  • GPTZero achieved 97%+ detection of unedited GPT-5 output after its Summer 2025 update — without training on GPT-5 data, using generalization from its 7-component detection architecture
  • Originality.ai dropped to 31.7% accuracy on GPT-5 in comparative testing — demonstrating that training data recency determines GPT-5 detection capability more than overall tool reputation
  • No tool tested by independent evaluators exceeded 80% overall accuracy when edited and real-world content was included in the test corpus (Supwriter 150-sample study)
  • GPT-5 is harder to detect than GPT-4o because its perplexity and burstiness sit closer to human norms and it produces fewer characteristic AI prose patterns — the statistical signals detectors exploit have been diminished
  • Detection results must never be used as sole evidence of misconduct — every major vendor, including GPTZero and Turnitin, explicitly states this in its documentation

Start with the most important number in GPT-5 detection: 31.7%. That is the accuracy Originality.ai achieved on GPT-5 output in comparative benchmark testing published in 2025 — from a tool that claims 95%+ overall accuracy and is widely used in academic and publishing contexts. The figure from GPT-5-mini was even lower: 7.3%.

Juxtapose that with GPTZero's reported 97%+ detection rate on the same model, and you have the central problem with GPT-5 detection in 2026: tool selection determines your detection capability more than the difficulty of the content. Picking the wrong tool means you are running a screen that fails seven times out of ten while believing it is working.

This article examines why the variance exists, what detectors are doing differently to handle GPT-5, what the independent evidence shows about real-world performance, and what practitioners in education, publishing, and HR need to understand before relying on detection results for consequential decisions.

Why GPT-5 Is Harder to Detect Than Its Predecessors

Understanding the detection challenge requires understanding how detectors work and what GPT-5 changed. Most AI text detectors exploit one or more of the following signals:

Perplexity and Burstiness

Language models generate text by predicting the most probable next token given the preceding context. This optimization process tends to produce text with lower perplexity (more predictable word choices) and lower burstiness (less variation between sentences) than human writing. Humans make unexpected word choices, shift register between sentences, and write with structural irregularity that reflects non-linear thinking. Early GPT-4 output was highly susceptible to perplexity-based detection for this reason.
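
Both signals can be computed directly from a model's per-token log-probabilities. A minimal sketch, using synthetic log-prob values rather than a real model's output (the function names and numbers here are illustrative, not any vendor's implementation):

```python
import math
from statistics import pstdev

def perplexity(token_logprobs):
    """exp of the negative mean log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def burstiness(sentence_logprobs):
    """Spread of per-sentence perplexity: low spread reads as machine-like."""
    return pstdev([perplexity(lp) for lp in sentence_logprobs])

# Synthetic log-probs: AI-like text is uniformly predictable,
# human-like text mixes predictable and surprising sentences.
ai_like    = [[-1.1, -1.0, -1.2], [-1.0, -1.1, -1.0], [-1.2, -1.0, -1.1]]
human_like = [[-0.4, -0.5, -0.6], [-2.8, -3.1, -2.5], [-1.0, -2.2, -0.7]]

print(burstiness(ai_like) < burstiness(human_like))  # lower spread flags AI
```

A detector comparing these spreads would flag the uniform sample; the closer a model's spread gets to the human sample's, the weaker this signal becomes.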

GPT-5 was trained with substantially more diverse data and optimization targets that specifically improve naturalness and variation. The result is output with higher perplexity and greater burstiness than earlier models — reducing the statistical distance from human writing that detectors exploit.

Characteristic AI Prose Patterns

Earlier ChatGPT models had detectable stylistic fingerprints that experienced readers — and trained detectors — learned to recognize: overuse of transitional phrases ("Furthermore," "It is worth noting," "In conclusion"), systematic hedging language, uniform sentence length distribution, and a preference for certain vocabulary clusters. GPTZero's linguistic fingerprinting approach maps these patterns explicitly.

GPT-5 produces fewer of these artifacts. OpenAI's training improvements included RLHF feedback specifically targeting naturalness — the model was guided away from the robotic transition phrases and formulaic structures that made earlier outputs distinctive. This is one reason tools relying primarily on stylistic fingerprinting see accuracy drops on GPT-5.
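
A toy version of this kind of stylistic fingerprinting can be sketched as counting transition-phrase density and sentence-length variation. The phrase list and feature names below are illustrative assumptions, not GPTZero's actual feature set:

```python
import re
from statistics import pstdev

# Illustrative phrase list; real detectors track far larger profiles.
TRANSITIONS = ("furthermore", "moreover", "it is worth noting", "in conclusion")

def style_features(text):
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    lower = text.lower()
    return {
        # transition phrases per sentence: high values looked AI-like pre-GPT-5
        "transition_rate": sum(lower.count(p) for p in TRANSITIONS) / max(len(sentences), 1),
        # spread of sentence lengths: uniform lengths look machine-like
        "length_spread": pstdev(lengths) if len(lengths) > 1 else 0.0,
    }
```

A model trained away from formulaic transitions and uniform sentence lengths pushes both features toward human-typical values, which is exactly why fingerprint-heavy tools lose accuracy on GPT-5.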

The Training Data Lag Problem

Every AI detector has a training cutoff. A model trained extensively on GPT-4 output has learned the statistical distribution of GPT-4 text. When GPT-5 releases with a different distribution — different perplexity profile, different syntactic patterns, different vocabulary preferences — a detector trained on GPT-4 data sees GPT-5 text as more "human-like" precisely because it falls outside its learned AI distribution.
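
The lag can be illustrated with a toy classifier: fit a perplexity threshold on GPT-4-era samples and it misses most of a newer model's output that has shifted toward the human range. All numbers below are synthetic, chosen only to show the mechanism:

```python
from statistics import mean

# Synthetic perplexity scores (higher = more surprising, more human-like)
human = [62, 75, 58, 80, 70]
gpt4  = [20, 24, 18, 26, 22]   # old AI distribution the detector learned
gpt5  = [45, 52, 48, 55, 50]   # new model, shifted toward the human range

# Midpoint threshold fit on the data the detector was trained on
threshold = (mean(human) + mean(gpt4)) / 2

def is_flagged_ai(ppl):
    return ppl < threshold

caught = sum(is_flagged_ai(p) for p in gpt5)
print(f"GPT-5 samples flagged: {caught}/{len(gpt5)}")  # most slip past the old threshold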

This is almost certainly what happened to Originality.ai. The tool achieved strong results on GPT-4o but had not been retrained against GPT-5's output distribution at the time independent testing was conducted. The 31.7% figure is less a statement about Originality.ai's general capability and more a demonstration of how quickly detection accuracy can degrade when a new model releases without immediate detector retraining.

GPTZero's GPT-5 Detection: The Summer 2025 Update

GPTZero's response to GPT-5 is the most extensively documented case of a detector adapting to a new model. Their Summer 2025 update addressed GPT-5 detection explicitly — and the methodology they published is worth examining because it reveals how next-generation detection works.

The key disclosure: GPTZero's updated model was NOT trained on GPT-5 data. Their 97%+ detection rate came from generalization — a 7-component detection architecture that processes text through multiple independent classifiers, none of which requires specific GPT-5 training samples. The components include:

  • Statistical perplexity analysis calibrated to the human writing distribution rather than specific AI model output
  • Linguistic fingerprinting mapping sentence structure, transition patterns, and syntactic choices against a continuously updated profile
  • Burstiness measurement comparing within-text variation against human baseline distributions
  • Semantic coherence analysis examining argument structure and logical flow patterns
  • Vocabulary distribution analysis identifying statistical anomalies in word choice patterns
  • Context window consistency checking for the specific types of semantic drift characteristic of autoregressive generation
  • A final ensemble classifier integrating signals with confidence calibration

This architecture explains why GPTZero generalizes to GPT-5 without specific training: the components target properties of the generation process rather than fingerprints of a specific model. GPT-5 still generates autoregressively, still produces characteristic statistical distributions, still differs from human writing in ways that multi-component analysis can detect — even if individual signals have weakened.
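
The general shape of such an ensemble, several independent scorers combined by a weighted vote, can be sketched as follows. The component lambdas and weights are placeholders standing in for the signals listed above, not GPTZero's published internals:

```python
from typing import Callable, Sequence

Scorer = Callable[[str], float]  # each maps text -> AI-likelihood in [0, 1]

def ensemble_score(text: str, components: Sequence[Scorer], weights: Sequence[float]) -> float:
    """Weighted average of independent component scores."""
    total = sum(w * c(text) for w, c in zip(weights, components))
    return total / sum(weights)

# Placeholder components standing in for perplexity, fingerprinting, etc.
components = [
    lambda t: 0.9,   # e.g. a statistical perplexity signal
    lambda t: 0.7,   # e.g. a linguistic fingerprinting signal
    lambda t: 0.8,   # e.g. a burstiness signal
]
weights = [2.0, 1.0, 1.0]

print(round(ensemble_score("sample text", components, weights), 3))
```

Because each component contributes independently, a new model that weakens one signal only attenuates, rather than eliminates, the ensemble's output, which is the generalization property the architecture is built around.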

GPTZero also reported false positive rates below 1% on human-written text across all supported languages in its Model 3.2m release — a deliberate constraint that trades some raw detection rate for reliability, and one that distinguishes its approach from tools with higher detection rates but also higher false positive rates.

Comparative GPT-5 Detection: What the Benchmarks Show

| Detector | GPT-5 Accuracy | GPT-4o Accuracy | False Positive Rate | GPT-5 Retrained? |
|---|---|---|---|---|
| GPTZero | 97%+ | ~90% | <1% (Model 3.2m) | No — generalized |
| Originality.ai | 31.7% | ~92% | ~5% | Retraining in progress |
| Turnitin | ~75% (academic samples) | ~82% | Near zero | Partial |
| EyeSift | ~78–82% | ~83% | ~8% | Ongoing updates |
| Copyleaks | ~70–76% | ~78% | ~7% | Partial |
| ZeroGPT | ~65–72% | ~74% | ~12% | Unclear |

Sources: Originality.ai independent benchmark (miniloop.ai 2025); GPTZero Summer 2025 update documentation; Supwriter 150-sample real-world study; Cybernews AI detector review 2026. All figures represent unedited GPT-5 output. Real-world accuracy with edited content is substantially lower across all tools.

What "Real-World Accuracy" Actually Means for GPT-5

The benchmark figures above all measure accuracy on unedited GPT-5 output. In practice, the vast majority of AI-generated content that circulates in academic, publishing, and professional contexts has been edited to some degree. This transforms the detection problem substantially.

Supwriter's independent test of 150 real-world samples — including edited AI content alongside authentic human writing — found that no tool exceeded 80% overall accuracy once editing was factored in. This aligns with Weber-Wulff et al.'s widely cited 2023 research finding that all 14 tools tested scored below 80% in realistic conditions — a finding that the 2025 generation of detectors has not fundamentally overturned despite improved controlled-benchmark performance.

The editing threshold matters. Light paraphrasing — changing individual words, rearranging sentences — has limited impact on well-trained detectors. Comprehensive revision — restructuring arguments, injecting original specific examples, rewriting paragraphs with different syntax — substantially degrades detection accuracy. The practical implication: students or professionals who substantially engage with and revise AI-generated text are largely undetectable, while those who submit near-raw output remain very detectable.

GPT-5 Detection for Educators: A Practical Framework

Educators face the most consequential version of this problem: detection results that inform decisions about student standing. The core professional obligation — which every major detection vendor explicitly states in their documentation — is that AI detection scores must never serve as sole evidence of academic dishonesty.

The International Association for Academic Integrity's 2025 guidance recommends treating AI detection as a conversation-starting signal, not a verdict. A flagged document should trigger a follow-up conversation with the student about their process, sources, and understanding — not an automatic referral to a disciplinary committee.

The practical workflow for educators using detection in 2026:

  1. Run two independent tools. Given the variance demonstrated by the GPTZero/Originality.ai comparison, relying on a single tool is insufficient. Convergence across two different detection methodologies substantially increases confidence.
  2. Set appropriate thresholds. A 60% AI probability score is inconclusive. Treat results below 75% as ambiguous and results above 85% as flagged for follow-up review — not automatic sanction.
  3. Conduct process interviews for flagged work. Ask students to explain their argument, discuss their sources, or write a brief paragraph on a related topic during class. A student who cannot demonstrate understanding of work they submitted is a much stronger signal than any detection score.
  4. Document your process. If you take any action based partly on detection results, your documentation should reflect the complete evidence set and the conversation that occurred — not just the detection score.
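
Steps 1 and 2 above can be expressed as a small triage routine. The 75% and 85% thresholds come from the workflow itself; the agreement margin and the middle "borderline" band are assumptions added for illustration:

```python
def triage(score_a: float, score_b: float) -> str:
    """Combine two independent detector scores (0-1) into a review decision."""
    avg = (score_a + score_b) / 2
    converge = abs(score_a - score_b) <= 0.15  # assumed agreement margin

    if avg >= 0.85 and converge:
        return "flag for follow-up review"    # never an automatic sanction
    if avg < 0.75:
        return "ambiguous - no action"
    return "borderline - rerun or interview"

print(triage(0.92, 0.90))
print(triage(0.60, 0.55))
```

Note that disagreement between the two tools downgrades even a high average score, which is the point of requiring convergence across methodologies.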

GPT-5 Detection for Publishers and HR Teams

For publishers screening submitted content and HR teams evaluating writing samples, the detection challenge is similar but the stakes are somewhat different — you are not adjudicating academic dishonesty but rather assessing whether submitted content represents genuine capability or delegated production.

Publishers screening guest submissions and freelancer content have the practical option of requiring revision on flagged pieces before acceptance — a less punitive approach than outright rejection, and one that quickly differentiates writers with genuine expertise from those who cannot substantively engage with the AI-drafted material.

HR teams using writing samples in hiring should understand that asking candidates to complete a time-limited writing exercise in a proctored context (or with a specific prompt revealed only at test time) provides more reliable signal than evaluating submitted portfolio work through detection alone. Detection of GPT-5 in portfolio samples is genuinely hard; testing candidates' ability to write under time pressure is not affected by model sophistication.

For any of these use cases, running free AI detection checks on submitted content provides a first-pass filter — treating flagged content as requiring additional scrutiny rather than automatic rejection is the defensible operational position.

The Arms Race: What Comes After GPT-5 Detection

The structural dynamic in AI detection has been consistent since GPT-2: detection improves, models improve, detection improves again. GPT-5 represents the most significant leap in detectability challenge to date, but it is unlikely to be the last. GPT-6 and competitive releases from Anthropic, Google, and Meta will continue to improve naturalness and reduce the statistical distance from human writing.

The most promising long-term detection approaches are not purely forensic — they are provenance-based. OpenAI's watermarking research (published as a preprint in 2024) demonstrated that statistical watermarks can be embedded in LLM output in ways that survive moderate editing and are imperceptible to human readers. If these approaches mature into production deployment, they would provide verifiable provenance rather than probabilistic inference.
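
The flavor of statistical watermarking described in the literature (partition the vocabulary into a "green list" keyed to the previous token, bias generation toward green tokens, then detect by counting green hits) can be sketched in a few lines. This is a toy illustration of the general idea, not OpenAI's scheme:

```python
import hashlib

VOCAB = [f"tok{i}" for i in range(20)]

def green_list(prev_token: str, frac: float = 0.5) -> set:
    """Deterministic pseudo-random half of the vocabulary, keyed on the previous token."""
    ranked = sorted(VOCAB, key=lambda t: hashlib.sha256((prev_token + "|" + t).encode()).hexdigest())
    return set(ranked[: int(len(VOCAB) * frac)])

def generate_watermarked(start: str, length: int) -> list:
    """A 'model' that always picks a green-list token, so every step leaves a mark."""
    tokens = [start]
    for _ in range(length):
        tokens.append(min(green_list(tokens[-1])))  # any green token would do
    return tokens

def green_fraction(tokens: list) -> float:
    """Detector: fraction of tokens drawn from their predecessor's green list."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

wm = generate_watermarked("tok0", 30)
print(green_fraction(wm))  # fully watermarked text scores 1.0; unmarked text hovers near 0.5
```

Real schemes bias sampling probabilistically rather than deterministically and detect with a statistical test on the green fraction; the point is that provenance is verified from the text alone, without the detector needing to model what AI prose "sounds like."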

Until then, the realistic operating position for practitioners is: use the best available tools, use multiple tools, understand that edited content is largely undetectable, and design processes that do not depend on detection alone for consequential decisions. The false positive problem is the binding constraint — any approach that increases detection rates by also increasing false positive rates is creating a new problem as it solves the original one.

Frequently Asked Questions

Can AI detectors detect GPT-5?

Some detectors are significantly more capable than others on GPT-5. GPTZero reported achieving over 97% detection of unedited GPT-5 output after its Summer 2025 update. Originality.ai dropped to 31.7% in independent comparative testing. The variation is extreme — tool selection matters enormously when GPT-5 detection is the specific use case.

Why did Originality.ai drop to 31.7% accuracy on GPT-5?

Originality.ai's model had not been retrained against GPT-5's specific output distribution at the time of testing. This illustrates a structural vulnerability in AI detection: detectors must continuously retrain as new model versions release. A tool performing excellently against GPT-4o can fail on GPT-5 if its training corpus does not include that model's outputs.

Does editing GPT-5 text make it undetectable?

Substantial editing significantly degrades detection accuracy across all tools. Light paraphrasing has limited effect on well-trained detectors. Comprehensive rewriting — adding original facts, restructuring arguments, changing sentence patterns throughout — can reduce AI probability scores below the flagging threshold. The practical threshold varies by tool, but 40–60% revision of a GPT-5 draft typically makes detection unreliable.

How does GPT-5 differ from GPT-4o in terms of detectability?

GPT-5 produces more contextually coherent, stylistically varied text than GPT-4o, with perplexity and burstiness closer to the human range — narrowing the statistical gap detectors exploit. GPT-5 output also shows fewer of the characteristic AI prose patterns (formulaic transition phrases, uniform sentence complexity, systematic hedging) that trained detectors recognized in earlier models.

What is the most accurate AI detector for GPT-5 in 2026?

Based on available benchmarks, GPTZero shows the strongest GPT-5 detection with a reported 97%+ rate on unedited output. Turnitin performs well on academic writing but has higher false positive rates on non-native English text. EyeSift provides free unlimited detection with a multi-signal approach. Independent testing found no tool exceeded 80% when edited content was included.

Is it fair to accuse someone based on an AI detector flagging GPT-5?

No. AI detection results should never be used as sole evidence of academic dishonesty or policy violation. Even the best detectors produce false positives — flagging human-written text as AI-generated. Every major detection company including GPTZero and Turnitin explicitly recommends using detection scores as one input in a broader assessment process, not as a standalone verdict.

Test Your Content Against GPT-5 Detection — Free

EyeSift's multi-signal detector analyzes any text for AI probability. No account, no character limit, no waiting. See exactly where your content registers on the detection spectrum.

Check Text Now — Free