Previously, we discussed the limitations of AI detection and the importance of content literacy in evaluating the quality and value of content.
But what about the practical side of AI detection? How effective are the various detection services available, and which ones can you trust?
The Test (short version)
To answer these questions, I decided to put six AI detection services to the test. I gathered six short stories - three generated by AI and three written by humans - and ran them through each of the detection services.
Note that this is a very basic test. Real-world scenarios would include complications like long quotes, mixed writing styles, partially AI-generated content, and anti-detection paraphrasing. In other words: failing this test is a very bad sign.
The AI-generated stories were created using general-purpose LLMs, and were designed to be convincing and engaging, mimicking the style of famous authors.
The human-generated stories were provided by Will Rodriguez, Seema Nayyar Tewari, and Ira C. Zipperer, all of whom are talented writers on Substack and definitely worth following.
I then ran each of the stories through six detection services:
QuillBot
Originality.ai
Sapling.ai
Gpt-zero.org
Zerogpt.com
Detecting-ai.com
The results are surprising. Only two detectors (QuillBot and Detecting-ai.com) pass this very basic test: they flag all three AI stories as AI without flagging any of the human stories. Originality.ai flags all three AI stories as well, but can’t decide on two of the human stories. Sapling.ai flags only two of the AI stories, while Gpt-zero.org and Zerogpt.com each flag only one. No detector produced an outright false positive, that is, none labeled a human story as AI.
All test results in one simple chart:
So, there you have it: QuillBot ✅ and Detecting-ai.com ✅ are best by test!
Would you like to read more about AI text detectors? Let me know in the comments.
A Closer Look (long version)
The Three AI-generated Stories
These short stories were generated using an automated process by yours truly. For more on this process, click the links and scroll past the story text to find some of the prompts and techniques used. For now, let’s take a short look at each story’s writing style [1]:
“Code of Conscience”, generated by Llama 3.1 70B - Inspired by author Roald Dahl, this writing style is lighthearted, humorous, and satirical, with a focus on technology and social commentary, often using irony and unexpected twists to create a sense of surprise and intrigue.
“Threads of Truth”, generated by perplexity.ai - Inspired by author Sefi Atta, this writing style is rich, evocative, and immersive, with a focus on cultural heritage, family dynamics, and the complexities of human relationships, often using vivid imagery, metaphors, and symbolism to create a sense of depth and emotional resonance.
“Quantum Pigeon’s Gambit”, generated by perplexity.ai - Inspired by author Ted Chiang, this style is complex, philosophical, and richly imaginative, blending quantum physics with surreal narrative elements. It features intricate, layered prose, speculative concepts, and explores deep existential and metaphysical questions.
The Three Human-generated Stories
Story 4, by Will Rodriguez. Writing style: conversational, vivid, and engaging, with a touch of dark humor and a tendency to meander through tangential anecdotes, creating a sense of intimacy and immediacy.
Sea of Devotion (non-fiction), by Seema Nayyar Tewari. Writing style: informative, detailed, and meandering, blending history and mythology, with a tendency to jump between topics, creating depth and complexity.
Guerrillas in the Midst (part 6 of “Heartbreaker” series), by Ira C. Zipperer. Writing style: descriptive, immersive, and action-packed, with a focus on world-building and complex ideas, often blending science, mythology, and fantasy elements, creating a sense of depth and intrigue.
The Detectors
Let’s take a closer look at the six AI text detectors used in this experiment. Each has a free version, which is what I used. These free versions usually come with a word or character limit. Stories were truncated to fit within these limits when needed. Tests were conducted on 22 January 2025.
Meet the detectors of choice, each with a summary of features as claimed on their respective websites [2]:
QuillBot
📢 QuillBot's AI Detector uses advanced natural language processing to identify AI-generated content by analyzing patterns, repetitive words, awkward phrasing, and unnatural flow. It stands out with paragraph-by-paragraph feedback, paraphrase detection, and unlimited free checks. Users benefit from its speed, ease of use, and ability to preserve the humanness of writing.
💲 Without payment, QuillBot will analyze 1200 words at a time.
💬 Output: “…% of text is likely AI-generated”.
❗ Note: some also speak highly of Scribbr, another AI detector, but that turns out to be the exact same system as QuillBot. Both are now brands of a group of companies called Learneo.
Originality.ai
📢 Originality.ai uses advanced machine learning algorithms to detect AI-generated content by analyzing patterns, repetitive words, and unnatural flow. It stands out with its high accuracy, multi-language support, and additional features like plagiarism and fact-checking. Users benefit from detailed reports, real-time analysis, and constant updates to keep pace with evolving AI writing tools.
💲 Without payment, Originality’s ‘lite’ version will analyze 750 words at a time.
💬 Output: “We are …% confident that text is Original/AI-generated”.
❗ Note: Originality.ai also offers a “burstiness” and “perplexity” calculation tool (see below).
Sapling.ai
📢 Sapling uses advanced machine learning algorithms to detect AI-generated content by analyzing patterns, context, and linguistic features. It stands out with its dual-approach detection (overall and per-sentence), color-coded highlighting, and multi-language support. Users benefit from its high accuracy, real-time analysis, and integration options for various platforms and documents.
💲 Without payment, Sapling will analyze 2000 characters at a time, which is only a few hundred words.
💬 Output: “Fake: …%” (“fake” is to be understood as AI-generated).
Gpt-zero.org
📢 GPT Zero is a dependable tool that provides a straightforward solution for detecting content generated by AI GPT. It enables users to differentiate between chatbots and content created by humans, promoting authenticity and building trust.
💲 It’s completely free to use.
💬 Output: “…% GPT”.
❗ Note: there seem to be multiple services called “GPT Zero”; I have only tried gpt-zero.org.
Zerogpt.com
📢 ZeroGPT employs DeepAnalyse™ Technology, a multi-stage methodology using advanced natural language processing and machine learning algorithms. It analyzes patterns, sentence structure, and contextual tone to detect AI-generated content. ZeroGPT stands out with its high accuracy, multi-language support, and ability to identify various AI models. Users benefit from its user-friendly interface, detailed reports, and continuous updates.
💲 Without payment, zerogpt.com will analyze 15,000 characters at a time.
💬 Output: “Your Text is [for example: AI/GPT Generated] …% AI GPT*”.
Detecting-ai.com
📢 Detecting-ai.com uses advanced algorithms to analyze text patterns, sentence structure, and word usage, identifying AI-generated content with 99% accuracy. It stands out with its AI detection highlighting, detailed reports, and ability to detect content from various AI models. Users benefit from its high precision, user-friendly interface, and continuous updates to keep pace with evolving AI writing tools.
💲 Without payment, Detecting-ai.com will analyze 5,000 characters at a time.
💬 Output: “This text was likely written by Human [or: AI]. There is a …% probability this text was entirely written by AI”
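The free-tier limits quoted above mix words and characters, so truncating a story means two slightly different operations. Here is a minimal Python sketch of how that could be done. The helper name `truncate` and the choice to avoid cutting in the middle of a word are my own assumptions for illustration; the stories in this test were simply shortened to fit, and the exact cut points are not recorded here. Gpt-zero.org is left out because its site does not state a limit.

```python
# Free-tier limits as quoted in the detector summaries above.
LIMITS = {
    "QuillBot": ("words", 1200),
    "Originality.ai": ("words", 750),
    "Sapling.ai": ("chars", 2000),
    "Zerogpt.com": ("chars", 15000),
    "Detecting-ai.com": ("chars", 5000),
}


def truncate(text: str, unit: str, limit: int) -> str:
    """Hypothetical helper: cut text to at most `limit` words or characters."""
    if unit == "words":
        # Note: this collapses all whitespace (including paragraph breaks)
        # into single spaces.
        return " ".join(text.split()[:limit])
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if not text[limit].isspace():
        # The cut landed inside a word: drop the partial word,
        # provided there is an earlier word to fall back on.
        parts = cut.rsplit(None, 1)
        if len(parts) == 2:
            cut = parts[0]
    return cut.rstrip()


def fit_to_detector(text: str, detector: str) -> str:
    """Apply the quoted free-tier limit of one detector to a story."""
    unit, limit = LIMITS[detector]
    return truncate(text, unit, limit)
```

For example, `fit_to_detector(story, "Sapling.ai")` would return at most 2,000 characters of `story`, ending on a whole word.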
Full Test Results
⚠ Careful: Percentage scores are tricky. As seen above, while each detector outputs an AI score between 0% and 100%, these scores don’t necessarily have the same meaning. For example: Zerogpt apparently outputs the percentage of text that’s generated by GPT, while Detecting-ai.com gives the probability that the entire text was written by AI.
⚠ Careful: Your mileage may vary. AI text detection is a rapidly evolving field. New services may appear or disappear, or change their methodology without notice. This makes sense, because LLMs also evolve rapidly. Detection tools need to keep up in order to stay relevant.
QuillBot
Story 1 (AI): 100% of text is likely AI-generated ✅
Story 2 (AI): 89% of text is likely AI-generated ✅
Story 3 (AI): 100% of text is likely AI-generated ✅
Story 4 (Will): 0% of text is likely AI-generated ✅
Story 5 (Seema): 0% of text is likely AI-generated ✅
Story 6 (Ira): 0% of text is likely AI-generated ✅
Originality.AI
Story 1 (AI): We are 100% confident that text is AI-generated ✅
Story 2 (AI): We are 100% confident that text is AI-generated ✅
Story 3 (AI): We are 100% confident that text is AI-generated ✅
Story 4 (Will): Likely Original. We are 100% confident that text is original ✅
Story 5 (Seema): Borderline AI/Original. We are 50% confident that text is original 🤔
Story 6 (Ira): Borderline AI/Original. We are 50% confident that text is original 🤔
Sapling
Story 1 (AI): Fake: 100.0% ✅
Story 2 (AI): Fake: 96.9% ✅
Story 3 (AI): Fake: 0.0% ❌
Story 4 (Will): Fake: 34.0% ✅
Story 5 (Seema): Fake: 0.0% ✅
Story 6 (Ira): Fake: 0.2% ✅
Gpt-zero.org
Story 1 (AI): 94.05% GPT ✅
Story 2 (AI): 0.03% GPT ❌
Story 3 (AI): 0.03% GPT ❌
Story 4 (Will): 0.02% GPT ✅
Story 5 (Seema): 0.02% GPT ✅
Story 6 (Ira): 0.02% GPT ✅
ZeroGPT
Story 1 (AI): Your Text is AI/GPT Generated 75.69% AI GPT* ✅
Story 2 (AI): Your Text is Most Likely Human written, may include parts generated by AI/GPT 18.62% AI GPT* ❌
Story 3 (AI): Your Text is Human written 0% AI GPT* ❌
Story 4 (Will): Your Text is Human written 0.86% AI GPT* ✅
Story 5 (Seema): Your Text is Human written 4.17% AI GPT* ✅
Story 6 (Ira): Your Text is Human written 0% AI GPT* ✅
Detecting-ai.com
Story 1 (AI): This text was likely written by AI. There is a 88.0% probability this text was entirely written by AI ✅
Story 2 (AI): This text was likely written by AI. There is a 77.9% probability this text was entirely written by AI ✅
Story 3 (AI): This text was likely written by AI. There is a 77.3% probability this text was entirely written by AI ✅
Story 4 (Will): This text was likely written by Human. There is a 9.7% probability this text was entirely written by AI ✅
Story 5 (Seema): This text was likely written by Human. There is a 12.9% probability this text was entirely written by AI ✅
Story 6 (Ira): This text was likely written by Human. There is a 35.0% probability this text was entirely written by AI ✅
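The ✅, ❌ and 🤔 marks above can be reproduced with a few lines of Python. This is only a sketch under a loud assumption: it puts every detector on one 0-100 "AI score" scale and uses a 50% cut-off (exactly 50% counts as undecided). As noted above, the detectors do not actually mean the same thing by their percentages, so treat this as bookkeeping, not as a fair calibration. The scores are transcribed from the lists above; for Originality.ai, "X% confident that text is original" is converted to an AI score of 100 - X.

```python
# AI scores in %, transcribed from the results above.
# Stories 1-3 are AI-generated, stories 4-6 are human-written.
SCORES = {
    "QuillBot":         [100, 89, 100, 0, 0, 0],
    "Originality.ai":   [100, 100, 100, 0, 50, 50],
    "Sapling.ai":       [100.0, 96.9, 0.0, 34.0, 0.0, 0.2],
    "Gpt-zero.org":     [94.05, 0.03, 0.03, 0.02, 0.02, 0.02],
    "Zerogpt.com":      [75.69, 18.62, 0, 0.86, 4.17, 0],
    "Detecting-ai.com": [88.0, 77.9, 77.3, 9.7, 12.9, 35.0],
}
IS_AI = [True, True, True, False, False, False]
THRESHOLD = 50  # assumption: one shared cut-off for all detectors


def verdict(score: float) -> str:
    """Map a score to 'ai', 'human', or 'undecided' (exactly at the cut-off)."""
    if score > THRESHOLD:
        return "ai"
    if score < THRESHOLD:
        return "human"
    return "undecided"


def tally(scores: list) -> dict:
    """Count hits, misses, false positives and undecided verdicts."""
    counts = {"ai_caught": 0, "ai_missed": 0, "human_cleared": 0,
              "false_positive": 0, "undecided": 0}
    for score, is_ai in zip(scores, IS_AI):
        v = verdict(score)
        if v == "undecided":
            counts["undecided"] += 1
        elif is_ai:
            counts["ai_caught" if v == "ai" else "ai_missed"] += 1
        else:
            counts["human_cleared" if v == "human" else "false_positive"] += 1
    return counts


RESULTS = {name: tally(scores) for name, scores in SCORES.items()}
# A detector passes only if it is right about all six stories.
PASSED = [name for name, c in RESULTS.items()
          if c["ai_caught"] == 3 and c["human_cleared"] == 3]

for name, counts in RESULTS.items():
    print(name, counts)
print("Passed:", PASSED)
```

Under these assumptions, the tally matches the summary in the short version: two detectors get all six stories right, Originality.ai has two undecided verdicts, and no detector produces a false positive.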
Bonus: Perplexity and Burstiness
Originality.ai offers an additional tool to calculate “perplexity” and “burstiness” scores of a piece of text. Note that these scores are not directly related to Originality.ai’s AI score.
Perplexity is a measure of how predictable a text is, based on language patterns. Lower scores often indicate AI-generated content, as AI tends to produce more predictable text. Higher scores suggest human authorship, reflecting more complex or creative writing.
Burstiness assesses variation in sentence structure and length. Low burstiness, characterized by consistent sentence patterns, often indicates AI-generated text. High burstiness, with diverse sentence structures, typically suggests human authorship, as humans naturally vary their writing style.
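To make these two notions a bit more concrete, here is a minimal Python sketch of textbook-style versions of both. Important caveats: Originality.ai does not publish its formulas as far as I know, so this is not their method, and the numbers it produces are not on the same scale as the scores reported in this post. Perplexity is computed here as the exponential of the average negative log-probability per token, and the per-token probabilities are a hypothetical input that would have to come from a language model. For burstiness I use the coefficient of variation of sentence length, which is only one of several possible proxies.

```python
import math
import re
import statistics


def perplexity(token_probs: list) -> float:
    """exp(average negative log-probability per token).

    `token_probs` is a hypothetical input: the probability a language
    model assigned to each token of the text, in order.
    """
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


def sentence_lengths(text: str) -> list:
    """Word count per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Assumed proxy: standard deviation of sentence length / mean length."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) / statistics.mean(lengths)


# A model that finds every token fairly predictable (p = 0.5) yields a
# lower perplexity than one that is often surprised (p = 0.1).
print(perplexity([0.5, 0.5, 0.5, 0.5]))
print(perplexity([0.1, 0.1, 0.1, 0.1]))

# Uniform sentence lengths versus a mix of short and long sentences.
print(burstiness("One two three. Four five six."))
print(burstiness("Go. The quick brown fox jumps over the lazy dog today."))
```

The design choice worth noting is that both measures describe the text relative to something else: perplexity depends entirely on which language model supplies the probabilities, and burstiness depends on how sentences are split.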
The results in our case? Completely useless!
Story 1 (AI): Perplexity 53, Burstiness 7
Story 2 (AI): Perplexity 60, Burstiness 8
Story 3 (AI): Perplexity 57, Burstiness 11
Story 4 (Will): Perplexity 58, Burstiness 5
Story 5 (Seema): Perplexity 55, Burstiness -8, “low”
Story 6 (Ira): Perplexity 62, Burstiness 12
All three AI stories pass as human here, while only Seema’s non-fiction story (written by a human) is considered ‘low burstiness’. Ira’s story ‘wins’ on both fronts: it has the highest perplexity and burstiness of all six stories.
Acknowledgements
Thank you, Will Rodriguez, Seema Nayyar Tewari, and Ira C. Zipperer, for providing a story, selflessly putting your reputation as a ‘Real Human Writer’ in the balance. I’m glad it all worked out in the end.
[1] Characterizations of writing styles were generated by perplexity.ai in the following way:
Prompt: Describe the writing style seen in [story] in <30 words.
[2] Characterizations of AI detectors’ features were generated by perplexity.ai in the following way:
Prompt: Explain [website]'s AI detection methods clearly in <50 words. We are interested in what sets them apart and why one should use them. Use only the website's own words (you may paraphrase).
This is all fun and interesting, but my initial reaction is: “Well, it’s nice we have the technology to do this and people have time to kill to test this software, but so what?” Other than getting into a legal battle, how is this software practical in the real world?
Yes, we could do the equivalent of fact-checking and identify something as machine-written, though as with all the garbage on social media, whoever publishes first is believed, no matter how poor the source or how wrong it is.
I don’t ask just to be rude. It’s that I’m scratching my head on the “now what?” question. What do we do with this newfound technology besides enjoy it as an intellectual exercise?
Excellent work Rubin !!
" more human than you man !"