Perceptual Gaps: ASCII Art and Overlapping Audio as CAPTCHA
Choon-Hou Rafael Chong

TL;DR
This paper introduces ASCII art and overlapping audio tasks as new CAPTCHA methods that current large language models cannot effectively solve, suggesting they are highly effective for human verification.
Contribution
The paper proposes novel vision and audio CAPTCHA tasks that exploit human-specialized neural processing and evaluates them against state-of-the-art LLMs, showing that the tasks are currently robust.
Findings
LLMs could infer at most one or two characters of an ASCII CAPTCHA.
Models performed only modestly better than random on the audio CAPTCHA.
These CAPTCHA methods are highly effective against current AI models.
Abstract
As multimodal large language models (LLMs) advance, traditional CAPTCHAs have become obsolete for distinguishing humans from bots. To address this shift, this paper investigates the possibility of using tasks for which humans have evolved highly specialised neural processing. We introduce two CAPTCHA classes: a vision-based CAPTCHA, which renders alphanumeric strings as ASCII art, and an audio-based CAPTCHA, which is a question-answering task with overlapping or noise-corrupted audio context. We evaluate our vision-based CAPTCHA as both text and image input with multiple frontier LLMs (GPT 5.2, Gemini 3, etc.), and assess our audio-based CAPTCHAs by applying augmentations such as background noise, Gaussian noise, and overlapping speech. We determined that none of the LLMs were able to solve a single ASCII-based CAPTCHA, with the best-performing model only being able to infer at most…
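The abstract does not specify the rendering font or the noise parameters, so the following is only a minimal sketch of the two constructions under stated assumptions. It renders an alphanumeric string with a hypothetical 5-row block font (`FONT` covers just three characters for illustration), and corrupts a mono audio signal, represented as a list of floats, either with white Gaussian noise at a target signal-to-noise ratio or by mixing in a second utterance. The names `render_ascii`, `add_gaussian_noise`, and `overlap`, and the SNR-in-decibels parameterisation, are illustrative assumptions rather than the author's pipeline.

```python
import math
import random

# Hypothetical 5-row block font with only three glyphs, for illustration.
# The paper's actual ASCII-art renderer and font are not described here.
FONT = {
    "A": [" ### ", "#   #", "#####", "#   #", "#   #"],
    "B": ["#### ", "#   #", "#### ", "#   #", "#### "],
    "7": ["#####", "    #", "   # ", "  #  ", "  #  "],
}


def render_ascii(text: str, glyph: str = "#") -> str:
    """Render text as 5-row ASCII art, placing glyphs side by side with one space between."""
    rows = []
    for r in range(5):
        row = " ".join(FONT[ch][r] for ch in text.upper())
        rows.append(row.replace("#", glyph))  # optionally swap the fill character
    return "\n".join(rows)


def add_gaussian_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so the result has the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def overlap(primary, distractor, gain=1.0):
    """Mix a distractor utterance over the primary one sample by sample, zero-padding the shorter."""
    n = max(len(primary), len(distractor))
    p = primary + [0.0] * (n - len(primary))
    d = distractor + [0.0] * (n - len(distractor))
    return [a + gain * b for a, b in zip(p, d)]


if __name__ == "__main__":
    print(render_ascii("AB7"))
```

A real CAPTCHA would need a full alphanumeric font (or an existing ASCII-art tool), and an audio pipeline would normally operate on sampled arrays loaded from files; plain lists keep the sketch dependency-free. Lower `snr_db` values or a higher `gain` make the context harder to follow.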
