VIPER Strike: Defeating Visual Reasoning CAPTCHAs via Structured Vision-Language Inference
Minfeng Qi, Dongyang He, Qin Wang, Lefeng Zhang

TL;DR
This paper introduces ViPer, a unified framework combining visual perception and language reasoning to effectively solve visual reasoning CAPTCHAs, achieving near-human success rates and outperforming prior methods.
Contribution
ViPer integrates structured visual perception with adaptive language reasoning, providing a general, robust attack framework for diverse visual reasoning CAPTCHAs.
Findings
ViPer achieves up to 93.2% success rate on multiple VRCs.
ViPer outperforms prior solvers such as GraphNet, Oedipus, and the Holistic approach.
Template-Space Randomization reduces solver effectiveness.
Abstract
Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes, and spatial relations. They are increasingly deployed as a primary defense against automated bots. Existing solvers fall into two paradigms: vision-centric, which rely on template-specific detectors but fail on novel layouts, and reasoning-centric, which leverage LLMs but struggle with fine-grained visual perception. Both lack the generality needed to handle heterogeneous VRC deployments. We present ViPer, a unified attack framework that integrates structured multi-object visual perception with adaptive LLM-based reasoning. ViPer parses visual layouts, grounds attributes to question semantics, and infers target coordinates within a modular pipeline. Evaluated on six major VRC providers (VTT, Geetest, NetEase, Dingxiang, Shumei, Xiaodun),…
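The three stages named in the abstract (parse the layout, ground attributes to the question, infer target coordinates) can be illustrated with a toy sketch. This is not the authors' implementation. The `DetectedObject` type, the function names, and the attribute vocabulary are all hypothetical. The perception stage is assumed to have already produced a list of objects, and the LLM-based reasoning step is replaced here by plain keyword matching, which does not handle spatial relations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical output of a perception stage: one object per entry."""
    shape: str
    color: str
    box: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def ground_query(query: str, objects: list[DetectedObject]) -> list[DetectedObject]:
    """Keep objects matching every shape/color word that appears in the query.

    Keyword matching stands in for the LLM reasoning step (an assumption).
    """
    words = set(query.lower().replace("?", " ").replace(".", " ").split())
    shapes = {o.shape for o in objects} & words
    colors = {o.color for o in objects} & words
    return [
        o for o in objects
        if (not shapes or o.shape in shapes) and (not colors or o.color in colors)
    ]


def click_point(obj: DetectedObject) -> tuple[int, int]:
    """Return the integer center of the bounding box as the click target."""
    x_min, y_min, x_max, y_max = obj.box
    return ((x_min + x_max) // 2, (y_min + y_max) // 2)


def solve(query: str, objects: list[DetectedObject]) -> tuple[int, int] | None:
    """Return a click coordinate only if the query resolves to one object."""
    candidates = ground_query(query, objects)
    if len(candidates) != 1:
        return None  # ambiguous or unmatched query
    return click_point(candidates[0])


scene = [
    DetectedObject("cube", "red", (10, 20, 50, 60)),
    DetectedObject("sphere", "red", (100, 20, 140, 60)),
    DetectedObject("cube", "blue", (200, 20, 240, 60)),
]
print(solve("Click the red cube", scene))
```

Returning `None` for ambiguous queries keeps the sketch honest about its limits: a query such as "the cube left of the sphere" needs relational reasoning that this keyword matcher does not attempt.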
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · User Authentication and Security Systems
