CAPTURE: A Benchmark and Evaluation for LVLMs in CAPTCHA Resolving

Jianyi Zhang; Ziyin Zhou; Xu Ji; Shizhao Liu; Zhangchi Zhao

arXiv:2512.11323·cs.AI·December 15, 2025

TL;DR

This paper introduces CAPTURE, a comprehensive benchmark for evaluating Large Visual Language Models (LVLMs) on various CAPTCHA types, revealing their current limitations in solving CAPTCHA challenges.

Contribution

The paper presents the first dedicated CAPTCHA benchmark for LVLMs, covering diverse types and sub-types, with extensive data and tailored labels for thorough evaluation.

Findings

01 LVLMs perform poorly on CAPTCHA tasks

02 CAPTURE covers 4 main types and 25 sub-types from 31 vendors

03 Benchmark fills gaps in data diversity and labeling

Abstract

Benefiting from strong and efficient multi-modal alignment strategies, Large Visual Language Models (LVLMs) are able to simulate human visual and reasoning capabilities, such as solving CAPTCHAs. However, existing benchmarks based on visual CAPTCHAs still face limitations. Previous studies customized their benchmarks and datasets to their own research objectives. Consequently, these benchmarks cannot comprehensively cover all CAPTCHA types. Notably, there is a dearth of dedicated benchmarks for LVLMs. To address this problem, we introduce the first CAPTCHA benchmark designed specifically for LVLMs, named CAPTURE (CAPTCHA for Testing Under Real-world Experiments). Our benchmark encompasses 4 main CAPTCHA types and 25 sub-types from 31 vendors. This diversity enables a multi-dimensional and thorough evaluation of LVLM performance. CAPTURE features extensive…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · User Authentication and Security Systems