CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision

Pengcheng Wang; Haoxiang Liu; Yang Dai; Xiangxiang Zeng; Guanhua Chen; Baotian Hu; Longyue Wang; Weihua Luo

arXiv:2605.19538 · cs.CV · May 20, 2026

TL;DR

CaptchaMind is a reinforcement-learning-based CAPTCHA solver trained with explicit reasoning supervision. It reaches an 82.9% success rate on the new large-scale CaptchaBench benchmark and outperforms existing methods on real-world CAPTCHA instances.

Contribution

The paper introduces CaptchaBench, a CAPTCHA benchmark of 16,000 programmatically generated samples across eight task categories with region- and process-level annotations, and proposes CaptchaMind, an RL-based solver trained with explicit reasoning supervision.

Findings

01. CaptchaMind achieves an 82.9% success rate on benchmark tasks.

02. Existing methods fail consistently on tasks requiring fine-grained visual detail capture and region-level comparison.

03. CaptchaMind outperforms all existing methods on real-world CAPTCHA instances.
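For readers unfamiliar with how a headline success rate is typically derived, the following is a minimal sketch of aggregating solved/unsolved outcomes per task category and overall. It is not the paper's evaluation code: the function name `success_rates`, the `(category, solved)` record shape, and the category names in the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def success_rates(results):
    """Aggregate (category, solved) pairs into per-category and overall rates.

    Hypothetical helper: the record format is an assumption, not the
    CaptchaBench schema.
    """
    totals = defaultdict(int)
    solved = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        solved[category] += int(ok)
    if not totals:
        raise ValueError("no results to aggregate")
    per_category = {c: solved[c] / totals[c] for c in totals}
    # Overall rate is weighted by the number of samples in each category.
    overall = sum(solved.values()) / sum(totals.values())
    return per_category, overall


# Toy records with made-up category names; not data from the paper.
toy = [
    ("slider", True),
    ("slider", False),
    ("grid_select", True),
    ("grid_select", True),
]
per_category, overall = success_rates(toy)
```

Note that an overall rate computed this way weights every sample equally, so it matches the unweighted mean of the per-category rates only when the categories are the same size.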

Abstract

CAPTCHAs are widely deployed as human verification mechanisms and frequently block intelligent agents from completing end-to-end automation in real-world web environments. Solving modern CAPTCHAs requires robust multi-step visual reasoning and interaction capabilities, yet training-based approaches have remained absent due to the lack of large-scale training data and process-level annotations. We introduce CaptchaBench, the first CAPTCHA benchmark designed to support large-scale training, comprising 16,000 programmatically generated samples across eight task categories with detailed region and process-level annotations. Systematic evaluation on CaptchaBench reveals that existing methods fail consistently on tasks requiring fine-grained visual detail capture and region-level comparison. We therefore present CaptchaMind, an RL-based solver trained with explicit reasoning process…
