Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification

Mikołaj Wysocki; Henryk Gierszal; Piotr Tyczka; Sophia Karagiorgou; George Pantelis

arXiv:2502.13740·cs.CV·February 20, 2025

Open Access

TL;DR

This study benchmarks various YOLO models for webpage CAPTCHA detection, analyzing their accuracy, speed, and adaptability, and proposes an image slicing method to enhance detection on large images.

Contribution

It provides a comprehensive comparison of YOLOv5, YOLOv8, and YOLOv10 models for CAPTCHA detection and introduces an image slicing technique to improve detection on large images.

Findings

01

Nano models are fastest for real-time applications.

02

Complex models achieve higher detection accuracy.

03

Image slicing improves detection metrics on large images.
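The slicing idea in finding 03 can be illustrated with a minimal sketch. The listing does not state the tile size, the overlap, or how detections from neighbouring tiles are merged, so the `tile=640` and `overlap=0.2` defaults and the helper names `slice_boxes` and `to_page_coords` are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Tile start offsets along one axis; the last tile is aligned to the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # make sure the far edge is covered
    return starts


def slice_boxes(width: int, height: int, tile: int = 640,
                overlap: float = 0.2) -> List[Box]:
    """Return overlapping tile rectangles covering a width x height image.

    tile and overlap are assumed values, not taken from the paper.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1 - overlap)))
    boxes = []
    for y in _starts(height, tile, stride):
        for x in _starts(width, tile, stride):
            boxes.append((x, y, min(x + tile, width), min(y + tile, height)))
    return boxes


def to_page_coords(det: Box, tile_box: Box) -> Box:
    """Shift a detection from tile coordinates back to full-page coordinates."""
    ox, oy = tile_box[0], tile_box[1]
    return (det[0] + ox, det[1] + oy, det[2] + ox, det[3] + oy)
```

Each tile would be passed to the detector separately, and the resulting boxes shifted back with `to_page_coords`. Because tiles overlap, the same CAPTCHA can be detected twice, so a deduplication step such as non-maximum suppression over the merged boxes would normally follow; that step is omitted here.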

Abstract

This paper provides an analysis and comparison of the YOLOv5, YOLOv8 and YOLOv10 models for webpage CAPTCHA detection, using datasets collected from the web and darknet as well as synthesized webpage data. The study examines the nano (n), small (s), and medium (m) variants of the YOLO architectures and uses metrics such as Precision, Recall, F1 score, mAP@50 and inference speed to determine their real-life utility. Additionally, the possibility of tuning a trained model to detect new CAPTCHA patterns efficiently was examined, as this is a crucial part of real-life applications. An image slicing method was proposed as a way to improve detection metrics on oversized input images, which can be a common scenario in webpage analysis. The nano variants achieved the best results in terms of speed, while the more complex architectures scored better on the other metrics.
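For readers unfamiliar with the metrics named in the abstract, the sketch below shows the standard definitions of Precision, Recall, and F1, together with the intersection-over-union (IoU) test that underlies mAP@50, where a prediction counts as a true positive only if its IoU with a ground-truth box is at least 0.5. The function names and the example counts are illustrative assumptions and are not results from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard detection metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: 8 correct detections, 2 spurious, 2 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

mAP@50 additionally averages the area under the precision-recall curve over all classes, which requires per-detection confidence scores and is not shown in this sketch.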

Taxonomy

Topics: User Authentication and Security Systems · Spam and Phishing Detection · Advanced Malware Detection Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · You Only Look Once