TextShield-R1: Reinforced Reasoning for Tampered Text Detection
Chenfan Qu, Yiwu Zhong, Jian Liu, Xuekang Zhu, Bohan Yu, Lianwen Jin

TL;DR
TextShield-R1 is a reinforcement learning-based multimodal large language model designed for tampered text detection, leveraging a new pre-training curriculum, reward functions, OCR-based localization, and a comprehensive benchmark to improve accuracy and interpretability.
Contribution
The paper introduces a reinforcement learning framework for tampered text detection that combines specialized pre-training with OCR-based refinement, along with a large-scale, diverse benchmark for evaluation.
Findings
Significant improvement over existing methods in tampered text detection accuracy.
Enhanced localization precision through OCR rectification.
Robust evaluation across multiple languages, tampering techniques, and domains.
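As an illustration of the OCR rectification idea: this summary does not say how TextShield-R1 performs it, so the sketch below is an assumption and not the authors' method. It snaps a model-predicted box to the OCR-detected text box with the highest intersection-over-union (IoU), and keeps the original prediction when no OCR box overlaps enough. The names `iou`, `rectify_box`, and the `min_iou` threshold are hypothetical.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this format is assumed for illustration.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rectify_box(pred: Box, ocr_boxes: List[Box], min_iou: float = 0.3) -> Box:
    """Snap a predicted box to the best-matching OCR box (hypothetical rule).

    If no OCR box reaches `min_iou`, the prediction is returned unchanged.
    """
    best = max(ocr_boxes, key=lambda b: iou(pred, b), default=None)
    if best is None or iou(pred, best) < min_iou:
        return pred
    return best


if __name__ == "__main__":
    predicted = (10.0, 10.0, 50.0, 30.0)
    ocr = [(12.0, 11.0, 52.0, 29.0), (100.0, 100.0, 140.0, 120.0)]
    print(rectify_box(predicted, ocr))
```

The fallback to the raw prediction is a design choice in this sketch: when OCR misses a text region, a loose box is usually more useful than no box at all.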
Abstract
The growing prevalence of tampered images poses serious security threats, highlighting the urgent need for reliable detection methods. Multimodal large language models (MLLMs) demonstrate strong potential in analyzing tampered images and generating interpretations. However, they still struggle to identify micro-level artifacts, exhibit low accuracy in localizing tampered text regions, and rely heavily on expensive annotations for forgery interpretation. To this end, we introduce TextShield-R1, the first reinforcement-learning-based MLLM solution for tampered text detection and reasoning. Specifically, our approach introduces Forensic Continual Pre-training, an easy-to-hard curriculum that prepares the MLLM for tampered text detection by harnessing large-scale, inexpensive data from natural image forensic and OCR tasks. During fine-tuning, we perform Group Relative Policy…
Taxonomy
Topics: Digital Media Forensic Detection · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis
