ShaDocFormer: A Shadow-Attentive Threshold Detector With Cascaded Fusion Refiner for Document Shadow Removal

Weiwen Chen; Yingtie Lei; Shenghong Luo; Ziyang Zhou; Mingxian Li; Chi-Man Pun

arXiv:2309.06670·cs.CV·March 22, 2024

TL;DR

ShaDocFormer is a Transformer-based model that effectively detects and removes shadows from document images captured by mobile devices, improving readability through a novel shadow detection and image restoration approach.

Contribution

It introduces ShaDocFormer, combining a shadow-attentive threshold detector and a cascaded fusion refiner for accurate shadow mask detection and image restoration.

Findings

1. Outperforms state-of-the-art methods in shadow removal quality

2. Achieves higher accuracy in shadow mask detection

3. Demonstrates effective handling of illumination variations

Abstract

Document shadow is a common issue that arises when capturing documents using mobile devices, which significantly impacts readability. Current methods encounter various challenges, including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process…
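The abstract says the STD module starts from a traditional thresholding technique but does not name which one. As a rough illustration only, the sketch below assumes Otsu's method (a common classical choice, not confirmed by the source) and uses it to turn a grayscale document image into a coarse binary shadow mask. The function names `otsu_threshold` and `shadow_mask` are hypothetical, and the attention-based refinement described in the abstract is not modeled here.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes between-class variance.

    `pixels` is a flat list of integers in [0, 255].
    Otsu's method is an ASSUMPTION; the paper's abstract only says
    "a traditional thresholding technique".
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low = 0  # number of pixels with value <= t
    sum_low = 0    # sum of pixel values <= t
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0:
            continue  # no pixels in the dark class yet
        if count_high == 0:
            break     # all pixels are already in the dark class
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = count_low * count_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def shadow_mask(gray):
    """Mark pixels at or below the Otsu threshold as shadow (1), else 0.

    `gray` is a list of rows of 8-bit grayscale values. Treating darker
    pixels as shadow is a simplification: dark text is also flagged.
    """
    flat = [v for row in gray for v in row]
    t = otsu_threshold(flat)
    return [[1 if v <= t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # Toy 2x3 "image": bright paper (about 200) with a dark region (about 55).
    image = [[200, 210, 60],
             [205, 50, 55]]
    print(shadow_mask(image))
```

A purely intensity-based mask like this cannot distinguish shadow from dark ink, which is presumably one reason the abstract pairs thresholding with Transformer attention to gather global information.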

Code & Models

Repositories

kilito777/ShaDocFormer (PyTorch, official)

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Adam · Spatial-Channel Token Distillation · Byte Pair Encoding · Softmax