Binarizing Documents by Leveraging both Space and Frequency

Fabio Quattrini; Vittorio Pippi; Silvia Cascianelli; Rita Cucchiara

arXiv:2404.17243 · cs.CV · April 29, 2024

TL;DR

This paper introduces a novel document binarization method leveraging Fast Fourier Convolutions to effectively model both local and global information, outperforming traditional convolutional models and requiring fewer parameters than Vision Transformers.

Contribution

The work proposes a new binarization approach using Fast Fourier Convolutions to better capture global context with fewer parameters than ViT-based models.
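To make the local-versus-global distinction concrete, the following is a minimal sketch, not the authors' implementation and not the full Fast Fourier Convolution operator. It assumes a single-channel image and replaces the learned frequency-domain layers with a simple per-frequency complex scaling; the names `spectral_transform` and `local_conv3x3` are hypothetical. It perturbs one pixel and counts how many output pixels change under a 3x3 spatial convolution versus a pointwise operation applied in the Fourier domain.

```python
import numpy as np

def spectral_transform(x, weight):
    """Simplified global branch: FFT -> per-frequency scaling -> inverse FFT.

    x: real array of shape (H, W); weight: complex array of shape (H, W//2 + 1).
    The per-frequency scaling is an illustrative stand-in for learned layers.
    """
    freq = np.fft.rfft2(x)                    # real-input 2D FFT
    freq = freq * weight                      # pointwise op in the frequency domain
    return np.fft.irfft2(freq, s=x.shape)     # back to the spatial domain

def local_conv3x3(x, kernel):
    """Plain 3x3 cross-correlation with zero padding (local receptive field)."""
    padded = np.pad(x, 1)                     # zero padding by default
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out

rng = np.random.default_rng(0)
H, W = 16, 16
image = rng.random((H, W))
weight = (rng.standard_normal((H, W // 2 + 1))
          + 1j * rng.standard_normal((H, W // 2 + 1)))
kernel = rng.standard_normal((3, 3))

# Perturb a single pixel and see how far the change propagates.
perturbed = image.copy()
perturbed[0, 0] += 1.0

global_diff = np.abs(spectral_transform(perturbed, weight)
                     - spectral_transform(image, weight))
local_diff = np.abs(local_conv3x3(perturbed, kernel)
                    - local_conv3x3(image, kernel))

global_changed = int((global_diff > 1e-9).sum())
local_changed = int((local_diff > 1e-9).sum())

print("pixels affected by the 3x3 convolution:", local_changed)
print("pixels affected by the spectral transform:", global_changed)
```

The spatial convolution only alters outputs next to the perturbed pixel, while the frequency-domain operation spreads the change across essentially the whole image. This is the sense in which a single Fourier-domain layer has an image-wide receptive field.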

Findings

01 Effective in handling various degradations

02 Outperforms standard convolutional models

03 Requires fewer parameters than Vision Transformers

Abstract

Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead…

Code & Models

Repositories

aimagelab/fourbi (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Linear Layer · Dense Connections