Binarizing Documents by Leveraging both Space and Frequency
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

TL;DR
This paper introduces a document binarization method that uses Fast Fourier Convolutions to model both local and global information. It outperforms standard convolutional models and requires fewer parameters than Vision Transformer (ViT)-based models.
Contribution
The work proposes a new binarization approach using Fast Fourier Convolutions to better capture global context with fewer parameters than ViT-based models.
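To illustrate why a Fourier-domain operation captures global context cheaply, here is a minimal NumPy sketch of the spectral branch commonly described for Fast Fourier Convolutions: a real 2D FFT, a 1x1 channel mixing over stacked real and imaginary parts, a ReLU, and an inverse FFT. It is an illustration under stated assumptions, not the authors' architecture. The function name `spectral_transform`, the weight layout, and the omission of normalization layers and of the local convolutional branch are all choices made here for brevity.

```python
import numpy as np


def spectral_transform(x, weight):
    """Hypothetical sketch of a Fourier-domain 1x1 convolution.

    x:      real feature map of shape (C, H, W)
    weight: mixing matrix of shape (2 * C_out, 2 * C), an assumed layout
    Returns a real feature map of shape (C_out, H, W).
    """
    c, h, w = x.shape
    # Real 2D FFT over the spatial axes: shape (C, H, W // 2 + 1), complex.
    freq = np.fft.rfft2(x, axes=(-2, -1))
    # Treat real and imaginary parts as separate channels: (2C, H, W // 2 + 1).
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # 1x1 convolution = the same channel-mixing matrix applied at every frequency.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    # Pointwise nonlinearity (normalization omitted in this sketch).
    mixed = np.maximum(mixed, 0.0)
    # Reassemble complex coefficients and return to the spatial domain.
    c_out = weight.shape[0] // 2
    out_freq = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(out_freq, s=(h, w), axes=(-2, -1))


def changed_fraction(x, weight, eps=1e-9):
    """Fraction of output values that change when one input pixel is perturbed."""
    base = spectral_transform(x, weight)
    perturbed = x.copy()
    perturbed[0, 0, 0] += 1.0  # modify a single pixel in a single channel
    diff = np.abs(spectral_transform(perturbed, weight) - base)
    return np.count_nonzero(diff > eps) / diff.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((3, 16, 16))   # toy 3-channel feature map
    mixing = rng.standard_normal((8, 6))          # 2*C_out = 8, 2*C = 6
    print(spectral_transform(features, mixing).shape)
    print(changed_fraction(features, mixing))
```

Because every frequency coefficient depends on every pixel, a single 1x1 mixing in the frequency domain gives each output location an image-wide receptive field. A 3x3 spatial convolution, by contrast, would let a one-pixel change reach at most nine output positions per channel. The parameter count here depends only on the channel widths, not on the spatial extent being covered.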
Findings
Effective in handling various degradations
Outperforms standard convolutional models
Requires fewer parameters than Vision Transformers
Abstract
Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Linear Layer · Dense Connections
