WordVIS: A Color Worth A Thousand Words

Umar Khan; Saifullah; Stefan Agne; Andreas Dengel; Sheraz Ahmed

arXiv:2412.10155 · cs.CV · December 16, 2024

TL;DR

This paper introduces WordVIS, a method that embeds textual features into visual space, enabling lightweight image classifiers to perform well on document classification with small datasets, reducing data and computational needs.

Contribution

The paper presents a novel approach to embed textual features into visual space, improving image-based document classification accuracy on limited data without extensive training.
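This summary does not spell out how the textual features are placed into the image, so the following is only an illustrative sketch and not the authors' method. It assumes a hypothetical encoding in which each OCR'd word is hashed to an RGB color and that color is painted into the word's bounding box, so an image-only classifier could in principle pick up word identity from color. The names `word_to_color`, `blank_page`, and `paint_words`, the MD5-based mapping, and the toy bounding boxes are all assumptions made for illustration.

```python
import hashlib


def word_to_color(word):
    """Map a word to a deterministic RGB triple.

    Hypothetical encoding: the first three bytes of the MD5 digest of the
    lowercased word. The paper's actual text-to-color mapping may differ.
    """
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return (digest[0], digest[1], digest[2])


def blank_page(width, height, fill=(255, 255, 255)):
    """Create a white page as a height x width grid of RGB tuples."""
    return [[fill for _ in range(width)] for _ in range(height)]


def paint_words(page, words):
    """Fill each word's bounding box with that word's color.

    words: iterable of (text, (x0, y0, x1, y1)) with x1 and y1 exclusive.
    Boxes are clipped to the page bounds.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    for text, (x0, y0, x1, y1) in words:
        color = word_to_color(text)
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                page[y][x] = color
    return page


# Toy OCR output (made-up words and boxes, for illustration only).
ocr_words = [
    ("Invoice", (1, 1, 5, 3)),
    ("invoice", (1, 4, 5, 6)),
    ("Total", (6, 1, 9, 3)),
]
page = paint_words(blank_page(10, 8), ocr_words)
```

In this sketch, the two occurrences of "invoice" receive the same color regardless of case, while untouched pixels stay white. A real pipeline would presumably derive colors from learned text features rather than a hash, and would overlay them on the scanned page instead of a blank grid.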

Findings

01. Achieved a 4.64% improvement with ResNet50 without pre-training.

02. Set a new record of 91.14% accuracy on the Tobacco-3482 dataset.

03. Demonstrated the effectiveness of lightweight classifiers with embedded textual features.

Abstract

Document classification is considered a critical element in automated document processing systems. In recent years, multi-modal approaches have become increasingly popular for document classification. Despite their improvements, these approaches are underutilized in industry because they require a tremendous volume of training data and extensive computational power. In this paper, we attempt to address these issues by embedding textual features directly into the visual space, allowing lightweight image-based classifiers to achieve state-of-the-art results on small-scale datasets in document classification. To evaluate the efficacy of the visual features generated by our approach on limited data, we tested it on the standard Tobacco-3482 dataset. Our experiments show a tremendous improvement in image-based classifiers, achieving an improvement of 4.64% using ResNet50 with no…
