Seeing Colors: Learning Semantic Text Encoding for Classification

Shah Nawaz; Alessandro Calefati; Muhammad Kamran Janjua; Ignazio Gallo

arXiv:1808.10822·cs.CV·September 3, 2018·1 cites

Open Access

TL;DR

This paper introduces a novel method that converts text documents into images using word embeddings, enabling the use of CNNs for text classification and achieving promising results on benchmark datasets.

Contribution

The work presents a new approach to text classification by encoding text as images, allowing the application of advanced CNN architectures from computer vision to NLP tasks.

Findings

01 Successful conversion of text to images for classification

02 Promising results on benchmark datasets

03 Unified feature space for text and image representations

Abstract

The question we answer with this work is: can we convert a text document into an image in order to exploit the best image classification models for classifying documents? To answer this question, we present a novel text classification method that converts a text document into an encoded image, using word embeddings and the capabilities of Convolutional Neural Networks (CNNs), which have been successfully employed in image classification. We evaluate our approach on several well-known benchmark datasets for text classification and obtain promising results. This work allows many of the advanced CNN architectures developed for Computer Vision to be applied to Natural Language Processing. We also test the proposed approach on a multi-modal dataset, showing that it is possible to use a single deep model to represent text and images in the same feature space.
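The encoding idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (toy_embeddings, vector_to_pixels, encode_text) are hypothetical, random vectors stand in for a trained word embedding, the embedding dimension is assumed to be a multiple of 3 so that consecutive triples of components can be read as RGB pixels, and out-of-vocabulary words are drawn as black pixels. The abstract does not specify the exact layout or scaling the authors use.

```python
import random


def toy_embeddings(vocab, dim=12, seed=0):
    """Hypothetical stand-in for a trained embedding: random vectors in [-1, 1]."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def vector_to_pixels(vec, lo=-1.0, hi=1.0):
    """Clamp each component to [lo, hi], rescale to 0..255, group triples as RGB."""
    scaled = [round(255 * (min(max(v, lo), hi) - lo) / (hi - lo)) for v in vec]
    return [tuple(scaled[i:i + 3]) for i in range(0, len(scaled), 3)]


def encode_text(text, embeddings, words_per_row=4):
    """Return an image as a list of pixel rows, one fixed-width strip per word.

    Assumption: the embedding dimension is divisible by 3. Unknown words and
    the padding at the end of the last row are rendered as black pixels.
    """
    dim = len(next(iter(embeddings.values())))
    blank = [(0, 0, 0)] * (dim // 3)
    tokens = text.lower().split()
    rows = []
    for start in range(0, len(tokens), words_per_row):
        chunk = tokens[start:start + words_per_row]
        row = []
        for tok in chunk:
            row.extend(vector_to_pixels(embeddings[tok]) if tok in embeddings else blank)
        # Pad short rows so every row of the image has the same width.
        row.extend(blank * (words_per_row - len(chunk)))
        rows.append(row)
    return rows


if __name__ == "__main__":
    emb = toy_embeddings(["seeing", "colors", "text"])
    image = encode_text("Seeing colors in text today", emb)
    print(len(image), "rows of", len(image[0]), "RGB pixels")
```

The resulting grid of RGB tuples could then be written out as an image file and passed to any image classifier; that step is left out here because it depends on the imaging library and CNN chosen.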

Taxonomy

Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Multimodal Machine Learning Applications