An Innovative Word Encoding Method For Text Classification Using Convolutional Neural Network

Amr Adel Helmy; Yasser M.K. Omar; Rania Hodhod

arXiv:1903.04146·cs.CL·March 12, 2019

TL;DR

This paper introduces BUNOW, a novel language-independent word encoding method for text classification with CNNs, reducing parameters and memory while improving accuracy over existing character-based approaches.

Contribution

The paper proposes BUNOW, a new binary-based word encoding technique that enhances CNN-based text classification by being language-independent and more efficient.

Findings

1. Achieved 91.99% accuracy on the AG's News dataset.
2. Reduced neural network parameters by 34%.
3. Decreased memory consumption by 62%.

Abstract

Text classification plays a vital role today, especially with the intensive use of social networking media. Recently, different convolutional neural network architectures have been used for text classification, most commonly with one-hot vector or word embedding input representations. This paper presents a new language-independent word encoding method for text classification. The proposed model converts raw text data into a low-dimensional feature representation with minimal or no preprocessing, using a new approach called binary unique number of word (BUNOW). BUNOW assigns each unique word an integer ID in a dictionary, and that ID is represented as a k-dimensional vector of its binary equivalent. The output vector of this encoding is fed into a convolutional neural network (CNN) model for classification. Moreover, the proposed model reduces the neural network parameters, allows faster…
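The encoding step described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Whitespace tokenization, IDs assigned in order of first appearance starting at 1, ID 0 reserved for padding and unknown words, fixed-length truncation/padding, and most-significant-bit-first ordering are all choices made here for illustration. The helper names (`build_vocab`, `bits_needed`, `encode_word`, `encode_document`) are hypothetical.

```python
from typing import Dict, List


def build_vocab(docs: List[str]) -> Dict[str, int]:
    """Give each unique token an integer ID in order of first appearance.

    IDs start at 1; ID 0 is reserved (an assumption here) for padding and
    out-of-vocabulary words. Tokens are split on whitespace (also an assumption).
    """
    vocab: Dict[str, int] = {}
    for doc in docs:
        for word in doc.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def bits_needed(vocab_size: int) -> int:
    """Smallest k such that every ID in 0..vocab_size fits in k bits."""
    return max(1, vocab_size.bit_length())


def encode_word(word_id: int, k: int) -> List[int]:
    """k-dimensional binary vector of word_id, most significant bit first."""
    return [(word_id >> (k - 1 - i)) & 1 for i in range(k)]


def encode_document(doc: str, vocab: Dict[str, int], k: int, max_len: int) -> List[List[int]]:
    """Encode a document as a max_len x k matrix of bits (truncate or zero-pad)."""
    ids = [vocab.get(word, 0) for word in doc.split()][:max_len]
    ids += [0] * (max_len - len(ids))
    return [encode_word(word_id, k) for word_id in ids]


if __name__ == "__main__":
    docs = ["the cat sat", "the dog ran fast"]
    vocab = build_vocab(docs)    # {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4, 'ran': 5, 'fast': 6}
    k = bits_needed(len(vocab))  # 3 bits cover IDs 0..6
    for row in encode_document("the dog ran fast", vocab, k, max_len=5):
        print(row)
    # [0, 0, 1]
    # [1, 0, 0]
    # [1, 0, 1]
    # [1, 1, 0]
    # [0, 0, 0]  <- padding
```

With a vocabulary of V words, this layout needs k = ⌈log2(V + 1)⌉ bits per word instead of the V (or V + 1) dimensions of a one-hot vector; a 60,000-word vocabulary, for example, fits in 16 bits. Feeding the resulting max_len × k matrix to a 1D convolution as a length-max_len sequence with k input channels is likewise an assumption here, not the architecture reported in the paper.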
