A Multilingual Encoding Method for Text Classification and Dialect Identification Using Convolutional Neural Network

Amr Adel Helmy

arXiv:1903.07588 · cs.CL · March 19, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a language-independent text classification approach that pairs two new encoding methods, BUNOW and BUNOC, with a specialized CNN architecture, and reports promising results on Arabic and English datasets.

Contribution

It proposes two new encoding techniques and a CNN architecture that are language-independent, memory-efficient, and faster than traditional methods for text classification.

Findings

01

Achieved promising results on Arabic and English datasets.

02

Reduced memory usage compared to traditional encoding methods.

03

Faster computation with fewer network layers.

Abstract

This thesis presents a language-independent text classification model by introducing two new encoding methods, "BUNOW" and "BUNOC", which feed raw text data into a new spatial CNN architecture with vertical and horizontal convolutional processes, in place of commonly used methods such as one-hot vectors or word representations (e.g., word2vec) with a temporal CNN architecture. The proposed model can be classified as a hybrid word-character model in its methodology: it consumes less memory by using fewer neural network parameters, as in character-level representation, while also providing much faster computation with fewer network layers, as in word-level representation. Promising results were achieved compared to state-of-the-art models on two benchmark datasets with different morphology, one for Arabic and one for English.
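
The abstract does not describe how BUNOW and BUNOC are computed. As a purely illustrative assumption (not the author's method), the sketch below assigns each word a unique integer ID and represents it by the fixed-width binary digits of that ID, then compares the resulting vector width with that of a one-hot encoding. The helper names `build_vocab`, `binary_code`, and `encode` are hypothetical.

```python
# Illustrative sketch only: a compact binary ID encoding versus one-hot.
# Assumption: each word gets a unique integer ID; ID 0 is reserved for
# padding or unknown words. None of these names come from the paper.

def build_vocab(tokens):
    """Map each distinct token to a unique integer ID, starting at 1."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
    return vocab

def binary_code(index, width):
    """Return the fixed-width binary digits of index, most significant bit first."""
    return [(index >> shift) & 1 for shift in range(width - 1, -1, -1)]

def encode(tokens, vocab, width):
    """Encode a token sequence as a list of binary rows (unknown tokens map to ID 0)."""
    return [binary_code(vocab.get(tok, 0), width) for tok in tokens]

tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)

# Number of bits needed to represent the largest ID.
width = max(1, len(vocab).bit_length())
matrix = encode(tokens, vocab, width)

# A one-hot vector needs one slot per ID, including the reserved ID 0.
one_hot_width = len(vocab) + 1

print("binary width:", width, "one-hot width:", one_hot_width)
for tok, row in zip(tokens, matrix):
    print(tok, row)
```

Under this assumption the vector width grows with the logarithm of the vocabulary size rather than linearly, which is one way an encoding could reduce memory relative to one-hot vectors. The encoded sentence is a 2D binary matrix (one row per word), a shape that a CNN could in principle convolve over both vertically and horizontally.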

Taxonomy

Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Speech Recognition and Synthesis