A Multilingual Encoding Method for Text Classification and Dialect Identification Using Convolutional Neural Network
Amr Adel Helmy

TL;DR
This paper introduces a novel multilingual text classification approach using new encoding methods 'BUNOW' and 'BUNOC' with a specialized CNN architecture, achieving efficient and accurate results across Arabic and English datasets.
Contribution
It proposes two new encoding techniques and a CNN architecture that are language-independent, memory-efficient, and faster than traditional methods for text classification.
Findings
Achieved promising results on Arabic and English datasets.
Reduced memory usage compared to traditional encoding methods.
Faster computation with fewer network layers.
Abstract
This thesis presents a language-independent text classification model by introducing two new encoding methods, "BUNOW" and "BUNOC", which feed raw text into a new spatial CNN architecture with vertical and horizontal convolutions, instead of commonly used methods such as one-hot vectors or word representations (e.g., word2vec) with a temporal CNN architecture. The proposed model can be classified as a hybrid word-character model in its methodology: it consumes less memory by using fewer neural network parameters, as in character-level representation, and provides much faster computation with fewer network layers, as in word-level representation. Promising results were achieved compared with state-of-the-art models on two morphologically different benchmark datasets, one for Arabic and one for English.
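The abstract does not define BUNOW or BUNOC, so the following is not the thesis's encoding. It is a minimal sketch, under the assumption that each word is given a unique integer ID and that ID is written as a fixed-width binary vector, of why such a compact code needs far fewer input dimensions than a one-hot vector. All names (`build_vocab`, `binary_code`, `encode`) and the toy sentences are hypothetical.

```python
def build_vocab(texts):
    """Assign each distinct lowercased token a unique integer ID.

    IDs start at 1; 0 is reserved for padding and unknown tokens
    (an assumption made for this sketch).
    """
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def binary_code(index, width):
    """Return `index` as a list of `width` bits, most significant bit first."""
    return [(index >> bit) & 1 for bit in reversed(range(width))]


def encode(text, vocab, width, max_len):
    """Encode a text as a max_len x width matrix of bits (one row per token)."""
    tokens = text.lower().split()[:max_len]
    rows = [binary_code(vocab.get(token, 0), width) for token in tokens]
    # Pad short texts with all-zero rows so every input has the same shape.
    rows += [[0] * width for _ in range(max_len - len(rows))]
    return rows


texts = ["the film was great", "the plot was weak"]
vocab = build_vocab(texts)

# Number of bits needed to represent the largest ID.
width = max(1, len(vocab).bit_length())
# A one-hot vector would need one slot per ID, plus one for padding.
one_hot_width = len(vocab) + 1

matrix = encode(texts[0], vocab, width, max_len=6)
print(f"binary width: {width}, one-hot width: {one_hot_width}")
for row in matrix:
    print(row)
```

The binary width grows logarithmically with vocabulary size while the one-hot width grows linearly, which is one way an encoding can reduce memory. The resulting two-dimensional matrix (tokens by bits) is the kind of input a 2-D convolution could scan along both axes, though the actual architecture in the thesis may differ.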
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Speech Recognition and Synthesis
