A Robust Hybrid Approach for Textual Document Classification

Muhammad Nabeel Asim; Muhammad Usman Ghani Khan; Muhammad Imran Malik; Andreas Dengel; Sheraz Ahmed

arXiv:1909.05478·cs.CL·September 13, 2019

TL;DR

This paper introduces a hybrid text classification approach that combines feature selection with deep learning, outperforming state-of-the-art methods by 7.7% on 20 Newsgroups and 6.6% on the BBC News dataset.

Contribution

The paper presents a two-stage methodology that integrates traditional feature selection with deep CNNs to improve document classification performance.

Findings

01 Outperforms state-of-the-art methods by 7.7% on 20 Newsgroups

02 Achieves 6.6% higher accuracy on the BBC News dataset

03 Demonstrates the effectiveness of hybrid feature engineering in NLP

Abstract

Text document classification is an important task for diverse natural language processing applications. Traditional machine learning approaches mainly focused on reducing the dimensionality of textual data to perform classification. Although this improved overall classification accuracy, the classifiers still faced a sparsity problem due to the lack of better data representation techniques. Deep learning based text document classification, on the other hand, benefited greatly from the invention of word embeddings, which solved the sparsity problem, and researchers' focus has mainly remained on the development of deep architectures. Deeper architectures, however, learn some redundant features that limit the performance of deep learning based solutions. In this paper, we propose a two-stage text document classification methodology which combines traditional feature engineering with…
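The first stage described in the abstract (pruning the vocabulary with a traditional feature selection step before a deep model sees the text) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract is truncated here and does not name the selection metric, so the chi-square statistic is used as a stand-in, and the function names `chi_square_scores`, `select_vocabulary`, and `prune` are hypothetical. The second stage (word embeddings plus a CNN) is left out.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes.

    docs   : list of tokenized documents (each a list of str)
    labels : list of class labels, one per document
    Chi-square is an assumed stand-in for the paper's selection metric.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]  # document frequency, not term frequency
    class_count = Counter(labels)
    term_count = Counter(t for s in doc_sets for t in s)
    term_class = Counter((t, y) for s, y in zip(doc_sets, labels) for t in s)

    scores = {}
    for term, n_t in term_count.items():
        best = 0.0
        for cls, n_c in class_count.items():
            a = term_class[(term, cls)]  # docs in cls that contain term
            b = n_t - a                  # docs outside cls that contain term
            c = n_c - a                  # docs in cls without term
            d = n - a - b - c            # docs outside cls without term
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:                    # skip degenerate contingency tables
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_vocabulary(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def prune(docs, vocab):
    """Drop every token that is not in the selected vocabulary."""
    return [[t for t in d if t in vocab] for d in docs]


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    docs = [
        ["goal", "match", "the"],
        ["team", "goal", "the"],
        ["stock", "market", "the"],
        ["market", "shares", "the"],
    ]
    labels = ["sport", "sport", "biz", "biz"]
    vocab = select_vocabulary(docs, labels, k=2)
    print(sorted(vocab))
    print(prune(docs, vocab))
```

In a full pipeline the pruned token lists would then be mapped to word embeddings and passed to the classifier. A term such as "the", which appears in every document, scores zero here and is removed before the network could learn redundant features from it.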

Code & Models

Repositories

bharathrajcl/A-Robust-Hybrid-Approach-for-Textual-Document-Classification (framework tag: tf)

Taxonomy

Methods: Feature Selection