A Comparative Study on using Principle Component Analysis with Different Text Classifiers

Ahmed I. Taloba; D. A. Eisa; Safaa S. I. Ismail

arXiv:1807.03283·cs.IR·July 10, 2018

TL;DR

This study evaluates the impact of applying Principal Component Analysis (PCA) for feature extraction on various text classifiers, demonstrating improved classification accuracy across multiple datasets.

Contribution

It provides an empirical comparison of PCA-based feature reduction with different classifiers in text categorization tasks, highlighting performance improvements.

Findings

01. PCA improves classifier accuracy on most datasets

02. Feature reduction reduces overfitting and noise

03. Performance gains are consistent across multiple classifiers

Abstract

Text categorization (TC) is the task of automatically organizing a set of documents into a set of pre-defined categories. Over the last few years, increased attention has been paid to the use of documents in digital form, which makes text categorization a challenging issue. The most significant problem in text categorization is the huge number of features. Most of these features are redundant, noisy, or irrelevant, which causes overfitting in most classifiers. Hence, feature extraction is an important step for improving the overall accuracy and performance of text classifiers. In this paper, we provide an overview of using principal component analysis (PCA) as a feature extraction method with various classifiers. We observed that classifier performance improved after PCA was used to reduce the dimensionality of the data. Experiments are conducted on three…
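To make the idea in the abstract concrete, the following is a minimal sketch of PCA used as a feature extraction step ahead of a text classifier. It is not the authors' pipeline: the toy corpus, the nearest-centroid classifier, the power-iteration PCA, and all function names (`bag_of_words`, `fit_pca`, `project`, `fit_centroids`, `predict`) are assumptions chosen so the example runs with the Python standard library only. The classifiers and datasets used in the paper are not named in this excerpt.

```python
import math
import random
from collections import Counter


def bag_of_words(docs, vocab=None):
    """Turn whitespace-tokenized documents into term-count vectors."""
    tokenized = [d.lower().split() for d in docs]
    if vocab is None:
        vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([float(counts[t]) for t in vocab])
    return vocab, rows


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def fit_pca(rows, k, iters=200, seed=0):
    """Estimate the top-k principal components by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered term counts (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflate: remove the variance explained by the component just found.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components


def project(row, mean, components):
    """Map a term-count vector onto the principal components."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(a * b for a, b in zip(centered, c)) for c in components]


def fit_centroids(points, labels):
    """Nearest-centroid classifier (an illustrative stand-in): one mean per class."""
    grouped = {}
    for p, y in zip(points, labels):
        grouped.setdefault(y, []).append(p)
    return {y: [sum(col) / len(ps) for col in zip(*ps)]
            for y, ps in grouped.items()}


def predict(point, centroids):
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(point, centroids[y])))


# Hypothetical toy corpus; real experiments would use a labeled text dataset.
train_docs = [
    "team wins match goal", "player scores goal match", "team player match",
    "bank raises interest rate", "market stock rate falls", "bank stock market",
]
train_labels = ["sports"] * 3 + ["finance"] * 3

vocab, X = bag_of_words(train_docs)
mean, comps = fit_pca(X, k=2)            # 13 term features reduced to 2
Z = [project(x, mean, comps) for x in X]
centroids = fit_centroids(Z, train_labels)

_, X_new = bag_of_words(["team scores goal"], vocab)
pred = predict(project(X_new[0], mean, comps), centroids)
print(pred)
```

The classifier here only ever sees the two projected features, which is the sense in which PCA acts as feature extraction rather than feature selection. On a real corpus the vocabulary runs to thousands of terms, so a library implementation of PCA or truncated SVD would normally replace the hand-written power iteration.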

Taxonomy

Methods: Principal Components Analysis