A Comparative Study on using Principle Component Analysis with Different Text Classifiers
Ahmed I. Taloba, D. A. Eisa, Safaa S. I. Ismail

TL;DR
This study evaluates the impact of applying Principal Component Analysis (PCA) for feature extraction on various text classifiers, demonstrating improved classification accuracy across multiple datasets.
Contribution
It provides an empirical comparison of PCA-based feature reduction with different classifiers in text categorization tasks, highlighting performance improvements.
Findings
PCA improves classifier accuracy on most datasets
Feature reduction reduces overfitting and noise
Performance gains are consistent across multiple classifiers
Abstract
Text categorization (TC) is the task of automatically organizing a set of documents into a set of pre-defined categories. Over the last few years, increased attention has been paid to the use of documents in digital form, which makes text categorization a challenging issue. The most significant problem in text categorization is its huge number of features. Most of these features are redundant, noisy, and irrelevant, which causes overfitting in most classifiers. Hence, feature extraction is an important step to improve the overall accuracy and performance of text classifiers. In this paper, we provide an overview of using principal component analysis (PCA) as a feature extraction method with various classifiers. We observed that classifier performance improved after using PCA to reduce the dimensionality of the data. Experiments are conducted on three…
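The pipeline the abstract describes (reduce a high-dimensional term matrix with PCA, then train a classifier on the reduced features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy term counts, the labels, the choice of two components, and the nearest-centroid classifier are all assumptions made for brevity, and power iteration stands in for a library eigen-solver so that the sketch has no dependencies.

```python
import math

def fit_pca(rows, k, iters=200):
    """Return (mean, components): the top-k principal directions of `rows`."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Sample covariance matrix of the centered term counts (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Power iteration from an arbitrary non-uniform start vector (assumption).
        v = [float(i + 1) for i in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-largest direction.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components

def project(row, mean, components):
    """Map one document's term counts onto the principal components."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]

def fit_centroids(points, labels):
    """Average the reduced vectors of each class (nearest-centroid training)."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: [sum(col) / len(ps) for col in zip(*ps)]
            for y, ps in groups.items()}

def predict(row, mean, components, centroids):
    """Assign the class whose centroid is closest in the reduced space."""
    z = project(row, mean, components)
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(z, centroids[y])))

# Hypothetical term-count matrix: 6 documents x 5 terms.
docs = [
    [3, 2, 1, 0, 0],
    [2, 3, 0, 0, 1],
    [3, 3, 1, 0, 0],
    [0, 0, 1, 3, 2],
    [0, 1, 0, 2, 3],
    [0, 0, 1, 3, 3],
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

mean, components = fit_pca(docs, k=2)          # 5 features -> 2 features
reduced = [project(d, mean, components) for d in docs]
centroids = fit_centroids(reduced, labels)
print(predict([4, 2, 0, 0, 0], mean, components, centroids))
```

On real corpora the term matrix is large and sparse, so a library routine (for example a truncated SVD) would normally replace the hand-written eigen-solver. The number of retained components is a tuning choice that this sketch fixes arbitrarily.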
Taxonomy
Methods: Principal Components Analysis
