Dimensionality Reduction for Sentiment Classification: Evolving for the Most Prominent and Separable Features

Aftab Anjum; Mazharul Islam; Lin Wang

arXiv:2006.04680·cs.IR·June 9, 2020

Open Access

TL;DR

This paper introduces a new framework for sentiment classification that employs two novel dimensionality reduction techniques, SentiTPC and SentiTPR, which better preserve prominent features and improve classifier performance.

Contribution

The paper proposes SentiTPC and SentiTPR, two new dimensionality reduction methods that improve feature selection by considering how differently features are distributed across the positive and negative classes, and reports that they outperform existing techniques.

Findings

01 Significantly reduces feature dimensions

02 Improves sentiment classification accuracy

03 Preserves prominent, separable features

Abstract

In sentiment classification, the enormous volume of textual data, its high dimensionality, and its inherent noise make it extremely difficult for machine learning classifiers to extract high-level, complex abstractions. Dimensionality reduction techniques are needed to make the data less sparse and more statistically significant. In existing dimensionality reduction techniques, however, the number of components must be set manually, which results in the loss of the most prominent features and thus reduces classifier performance. Our prior techniques, Term Presence Count (TPC) and Term Presence Ratio (TPR), have proven effective because they reject the less separable features. However, the most prominent and separable features might still get removed from the initial feature set despite having higher distributions among positive and negative tagged…
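The abstract does not spell out how TPC and TPR are computed, so the following is only an illustrative sketch of presence-based feature rejection, not the authors' method. It assumes that a term's "presence" is the number of documents in a class that contain it at least once, that a count-style filter keeps terms whose positive and negative presence counts differ by at least a threshold, and that a ratio-style filter keeps terms whose smoothed presence ratio meets a threshold. The function names, thresholds, whitespace tokenization, and add-one smoothing are all hypothetical choices made for this example.

```python
from collections import Counter


def term_presence_counts(docs, labels):
    """Count, per term, how many positive and negative documents contain it.

    Assumption: label 1 means positive, anything else means negative.
    """
    pos, neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())  # presence, not frequency
        (pos if label == 1 else neg).update(terms)
    return pos, neg


def select_by_presence_difference(docs, labels, min_diff=2):
    """Keep terms whose presence counts differ by at least min_diff (assumed rule)."""
    pos, neg = term_presence_counts(docs, labels)
    vocab = set(pos) | set(neg)
    return sorted(t for t in vocab if abs(pos[t] - neg[t]) >= min_diff)


def select_by_presence_ratio(docs, labels, min_ratio=2.0):
    """Keep terms whose smoothed presence ratio is at least min_ratio (assumed rule)."""
    pos, neg = term_presence_counts(docs, labels)
    vocab = set(pos) | set(neg)
    kept = []
    for t in vocab:
        hi, lo = max(pos[t], neg[t]), min(pos[t], neg[t])
        # Add-one smoothing avoids division by zero for one-sided terms.
        if (hi + 1) / (lo + 1) >= min_ratio:
            kept.append(t)
    return sorted(kept)


# Toy corpus, invented for illustration only.
docs = ["good movie", "good plot", "bad movie", "bad plot", "good acting"]
labels = [1, 1, 0, 0, 1]

print(select_by_presence_difference(docs, labels, min_diff=2))
print(select_by_presence_ratio(docs, labels, min_ratio=2.0))
```

In this toy corpus, "movie" and "plot" appear equally often in both classes, so both filters reject them as less separable, while class-specific terms such as "good" and "bad" survive.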

Taxonomy

Topics: Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining