Text Compression for Sentiment Analysis via Evolutionary Algorithms
Emmanuel Dufourq, Bruce A. Bassett

TL;DR
This paper introduces PARSEC, an evolutionary algorithm that compresses text using Parts-of-Speech tags while sacrificing little sentiment classification accuracy, evaluated with eight sentiment analysis algorithms on twelve English data sets.
Contribution
The study presents a novel POS-based evolutionary text compression method and shows that, paired with LingPipe, text can be compressed by up to 75% at a cost of 3.3% in sentiment classification accuracy.
Findings
Achieves 20%, 50%, and 75% data compression with 0%, 1.3%, and 3.3% loss in sentiment classification accuracy, respectively, using LingPipe, the most accurate of the algorithms tested.
Other sentiment algorithms experience greater accuracy degradation under compression.
Compression effectiveness varies depending on the sentiment analysis algorithm used.
Abstract
Can textual data be compressed intelligently without losing accuracy in evaluating sentiment? In this study, we propose a novel evolutionary compression algorithm, PARSEC (PARts-of-Speech for sEntiment Compression), which makes use of Parts-of-Speech tags to compress text in a way that sacrifices minimal classification accuracy when used in conjunction with sentiment analysis algorithms. An analysis of PARSEC with eight commercial and non-commercial sentiment analysis algorithms on twelve English sentiment data sets reveals that accurate compression is possible with (0%, 1.3%, 3.3%) loss in sentiment classification accuracy for (20%, 50%, 75%) data compression with PARSEC using LingPipe, the most accurate of the sentiment algorithms. Other sentiment analysis algorithms are more severely affected by compression. We conclude that significant compression of text data is possible for…
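The abstract's idea of evolving a POS-based compression rule can be sketched as follows. This is only a toy illustration and not the authors' PARSEC implementation: the pre-tagged corpus, the lexicon classifier standing in for a sentiment algorithm, the "set of dropped tags" representation, the fitness function, and the mutation operator are all assumptions made for this example.

```python
import random

# Hypothetical pre-tagged toy corpus: (list of (word, POS tag), sentiment label).
CORPUS = [
    ([("the", "DET"), ("movie", "NOUN"), ("was", "VERB"),
      ("really", "ADV"), ("great", "ADJ")], 1),
    ([("i", "PRON"), ("hate", "VERB"), ("this", "DET"),
      ("awful", "ADJ"), ("phone", "NOUN")], 0),
    ([("we", "PRON"), ("love", "VERB"), ("the", "DET"), ("food", "NOUN")], 1),
    ([("the", "DET"), ("service", "NOUN"), ("was", "VERB"),
      ("terrible", "ADJ")], 0),
]
TAGS = ["DET", "NOUN", "VERB", "ADV", "ADJ", "PRON"]
POSITIVE = {"great", "love"}      # stand-in lexicon, not a real sentiment tool
NEGATIVE = {"hate", "awful", "terrible"}


def compress(tokens, dropped):
    """Keep only the words whose POS tag is not in the dropped set."""
    return [word for word, tag in tokens if tag not in dropped]


def classify(words):
    """Toy lexicon classifier: 1 positive, 0 negative, None if undecided."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score == 0:
        return None
    return 1 if score > 0 else 0


def evaluate(dropped):
    """Return (compression ratio, accuracy) for a set of dropped tags."""
    kept = total = correct = 0
    for tokens, label in CORPUS:
        words = compress(tokens, dropped)
        kept += len(words)
        total += len(tokens)
        correct += int(classify(words) == label)
    return 1 - kept / total, correct / len(CORPUS)


def fitness(dropped, target):
    """Assumed fitness: reward accuracy, penalise missing the target ratio."""
    compression, accuracy = evaluate(dropped)
    return accuracy - abs(compression - target)


def evolve(target=0.5, pop_size=12, generations=30, seed=0):
    """Simple elitist evolutionary search over sets of tags to drop."""
    rng = random.Random(seed)
    population = [frozenset(t for t in TAGS if rng.random() < 0.5)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda d: fitness(d, target), reverse=True)
        parents = population[:pop_size // 2]
        # Mutation: flip the membership of one randomly chosen tag.
        children = [frozenset(set(p) ^ {rng.choice(TAGS)}) for p in parents]
        population = parents + children
    return max(population, key=lambda d: fitness(d, target))


if __name__ == "__main__":
    best = evolve(target=0.5)
    print(sorted(best), evaluate(best))
```

In this sketch a candidate solution is simply the set of tags whose words are deleted, and the target compression ratio plays the role of the 20%, 50%, and 75% settings mentioned in the abstract. Swapping `classify` for a real sentiment algorithm would show how strongly the outcome depends on the classifier, which is the effect the findings describe.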
