Enhancing Sentiment Analysis Results through Outlier Detection Optimization
Yuetian Chen, Mei Si

TL;DR
This paper explores the use of outlier detection with Deep SVDD to improve sentiment analysis accuracy by removing inconsistently labeled samples, demonstrating benefits across various models and datasets.
Contribution
It introduces a method combining outlier detection with Deep SVDD to enhance sentiment classification results in subjective text datasets.
Findings
Outlier removal improves classification accuracy in most datasets.
Large language models like DeBERTa capture complex patterns effectively.
Outlier detection benefits are consistent across different classifiers.
Abstract
When dealing with text data containing subjective labels like speaker emotions, inaccuracies or discrepancies among labelers are not uncommon. Such discrepancies can significantly affect the performance of machine learning algorithms. This study investigates the potential of identifying and addressing outliers in text data with subjective labels, aiming to enhance classification outcomes. We utilized the Deep SVDD algorithm, a one-class classification method, to detect outliers in nine text-based emotion and sentiment analysis datasets. By employing both a small-sized language model (DistilBERT base model with 66 million parameters) and non-deep learning machine learning algorithms (decision tree, KNN, Logistic Regression, and LDA) as the classifier, our findings suggest that the removal of outliers can lead to enhanced results in most cases. Additionally, as outliers in such datasets…
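The filter-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Deep SVDD trains a neural network to map normal samples close to a center point and scores each sample by its distance to that center. The sketch below keeps only the scoring step: it assumes fixed, precomputed embeddings, uses each label's centroid as the center, and drops the farthest samples. The function name `filter_outliers`, the `drop_fraction` parameter, the per-label grouping, and the toy data are all assumptions made for illustration.

```python
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance, used here as the outlier score."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def filter_outliers(embeddings, labels, drop_fraction=0.1):
    """Return sorted indices of the samples that are kept.

    For each label, the samples are ranked by squared distance to that
    label's centroid and the farthest `drop_fraction` are dropped.
    Assumption: a fixed centroid stands in for the learned Deep SVDD
    center, and no network is trained.
    """
    indices_by_label = {}
    for index, label in enumerate(labels):
        indices_by_label.setdefault(label, []).append(index)

    kept = []
    for indices in indices_by_label.values():
        center = centroid([embeddings[i] for i in indices])
        ranked = sorted(
            indices, key=lambda i: squared_distance(embeddings[i], center)
        )
        n_drop = int(len(indices) * drop_fraction)
        kept.extend(ranked[: len(ranked) - n_drop])
    return sorted(kept)


# Hypothetical 2-D "embeddings": samples 3 and 7 sit far from their label group.
toy_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0],
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [-4.0, -4.0],
]
toy_labels = ["pos"] * 4 + ["neg"] * 4

kept_indices = filter_outliers(toy_embeddings, toy_labels, drop_fraction=0.25)
print(kept_indices)
```

The kept indices would then select the training subset passed to whichever classifier is being evaluated, so the same filtering step can be reused across different classifiers.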
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Anomaly Detection Techniques and Applications · Stock Market Forecasting Methods
Methods: Logistic Regression · DeBERTa · Balanced Selection
