Using Pseudolabels for training Sentiment Classifiers makes the model generalize better across datasets
Natesh Reddy, Muktabh Mayank Srivastava

TL;DR
This paper demonstrates that training sentiment classifiers on pseudolabels improves their generalization across datasets. The pseudolabels are produced for a large unannotated, multi-domain corpus by a classifier trained on a small annotated dataset.
Contribution
The work introduces a pseudolabeling approach that enhances cross-domain sentiment classification without extensive annotations from multiple domains.
Findings
Pseudolabel-based training improves cross-dataset performance
Method reduces need for extensive domain-specific annotations
Model generalizes better across diverse sentiment datasets
Abstract
The problem addressed in this work is the following: for a public sentiment classification API, how can we build a classifier that works well on different types of data when our ability to annotate data across domains is limited? We assume a large amount of unannotated data from many domains. We also assume pseudolabels for that data, generated by a classifier trained on a small annotated dataset from a single domain. Under these conditions, we show that we can train a sentiment classifier that generalizes better across different datasets.
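The pipeline described in the abstract can be sketched as a teacher/student loop: train on the small annotated set, pseudolabel the large unannotated pool, then retrain on the pseudolabels. This is a minimal sketch under stated assumptions, not the authors' implementation. The summary does not name a model, so a tiny bag-of-words Naive Bayes (the hypothetical `NaiveBayes` class) stands in for both classifiers. The toy sentences are invented for illustration. The student here is trained on pseudolabels alone, because this summary does not say whether gold labels or confidence filtering are also used.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (an illustrative simplification).
    return text.lower().split()


class NaiveBayes:
    """Hypothetical stand-in classifier: multinomial Naive Bayes, add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log(
                    (self.word_counts[c][w] + 1) / (self.totals[c] + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def pseudolabel_train(labeled, unlabeled, make_model=NaiveBayes):
    # 1. Teacher: trained on the small, single-domain annotated set.
    teacher = make_model().fit([t for t, _ in labeled], [y for _, y in labeled])
    # 2. Pseudolabel the large, multi-domain unannotated pool with the teacher.
    pseudo = [teacher.predict(t) for t in unlabeled]
    # 3. Student: trained only on the pseudolabeled pool (an assumption of this sketch).
    student = make_model().fit(list(unlabeled), pseudo)
    return teacher, student, pseudo


if __name__ == "__main__":
    labeled = [("great film", "pos"), ("loved the acting", "pos"),
               ("terrible film", "neg"), ("boring plot", "neg")]
    unlabeled = ["great and sturdy", "sturdy and loved",
                 "terrible and flimsy", "flimsy and boring"]
    teacher, student, pseudo = pseudolabel_train(labeled, unlabeled)
    print(pseudo)
    print(teacher.predict("sturdy"), student.predict("sturdy"))
```

In the toy run, the teacher (trained on movie-style sentences) has never seen "sturdy" and falls back to its class prior. The student, trained on the pseudolabeled product-style sentences, learns that "sturdy" co-occurs with positive reviews. This illustrates the intuition behind the reported cross-domain gain: the student picks up vocabulary from domains the annotated set never covered.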
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
