Using Psuedolabels for training Sentiment Classifiers makes the model generalize better across datasets

Natesh Reddy; Muktabh Mayank Srivastava

arXiv:2110.02200·cs.CL·October 6, 2021

TL;DR

This paper demonstrates that training on pseudolabels, generated for a large unannotated multi-domain dataset by a classifier trained on a small annotated single-domain dataset, improves how well a sentiment classifier generalizes across datasets.

Contribution

The work introduces a pseudolabeling approach that enhances cross-domain sentiment classification without extensive annotations from multiple domains.

Findings

1. Pseudolabel-based training improves cross-dataset performance
2. Method reduces need for extensive domain-specific annotations
3. Model generalizes better across diverse sentiment datasets

Abstract

This work addresses the following problem: for a public sentiment classification API, how can we set up a classifier that works well on different types of data when our ability to annotate data from across domains is limited? We show that, given a large amount of unannotated data from different domains and pseudolabels on this data generated by a classifier trained on a small annotated dataset from one domain, we can train a sentiment classifier that generalizes better across different datasets.
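The pseudolabeling loop described in the abstract can be sketched as follows. This is a minimal toy illustration and not the authors' implementation: the `NaiveBayes` class, the `train_with_pseudolabels` helper, and the example sentences are all hypothetical, and a tiny bag-of-words Naive Bayes model stands in for whatever classifier the paper actually uses.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(self.label_counts[label] / total)
            for tok in text.lower().split():
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_with_pseudolabels(labeled_texts, labels, unlabeled_texts):
    # Step 1: train a "teacher" on the small annotated single-domain set.
    teacher = NaiveBayes().fit(labeled_texts, labels)
    # Step 2: let the teacher label the unannotated multi-domain data.
    pseudolabels = [teacher.predict(t) for t in unlabeled_texts]
    # Step 3: train a "student" on the pseudolabeled data.
    student = NaiveBayes().fit(unlabeled_texts, pseudolabels)
    return teacher, pseudolabels, student


# Hypothetical data: annotated movie reviews, unannotated product reviews.
labeled = ["great movie loved it", "terrible movie hated it",
           "great acting wonderful plot", "terrible acting awful plot"]
labels = ["pos", "neg", "pos", "neg"]
unlabeled = ["great phone wonderful battery", "terrible phone awful battery",
             "loved the fast delivery", "hated the slow delivery"]

teacher, pseudolabels, student = train_with_pseudolabels(labeled, labels, unlabeled)
print(pseudolabels)  # ['pos', 'neg', 'pos', 'neg']
```

In this toy setup the student sees vocabulary from the unannotated domain (for example "fast" and "slow") that the teacher never encountered, which is one intuition for why training on pseudolabeled multi-domain data might help cross-dataset generalization. The sketch makes no claim about the size of the effect reported in the paper.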

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques