TL;DR
This paper critically evaluates the production readiness of aspect-based sentiment analysis models. It highlights problems with reproducibility and cross-domain transferability, and it provides new datasets to support future research.
Contribution
It offers a comprehensive empirical assessment of existing models, reveals reproducibility issues, and introduces large-scale annotated corpora for further study.
Findings
Average 4-5% drop in test accuracy when reproducing published models
12-55% accuracy drop on challenging data slices
Training on 10-25% of the in-domain data together with data from other domains in the same locale largely closes the performance gap
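The accuracy drops listed above can be computed in more than one way. The sketch below assumes the reproducibility drop is the reported accuracy minus the reproduced accuracy, in absolute percentage points, averaged over models. This summary does not say whether the slice drops are absolute or relative, so the sketch shows a relative variant as well. The function names and the example numbers are hypothetical placeholders, not figures from the paper.

```python
from statistics import mean


def accuracy_drop(reported: float, reproduced: float) -> float:
    """Absolute drop in percentage points (positive means the reproduction underperforms)."""
    return reported - reproduced


def average_drop(results: dict[str, tuple[float, float]]) -> float:
    """Mean absolute drop across models.

    `results` maps a (hypothetical) model name to (reported_acc, reproduced_acc).
    """
    return mean(accuracy_drop(rep, repro) for rep, repro in results.values())


def relative_drop(full_test_acc: float, slice_acc: float) -> float:
    """Relative drop (%) of accuracy on a data slice vs. the full test set."""
    return 100.0 * (full_test_acc - slice_acc) / full_test_acc


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    sample = {"model_a": (80.0, 76.0), "model_b": (75.0, 70.0)}
    print(average_drop(sample))        # mean of 4.0 and 5.0
    print(relative_drop(80.0, 60.0))   # slice is 20 points below an 80-point baseline
```

Averaging absolute differences keeps models with different baseline accuracies on the same scale. The relative form is more useful when slices start from very different baselines.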
Abstract
With the exponential growth of online marketplaces and user-generated content therein, aspect-based sentiment analysis has become more important than ever. In this work, we critically review a representative sample of the models published during the past six years through the lens of a practitioner, with an eye towards deployment in production. First, our rigorous empirical evaluation reveals poor reproducibility: an average 4-5% drop in test accuracy across the sample. Second, to further bolster our confidence in empirical evaluation, we report experiments on two challenging data slices, and observe a consistent 12-55% drop in accuracy. Third, we study the possibility of transfer across domains and observe that as little as 10-25% of the domain-specific training dataset, when used in conjunction with datasets from other domains within the same locale, largely closes the gap between…
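The transfer setup described in the abstract pairs a fraction of the target domain's training set with data from other domains in the same locale. The sketch below shows one way to assemble such a training set. It assumes that uniform random subsampling is used and that every out-of-domain example is kept; the paper may have made different choices. The function `build_transfer_training_set` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_transfer_training_set(
    target_train: Sequence[T],
    other_domains: Sequence[Sequence[T]],
    target_fraction: float,
    seed: int = 0,
) -> List[T]:
    """Mix a random subset of in-domain examples with all out-of-domain examples.

    `other_domains` is assumed to contain only same-locale domains.
    `target_fraction` is the share of the target training set to keep
    (the findings cite 0.10-0.25 as largely sufficient).
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    k = round(len(target_train) * target_fraction)
    subset = rng.sample(list(target_train), k)
    combined = subset + [ex for domain in other_domains for ex in domain]
    rng.shuffle(combined)  # interleave domains before training
    return combined


if __name__ == "__main__":
    target = [("restaurant", i) for i in range(100)]
    others = [[("laptop", i) for i in range(50)], [("hotel", i) for i in range(30)]]
    mixed = build_transfer_training_set(target, others, target_fraction=0.2)
    print(len(mixed))  # 20 target examples + 80 out-of-domain examples
```

A fixed seed makes the subsample repeatable, which is in the spirit of the paper's focus on reproducibility.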
