Experiences from Creating a Benchmark for Sentiment Classification for Varieties of English
Dipankar Srirag, Jordan Painter, Aditya Joshi, Diptesh Kanojia

TL;DR
This paper describes the ongoing development of a sentiment classification benchmark for three varieties of English. It shows how language variety and sampling choices affect model performance, and argues for nuanced benchmark design.
Contribution
It reports lessons from building a sentiment benchmark for Australian (en-AU), Indian (en-IN), and British (en-UK) English from Google Places reviews, focusing on how sampling by label semantics, review length, and sentiment proportion shapes the results.
Findings
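Performance of three fine-tuned BERT-based models varies significantly across the three English varieties.
Sampling choices (label semantics, review length, and sentiment proportion) influence sentiment classification performance.
The observed variation points to the need for nuanced benchmark design: diverse sampling, careful label definition, and comprehensive evaluation.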
Abstract
Existing benchmarks often fail to account for linguistic diversity, like language variants of English. In this paper, we share our experiences from our ongoing project of building a sentiment classification benchmark for three variants of English: Australian (en-AU), Indian (en-IN), and British (en-UK) English. Using Google Places reviews, we explore the effects of various sampling techniques based on label semantics, review length, and sentiment proportion and report performances on three fine-tuned BERT-based models. Our initial evaluation reveals significant performance variations influenced by sample characteristics, label semantics, and language variety, highlighting the need for nuanced benchmark design. We offer actionable insights for researchers to create robust benchmarks, emphasising the importance of diverse sampling, careful label definition, and comprehensive evaluation…
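The sampling dimensions named in the abstract (label semantics, review length, sentiment proportion) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the star-rating thresholds, and the word-count length filter are all assumptions made for illustration, with reviews represented as plain dictionaries holding a `text` and a 1-5 `rating`.

```python
import random

def rating_to_label(rating, positive_min=4, negative_max=2):
    """Map a star rating to a label (assumed thresholds, not from the paper).

    Returns 1 (positive), 0 (negative), or None for ratings in between.
    """
    if rating >= positive_min:
        return 1
    if rating <= negative_max:
        return 0
    return None

def sample_reviews(reviews, n, positive_fraction=0.5,
                   min_words=0, max_words=None, seed=0):
    """Draw n labelled reviews with a fixed share of positive labels.

    reviews: list of dicts with "text" and "rating" keys (hypothetical schema).
    Returns a shuffled list of (text, label) tuples.
    """
    rng = random.Random(seed)
    positives, negatives = [], []
    for review in reviews:
        label = rating_to_label(review["rating"])
        if label is None:
            continue  # ambiguous ratings are dropped under this label definition
        length = len(review["text"].split())
        if length < min_words or (max_words is not None and length > max_words):
            continue  # outside the requested review-length band
        (positives if label == 1 else negatives).append((review["text"], label))

    n_pos = round(n * positive_fraction)
    n_neg = n - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough eligible reviews for the requested sample")

    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample
```

Varying `positive_fraction`, the length band, or the rating thresholds while holding the model fixed is one way to see how much a reported score depends on the sample rather than on the language variety.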
Taxonomy
Topics: Literature, Language, and Rhetoric Studies · Discourse Analysis and Cultural Communication
