BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

Dipankar Srirag; Aditya Joshi; Jordan Painter; Diptesh Kanojia

arXiv:2412.04726·cs.CL·June 18, 2025

TL;DR

BESSTIE introduces a new benchmark dataset for sentiment and sarcasm classification across Australian, Indian, and British English varieties, addressing bias and generalization challenges in large language models.

Contribution

The paper presents BESSTIE, a novel labeled dataset for sentiment and sarcasm detection in diverse English varieties, along with evaluation of LLM performance and analysis of language variety-specific challenges.

Findings

01 Models perform better on en-AU and en-UK than on en-IN.

02 Sarcasm classification is more challenging than sentiment classification across all three varieties.

03 Cross-variety generalization remains a significant challenge.
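
A minimal sketch of how cross-variety generalization might be measured: score a model trained on one variety against the test set of every variety. It assumes macro-averaged F1 as the metric and uses hypothetical per-variety predictor functions; the names `macro_f1`, `cross_variety_scores`, `predict_fns`, and `test_sets` are illustrative and do not come from the paper.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true positives, false positives, or false negatives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_variety_scores(predict_fns, test_sets):
    """Score every (training variety, test variety) pair.

    predict_fns: variety -> function mapping a list of texts to predicted labels
                 (assumed to wrap a model trained on that variety)
    test_sets:   variety -> (texts, gold_labels)
    """
    grid = {}
    for train_variety, predict in predict_fns.items():
        for test_variety, (texts, gold) in test_sets.items():
            grid[(train_variety, test_variety)] = macro_f1(gold, predict(texts))
    return grid
```

Entries where the training and test varieties match give in-variety scores; the remaining entries give cross-variety scores, so a large gap between the two would indicate poor generalization.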

Abstract

Despite large language models (LLMs) being known to exhibit bias against non-standard language varieties, there are no known labelled datasets for sentiment analysis of varieties of English. To address this gap, we introduce BESSTIE, a benchmark for sentiment and sarcasm classification for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). We collect datasets for these language varieties using two methods: location-based filtering for Google Places reviews, and topic-based filtering for Reddit comments. To assess whether the dataset accurately represents these varieties, we conduct two validation steps: (a) manual annotation of language varieties and (b) automatic language variety prediction. Native speakers of the language varieties manually annotate the datasets with sentiment and sarcasm labels. We perform an additional annotation exercise to validate the reliance of the…

Code & Models

Datasets

surrey-nlp/BESSTIE-CW-26
dataset · 688 downloads

Videos

BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English· underline

Taxonomy

Topics: Authorship Attribution and Profiling