Reasoning Beyond Labels: Measuring LLM Sentiment in Low-Resource, Culturally Nuanced Contexts
Millicent Ochieng, Anja Thieme, Ignatius Ezeani, Risa Ueno, Samuel Maina, Keshet Ronen, Javier Gonzalez, Jacki O'Neill

TL;DR
This paper develops a framework to evaluate how large language models interpret sentiment in culturally nuanced, low-resource contexts, revealing significant variation in reasoning quality and emphasizing the importance of culturally sensitive AI evaluation.
Contribution
It introduces a diagnostic approach that treats sentiment as context-dependent and culturally embedded, and assesses LLM interpretability and robustness in informal, code-mixed communication.
Findings
Top-tier LLMs show interpretive stability in sentiment reasoning.
Open models often struggle with ambiguity and sentiment shifts.
Culturally sensitive evaluation is crucial for real-world NLP applications.
Abstract
Sentiment analysis in low-resource, culturally nuanced contexts challenges conventional NLP approaches that assume fixed labels and universal affective expressions. We present a diagnostic framework that treats sentiment as a context-dependent, culturally embedded construct, and evaluate how large language models (LLMs) reason about sentiment in informal, code-mixed WhatsApp messages from Nairobi youth health groups. Using a combination of human-annotated data, sentiment-flipped counterfactuals, and rubric-based explanation evaluation, we probe LLM interpretability, robustness, and alignment with human reasoning. Framing our evaluation through a social-science measurement lens, we operationalize and interrogate LLM outputs as an instrument for measuring the abstract concept of sentiment. Our findings reveal significant variation in model reasoning quality, with top-tier LLMs…
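The abstract names sentiment-flipped counterfactuals as a robustness probe but does not say how they are scored. One plausible scoring scheme follows. It pairs each human-labelled message with a rewrite whose polarity has been flipped, then checks whether the model's label moves to the opposite polarity. This is a minimal sketch, not the authors' method. The two-label scheme, the `Pair` record, and the `evaluate` function are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical polar label set; the paper's actual annotation scheme may differ.
# Neutral messages are assumed to be excluded from counterfactual flipping.
FLIP = {"positive": "negative", "negative": "positive"}


@dataclass
class Pair:
    gold: str                 # human label for the original message
    original_pred: str        # model label on the original message
    counterfactual_pred: str  # model label on the sentiment-flipped rewrite


def evaluate(pairs: list[Pair]) -> dict[str, float]:
    """Hypothetical robustness summary over original/counterfactual pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    n = len(pairs)
    # Agreement with the human label on the unmodified message.
    original_acc = sum(p.original_pred == p.gold for p in pairs) / n
    # Agreement with the expected (flipped) label on the rewrite.
    counterfactual_acc = sum(
        p.counterfactual_pred == FLIP[p.gold] for p in pairs
    ) / n
    # Both correct: the model tracks the polarity change rather than surface cues.
    flip_consistency = sum(
        p.original_pred == p.gold and p.counterfactual_pred == FLIP[p.gold]
        for p in pairs
    ) / n
    return {
        "original_acc": original_acc,
        "counterfactual_acc": counterfactual_acc,
        "flip_consistency": flip_consistency,
    }
```

A model can score well on the originals while ignoring the flip. Reporting `flip_consistency` separately from the two accuracies makes that failure visible.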
