SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs
Pietro Bernardelle, Leon Fröhling, Stefano Civelli, Gianluca Demartini

TL;DR
This paper introduces SubData, a Python library for standardizing diverse datasets to evaluate how well large language models align with various human perspectives, especially political and demographic viewpoints.
Contribution
It presents a novel framework combining a dataset standardization tool with a theory-driven evaluation approach for assessing LLMs' perspective alignment.
Findings
SubData enables flexible dataset mapping for diverse research needs
The framework allows testing LLMs' classification of content targeting specific demographics
Initial application demonstrates its effectiveness in evaluating perspective alignment
Abstract
As increasingly capable large language models (LLMs) emerge, researchers have begun exploring their potential for subjective tasks. While recent work demonstrates that LLMs can be aligned with diverse human perspectives, evaluating this alignment on downstream tasks (e.g., hate speech detection) remains challenging due to the use of inconsistent datasets across studies. To address this issue, in this resource paper we propose a two-step framework: we (1) introduce SubData, an open-source Python library designed for standardizing heterogeneous datasets to evaluate LLMs' perspective alignment; and (2) present a theory-driven approach leveraging this library to test how differently-aligned LLMs (e.g., aligned with different political viewpoints) classify content targeting specific demographics. SubData's flexible mapping and taxonomy enable customization for diverse research needs,…
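The two steps in the abstract can be illustrated with a minimal Python sketch. It does not use SubData's actual API. The dataset names, target labels, taxonomy categories, and the helpers `standardize` and `flag_rate_by_target` are all hypothetical, and the toy classifier is only a stand-in for a perspective-aligned LLM. The sketch first maps dataset-specific target labels onto a shared taxonomy, then compares how often a classifier flags content for each target category.

```python
from collections import defaultdict

# Hypothetical mapping from dataset-specific target labels to a shared taxonomy.
# Dataset names, labels, and categories are illustrative assumptions,
# not SubData's actual mapping or API.
TARGET_MAPPING = {
    "dataset_a": {"women": "gender", "muslims": "religion"},
    "dataset_b": {"female": "gender", "islam": "religion", "immigrants": "origin"},
}


def standardize(dataset_name, records):
    """Attach a unified target category to each record; drop unmapped targets."""
    mapping = TARGET_MAPPING[dataset_name]
    unified = []
    for text, raw_target in records:
        category = mapping.get(raw_target)
        if category is not None:
            unified.append(
                {"text": text, "target": category, "source": dataset_name}
            )
    return unified


def flag_rate_by_target(records, classifier):
    """Return, per target category, the share of records the classifier flags."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["target"]] += 1
        flagged[rec["target"]] += int(bool(classifier(rec["text"])))
    return {target: flagged[target] / total[target] for target in total}


def toy_classifier(text):
    # Stand-in for an LLM call: flags texts whose trailing number is odd.
    return int(text.split()[-1]) % 2 == 1


# Placeholder records as (text, dataset-specific target label) pairs.
raw_a = [("example text 1", "women"), ("example text 2", "muslims")]
raw_b = [
    ("example text 3", "female"),
    ("example text 4", "immigrants"),
    ("example text 5", "unknown"),  # no mapping entry, so it is dropped
]

combined = standardize("dataset_a", raw_a) + standardize("dataset_b", raw_b)
rates = flag_rate_by_target(combined, toy_classifier)
print(rates)
```

Running the same comparison with several differently-aligned classifiers in place of `toy_classifier` would give one set of per-target rates per model, which is the kind of contrast the theory-driven evaluation is concerned with.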
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Lib · ALIGN
