BenCSSmark: Making the Social Sciences Count in LLM Research

Arnault Chatelain; Étienne Ollion; Qianwen Guan; Diandra Fabre; Lorraine Goeuriot; Emile Chapuis; Abdelkrim Beloued; Marie Candito; Nicolas Hervé; Didier Schwab

arXiv:2605.04886·cs.CL·May 7, 2026

TL;DR

This position paper argues for integrating social science datasets into LLM benchmarks to strengthen AI evaluation, model robustness, and social relevance. It introduces BenCSSmark, a benchmark built from social science datasets.

Contribution

The paper introduces BenCSSmark, a benchmark built from datasets annotated by social scientists, intended to improve both AI evaluation and social scientific inquiry.

Findings

01

BenCSSmark includes datasets from the social sciences.

02

Integrating social science data can improve AI model robustness.

03

The benchmark promotes socially relevant AI development.

Abstract

This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks, standardized tools for assessing computational systems, are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress: they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despite this central role, the social sciences are largely absent from mainstream evaluation frameworks, even though scholars in these fields generate dozens of rigorously annotated, context-sensitive datasets each year. Integrating this work into benchmark design could significantly improve the generalization and robustness of AI models. In turn, models trained on social scientific tasks would likely…
