Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

Tianyu Wang; Jiajun Li; Jianghao Lin

arXiv:2605.17079·cs.CL·May 19, 2026

TL;DR

This paper introduces ConsumerSimBench, a benchmark for evaluating LLMs' ability to reconstruct real consumer reactions from Chinese social media, revealing significant gaps between model performance and actual consumer intuition.

Contribution

The paper presents a new benchmark built from real social media data, with a decomposed, auditable evaluation method, highlighting the limitations of current LLMs in consumer reaction prediction.

Findings

01

The strongest model, Gemini-3.1-Pro, covers only 47.8% of real reaction criteria.

02

GPT-5.2 and Claude-4.6 perform poorly despite their strength on other benchmarks.

03

Structured reasoning prompts decrease coverage, whereas multi-agent pipelines improve performance.

Abstract

LLMs are increasingly used as "digital consumers" to simulate public opinion, pre-test marketing decisions, and anticipate audience response. However, existing evaluations rarely ask whether a model can reconstruct the concrete reaction patterns that real consumers surface in public discourse. We introduce ConsumerSimBench, a benchmark built from 1,553 real Chinese social-media topics and 23,122 atomic, rule-audited criteria spanning four reaction families. Rather than scoring open-ended generations with a holistic preference judge, ConsumerSimBench decomposes each task into auditable yes-no decisions over concrete reaction points, raising three-judge agreement from 65.8% to 92.1% with 98.4% agreement between pointwise judge decisions and human-majority labels. Across 13 frontier generators, the strongest model, Gemini-3.1-Pro, covers only 47.8% of real reaction criteria, while…
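To make the decomposed scoring idea concrete, here is a minimal Python sketch of how yes-no judge decisions could be aggregated into a coverage score and an agreement rate. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the function names (`majority_vote`, `coverage`, `unanimous_agreement`), the use of a simple majority over three judges, the reading of "agreement" as unanimity, and the toy votes.

```python
def majority_vote(votes):
    """Return True when more than half of the judge votes are 'yes' (True)."""
    return sum(votes) * 2 > len(votes)


def coverage(judgments):
    """Fraction of criteria whose majority vote is 'yes'.

    `judgments` holds one tuple of boolean judge votes per atomic criterion.
    Assumption: coverage is the share of criteria judged as covered.
    """
    if not judgments:
        return 0.0
    return sum(majority_vote(v) for v in judgments) / len(judgments)


def unanimous_agreement(judgments):
    """Fraction of criteria on which every judge gives the same answer.

    Assumption: agreement means unanimity; the paper may define it differently.
    """
    if not judgments:
        return 0.0
    return sum(len(set(v)) == 1 for v in judgments) / len(judgments)


# Hypothetical votes from three judges on four criteria (not data from the paper).
toy = [
    (True, True, True),
    (True, False, True),
    (False, False, False),
    (False, True, False),
]

print(coverage(toy))             # 0.5
print(unanimous_agreement(toy))  # 0.5
```

Because each criterion is a single binary decision, any disagreement can be traced to a specific reaction point, which is what makes this style of scoring auditable in a way a holistic preference score is not.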
