ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders
Ofer Meshi, Krisztian Balog, Sally Goldman, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, Craig Boutilier

TL;DR
This paper introduces ConvApparel, a new dataset and validation framework for user simulators in conversational recommenders, addressing the realism gap and improving simulator robustness through counterfactual validation.
Contribution
It presents a novel dataset with dual-agent data collection and a comprehensive validation framework to evaluate and enhance user simulators in conversational AI.
Findings
Data-driven simulators outperform a prompted baseline in realism.
Data-driven simulators also adapt more realistically to unseen behaviors than the prompted baseline.
A significant realism gap remains across all tested simulators.
Abstract
The promise of LLM-based user simulators to improve conversational AI is hindered by a critical "realism gap," leading to systems that are optimized for simulated interactions, but may fail to perform well in the real world. We introduce ConvApparel, a new dataset of human-AI conversations designed to address this gap. Its unique dual-agent data collection protocol -- using both "good" and "bad" recommenders -- enables counterfactual validation by capturing a wide spectrum of user experiences, enriched with first-person annotations of user satisfaction. We propose a comprehensive validation framework that combines statistical alignment, a human-likeness score, and counterfactual validation to test for generalization. Our experiments reveal a significant realism gap across all simulators. However, the framework also shows that data-driven simulators outperform a prompted baseline,…
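The abstract names statistical alignment and counterfactual validation without giving formulas, so the following is only a minimal sketch of how such checks could look, not the paper's actual metrics. It assumes each conversation is reduced to one discrete statistic (for example, number of turns or a satisfaction rating). The helper names `histogram`, `js_divergence`, and `counterfactual_gap`, and the choice of Jensen-Shannon divergence, are illustrative assumptions.

```python
import math
from collections import Counter


def histogram(values):
    """Turn a list of discrete values into a probability dict."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range [0, 1]) between two
    probability dicts. 0 means identical distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m); m[k] > 0 whenever a[k] > 0, so the ratio is safe.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def mean(xs):
    return sum(xs) / len(xs)


def counterfactual_gap(human_good, human_bad, sim_good, sim_bad):
    """Hypothetical counterfactual check: compare how much a statistic
    shifts for humans vs. simulated users when the recommender changes
    from "good" to "bad". Smaller is better."""
    human_shift = mean(human_bad) - mean(human_good)
    sim_shift = mean(sim_bad) - mean(sim_good)
    return abs(human_shift - sim_shift)


# Toy, made-up values (not taken from ConvApparel).
human_turns = [3, 4, 4, 5]
sim_turns = [3, 3, 4, 6]
alignment = js_divergence(histogram(human_turns), histogram(sim_turns))

gap = counterfactual_gap(
    human_good=[4, 5], human_bad=[1, 2],  # human satisfaction ratings
    sim_good=[4, 5], sim_bad=[3, 4],      # simulated satisfaction ratings
)
```

In this toy setup, a simulator whose satisfaction drops less than the humans' when facing the "bad" recommender yields a nonzero `gap`, which is the kind of generalization failure counterfactual validation is meant to surface.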
Taxonomy
Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing
