Sim4IA-Bench: A User Simulation Benchmark Suite for Next Query and Utterance Prediction
Andreas Konstantin Kruff, Christin Katharina Kreutz, Timo Breuer, Philipp Schaer, Krisztian Balog

TL;DR
Sim4IA-Bench is a benchmark suite for evaluating user simulators in search, described by its authors as the first of its kind in the IR community. It links real search sessions with simulated next-query and next-utterance predictions and introduces a new evaluation measure to improve realism and reproducibility.
Contribution
It introduces the first publicly available benchmark linking real search sessions with simulated next-query predictions and proposes a new measure for evaluation.
Findings
Provides a dataset of 160 real-world search sessions from the CORE search engine
Includes up to 62 simulator runs for 70 of these sessions, divided into Task A and Task B
Introduces a new evaluation measure for next-query prediction
Abstract
Validating user simulation is a difficult task due to the lack of established measures and benchmarks, which makes it challenging to assess whether a simulator accurately reflects real user behavior. As part of the Sim4IA Micro-Shared Task at the Sim4IA Workshop, SIGIR 2025, we present Sim4IA-Bench, a simulation benchmark suite for the prediction of the next queries and utterances, the first of its kind in the IR community. The dataset in the suite comprises 160 real-world search sessions from the CORE search engine. For 70 of these sessions, up to 62 simulator runs are available, divided into Task A and Task B, in which different approaches predicted users' next search queries or utterances. Sim4IA-Bench provides a basis for evaluating and comparing user simulation approaches and for developing new measures of simulator validity. Although modest in size, the suite represents the…
Taxonomy
Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Personal Information Management and User Behavior
