UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems

Nolwenn Bernard; Krisztian Balog

arXiv:2512.04588 · cs.IR · March 18, 2026

Open Access

TL;DR

UserSimCRS v2 enhances simulation-based evaluation for conversational recommender systems by integrating advanced user simulators, large language models, and new evaluation tools, facilitating more comprehensive and realistic assessments.

Contribution

The paper introduces UserSimCRS v2, a major upgrade of the UserSimCRS toolkit that adds an enhanced agenda-based user simulator, LLM-based user simulators, LLM-as-a-judge evaluation utilities, and integration with a wider range of CRSs and datasets.

Findings

01. Enhanced user simulators improve evaluation realism
02. LLM-based judges provide more accurate assessments
03. Broader dataset integration enables diverse testing scenarios

Abstract

Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade aligning the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, introduction of large language model-based simulators, integration for a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.
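To make the term "agenda-based user simulator" concrete, here is a minimal sketch of the general idea: the simulated user keeps a stack of pending intents (the agenda), pops one each turn, and pushes new intents in response to what the system does. This is not the UserSimCRS API. The class `AgendaUser`, the functions `scripted_agent` and `simulate`, the intent labels, and the item names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class AgendaUser:
    """Toy agenda-based user (hypothetical, not the toolkit's class).

    The agenda is a stack of pending user intents; the top of the stack
    is the end of the list.
    """

    liked_items: Set[str]
    agenda: List[str] = field(default_factory=lambda: ["COMPLETE", "DISCLOSE"])

    def next_action(self) -> str:
        """Pop the next intent from the top of the agenda."""
        return self.agenda.pop() if self.agenda else "COMPLETE"

    def observe(self, agent_intent: str, item: Optional[str] = None) -> None:
        """Push a follow-up intent in response to the agent's last act."""
        if agent_intent == "ELICIT":
            # The agent asked a question, so the user plans to answer it.
            self.agenda.append("DISCLOSE")
        elif agent_intent == "RECOMMEND":
            # Accept only items that match the simulated preferences.
            self.agenda.append("ACCEPT" if item in self.liked_items else "REJECT")


def scripted_agent(turn: int) -> Tuple[str, Optional[str]]:
    """Stand-in for a CRS: replays a fixed list of (intent, item) acts."""
    script = [("ELICIT", None), ("RECOMMEND", "item_a"), ("RECOMMEND", "item_b")]
    return script[turn] if turn < len(script) else ("END", None)


def simulate(user: AgendaUser, max_turns: int = 10) -> List[str]:
    """Run one simulated dialogue and return the sequence of user intents."""
    transcript = []
    for turn in range(max_turns):
        user_intent = user.next_action()
        transcript.append(user_intent)
        if user_intent in ("ACCEPT", "COMPLETE"):
            break  # The dialogue ends on acceptance or when the agenda is done.
        agent_intent, item = scripted_agent(turn)
        user.observe(agent_intent, item)
    return transcript


if __name__ == "__main__":
    print(simulate(AgendaUser(liked_items={"item_b"})))
```

With `liked_items={"item_b"}`, the simulated user discloses twice, rejects `item_a`, and accepts `item_b`. A real simulator would replace the scripted agent with an actual CRS and add natural language understanding and generation around the intent-level loop.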

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Speech and dialogue systems