Beyond Offline A/B Testing: Context-Aware Agent Simulation for Recommender System Evaluation

Nicolas Bougie, Gian Maria Marconi, Xiaotong Ye, and Narimasa Watanabe

arXiv:2604.09549 · cs.IR · April 14, 2026

TL;DR

This paper introduces ContextSim, a framework that uses large language models to simulate realistic user behavior for recommender system evaluation by anchoring interactions in daily life activities.

Contribution

The paper presents an LLM-based agent framework that models users' context and internal thoughts, making user simulations for recommender system testing more realistic.

Findings

01. ContextSim generates user interactions that are more closely aligned with human behavior.

02. Optimizing recommender system (RS) parameters with ContextSim improves real-world engagement.

03. The approach strengthens the correlation between offline evaluation and online performance.

Abstract

Recommender systems are central to online services, enabling users to navigate massive amounts of content across various domains. However, their evaluation remains challenging due to the disconnect between offline metrics and online performance. The emergence of Large Language Model-powered agents offers a promising solution, yet existing studies model users in isolation, neglecting contextual factors such as time, location, and needs, which fundamentally shape human decision-making. In this paper, we introduce ContextSim, an LLM agent framework that simulates believable user proxies by anchoring interactions in daily life activities. Specifically, a life simulation module generates scenarios specifying when, where, and why users engage with recommendations. To align preferences with those of real humans, we model agents' internal thoughts and enforce consistency at both the action and…
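To make the "when, where, and why" idea from the abstract concrete, the snippet below is a minimal sketch of how a generated scenario might be represented and attached to a simulated user's prompt. It is an illustration only, not the authors' implementation: the `Scenario` class, the `build_prompt` function, the field names, and the sample persona and items are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Scenario:
    """Hypothetical context record: when, where, and why a session happens."""
    when: time
    where: str
    why: str


def build_prompt(persona: str, scenario: Scenario, items: list[str]) -> str:
    """Render a persona, its context, and candidate items into one prompt string."""
    lines = [
        f"Persona: {persona}",
        f"Time: {scenario.when.strftime('%H:%M')}",
        f"Location: {scenario.where}",
        f"Need: {scenario.why}",
        "Recommended items:",
    ]
    # Number the candidate items so a simulated user could refer to them by index.
    lines += [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    return "\n".join(lines)


# Illustrative values only; none of these appear in the paper.
scenario = Scenario(
    when=time(7, 45),
    where="commuter train",
    why="wants something short to pass the time",
)
prompt = build_prompt(
    "office worker who likes documentaries", scenario, ["Item A", "Item B"]
)
print(prompt)
```

In a sketch like this, the same persona can be paired with different scenarios, so any difference in the simulated response can be traced back to the context rather than to the user profile.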

Code & Models

Datasets

molmohsen/awesome-ai-agent-papers
dataset · 39 dl
