Exploring Recommender System Evaluation: A Multi-Modal User Agent Framework for A/B Testing

Wenlin Zhang; Xiangyang Li; Qiyuan Ge; Kuicai Dong; Pengyue Jia; Xiaopeng Li; Zijian Zhang; Maolin Wang; Yichao Wang; Huifeng Guo; Ruiming Tang; Xiangyu Zhao

arXiv:2601.04554·cs.IR·January 9, 2026

Open Access

TL;DR

This paper introduces a multi-modal user agent framework for A/B testing in recommender systems, aiming to reduce costs and improve simulation fidelity by mimicking real user interactions within a constructed sandbox environment.

Contribution

It presents a novel multi-modal user agent and recommendation sandbox environment that simulate complex user behaviors for more effective A/B testing in recommender systems.

Findings

01. The agent can effectively simulate human decision-making processes.

02. Generated data enhances recommendation model performance.

03. The framework reduces costs and improves testing realism.

Abstract

In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B tests often presents significant challenges, including substantial economic costs, degraded user experience, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate users' perception processes and interaction patterns because they lack real environments and visual perception capability. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing that enables multimodal, multi-page interactions aligned with real user behavior on online platforms. The designed agent…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques · Intelligent Tutoring Systems and Adaptive Learning