TL;DR
VRAgent-R1 introduces a novel agent-based framework utilizing multimodal large language models and reinforcement learning to significantly improve video recommendation accuracy and user simulation fidelity.
Contribution
The paper presents VRAgent-R1, a new multimodal agent-based approach with human-like reasoning for enhanced video recommendation and user simulation.
Findings
The IP Agent improves NDCG@10 by 6.0% on MicroLens-100k
The US Agent achieves 45.0% higher accuracy in user decision simulation
VRAgent-R1 outperforms state-of-the-art baselines
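For readers unfamiliar with the metric behind the first finding, here is a minimal sketch of how NDCG@10 is commonly computed for a single ranked list. It is not taken from the paper: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the sketch assumes every relevant item already appears somewhere in the ranked list, so the ideal ordering is just the same relevance scores sorted in descending order. The summary does not say whether the reported 6.0% gain is relative or absolute, so the sketch computes only the metric itself.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k positions.

    `relevances` holds one relevance score per ranked item, best-ranked first.
    The item at 0-based position `rank` is discounted by log2(rank + 2).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: all relevant items are present in `ranked_relevances`,
    so the ideal ranking is a descending sort of the same scores.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all, so the score is defined as 0
    return dcg_at_k(ranked_relevances, k) / ideal


# One relevant video ranked first versus ranked second.
top_hit = ndcg_at_k([1, 0, 0, 0])
second_hit = ndcg_at_k([0, 1, 0, 0])
```

With binary relevance and a single relevant item, the score reduces to 1 / log2(position + 1), so moving the relevant video from first to second place lowers NDCG@10 from 1.0 to roughly 0.63.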
Abstract
Owing to powerful natural language processing and generative capabilities, large language model (LLM) agents have emerged as a promising solution for enhancing recommendation systems via user simulation. However, in the realm of video recommendation, existing studies predominantly resort to prompt-based simulation using frozen LLMs and encounter the intricate challenge of multimodal content understanding. This frequently results in suboptimal item modeling and user preference learning, thereby ultimately constraining recommendation performance. To address these challenges, we introduce VRAgent-R1, a novel agent-based paradigm that incorporates human-like intelligence in user simulation. Specifically, VRAgent-R1 comprises two distinct agents: the Item Perception (IP) Agent and the User Simulation (US) Agent, designed for interactive user-item modeling. Firstly, the IP Agent emulates…