VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning

Siran Chen; Boyu Chen; Chenyun Yu; Yuxiao Luo; Ouyang Yi; Lei Cheng; Chengxiang Zhuo; Zang Li; Yali Wang

arXiv:2507.02626·cs.MM·July 4, 2025

TL;DR

VRAgent-R1 is an agent-based framework that combines multimodal large language models with reinforcement learning for video recommendation. Its IP Agent improves NDCG@10 by 6.0% on MicroLens-100k, and its US Agent achieves 45.0% higher accuracy in user decision simulation.

Contribution

The paper presents VRAgent-R1, a multimodal agent-based approach to video recommendation and user simulation. It comprises two agents designed for interactive user-item modeling: the Item Perception (IP) Agent and the User Simulation (US) Agent.

Findings

01

IP Agent improves NDCG@10 by 6.0% on MicroLens-100k

02

US Agent achieves 45.0% higher accuracy in user decision simulation

03

Demonstrates superior performance over state-of-the-art baselines
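
For readers unfamiliar with the NDCG@10 metric cited in the first finding, the following is a minimal sketch of how it is commonly computed for one user. It assumes binary relevance (an item is either relevant or not) and the standard log2 rank discount. The function names and the example item IDs are illustrative assumptions, and the paper's exact evaluation protocol is not described on this page.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k positions.

    Position 0 is the top of the ranking, so the discount is log2(rank + 2).
    """
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k for one user, assuming binary relevance (an assumption here)."""
    # Gain of 1.0 for each recommended item the user actually interacted with.
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # The ideal ranking places every relevant item at the top.
    ideal_gains = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # No relevant items: define the score as 0.
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical item IDs, for illustration only.
perfect = ndcg_at_k(["v1", "v2", "v3"], {"v1"})
second_place = ndcg_at_k(["v2", "v1", "v3"], {"v1"})
```

A score of 1.0 means every relevant item is ranked as high as possible; moving the single relevant item from first to second place lowers the score to 1 / log2(3).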

Abstract

Owing to powerful natural language processing and generative capabilities, large language model (LLM) agents have emerged as a promising solution for enhancing recommendation systems via user simulation. However, in the realm of video recommendation, existing studies predominantly resort to prompt-based simulation using frozen LLMs and encounter the intricate challenge of multimodal content understanding. This frequently results in suboptimal item modeling and user preference learning, thereby ultimately constraining recommendation performance. To address these challenges, we introduce VRAgent-R1, a novel agent-based paradigm that incorporates human-like intelligence in user simulation. Specifically, VRAgent-R1 comprises two distinct agents: the Item Perception (IP) Agent and the User Simulation (US) Agent, designed for interactive user-item modeling. Firstly, the IP Agent emulates…
