MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation

Yu Wang; Yonghui Yang; Le Wu; Jiancan Wu; Hefei Xu; Hui Lin

arXiv:2603.06243 · cs.IR · March 9, 2026

TL;DR

This paper introduces MLLMRec-R1, a framework that brings GRPO-based reasoning to multimodal sequential recommendation at manageable cost. It targets two obstacles: the computational burden of visual tokens during group-based rollout, and reward inflation in Chain-of-Thought supervision.

Contribution

The paper proposes MLLMRec-R1, a stable, cost-effective GRPO-based reasoning pipeline that textualizes visual signals and refines supervision for multimodal sequential recommendation.

Findings

01. Outperforms state-of-the-art methods on benchmark datasets

02. Mitigates reward inflation

03. Demonstrates a stable and scalable training process

Abstract

Group relative policy optimization (GRPO) has become a standard post-training paradigm for improving reasoning and preference alignment in large language models (LLMs), and has recently shown strong effectiveness in LLM-based recommender systems. However, extending GRPO-based reasoning pipelines to multimodal sequential recommendation (MSR) with multimodal large language models (MLLMs) faces fundamental obstacles. First, MSR requires jointly encoding visual content for both historical interactions and multiple candidate items, causing visual tokens to dominate the input and making the cost of group-based rollout scale with history length and candidate set size, which renders GRPO-based training prohibitively expensive. Second, existing Chain-of-Thought (CoT) supervision suffers from reward inflation in recommendation scenarios, where higher training rewards do not reliably translate…
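
To make the two mechanisms in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. `group_relative_advantages` shows the usual GRPO idea of scoring each rollout against the mean and standard deviation of its own group; using the population standard deviation and the `eps` term are assumptions of this sketch. `rollout_input_tokens` is a hypothetical back-of-envelope counter showing why input cost grows with history length and candidate set size when every item contributes image tokens; it deliberately ignores prompt caching, and all of its parameter names and numbers are illustrative.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (GRPO-style).

    Assumption: population standard deviation; eps avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rollout_input_tokens(history_len, num_candidates, tokens_per_image,
                         text_tokens, group_size):
    """Crude count of input tokens processed for one group of rollouts.

    Hypothetical model: one image per history item and per candidate,
    a fixed number of visual tokens per image, and no prompt caching.
    """
    visual_tokens = (history_len + num_candidates) * tokens_per_image
    return group_size * (visual_tokens + text_tokens)


if __name__ == "__main__":
    # Four rollouts for one prompt: two hit the target item, two miss.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

    # Illustrative numbers only, not taken from the paper.
    print(rollout_input_tokens(history_len=10, num_candidates=20,
                               tokens_per_image=256, text_tokens=500,
                               group_size=8))
```

With these illustrative numbers the visual part accounts for 7,680 of the 8,180 input tokens per rollout, which is the kind of imbalance the abstract describes as visual tokens dominating the input.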

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques