An item is worth one token in Multimodal Large Language Models-based Sequential Recommendation

Qiyong Zhong; Jiajie Su; Ming Yang; Yunshan Ma; Xiaolin Zheng; Chaochao Chen

arXiv:2511.05885 · cs.IR · February 9, 2026

Open Access

TL;DR

This paper introduces Speeder, a novel paradigm for multimodal sequential recommendation that enhances efficiency and effectiveness by compressing item representations, progressively optimizing modalities, and improving sequential dependency modeling.

Contribution

Speeder presents a new multimodal recommendation framework with three key innovations to address inefficiencies and biases in existing LLM-based SR methods.

Findings

01. Speeder increases training speed by 250%.

02. Speeder reduces inference time to 25% of the original.

03. Speeder is effective on real-world datasets.

Abstract

Sequential recommendations (SR) predict users' future interactions based on their historical behavior. The rise of Large Language Models (LLMs) has brought powerful generative and reasoning capabilities, significantly enhancing SR performance, while Multimodal LLMs (MLLMs) further extend this by introducing data like images and interactive relationships. However, critical issues remain, i.e., (a) Suboptimal item representations caused by lengthy and redundant descriptions, leading to inefficiencies in both training and inference; (b) Modality-related cognitive bias, as LLMs are predominantly pretrained on textual data, limiting their ability to effectively integrate and utilize non-textual modalities; (c) Weakening sequential perception in long interaction sequences, where attention mechanisms struggle to capture earlier interactions, hindering the modeling of long-range dependencies.…

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Multimodal Machine Learning Applications