UniRec: Unified Multimodal Encoding for LLM-Based Recommendations
Zijie Lei, Tao Feng, Zhigang Hua, Yan Xie, Guanyu Lin, Shuang Yang, Ge Liu, Jiaxuan You

TL;DR
UniRec is a unified multimodal encoding framework for LLM-based recommendation. It handles heterogeneous data types and nested user interaction structures, and is reported to outperform state-of-the-art methods by up to 15% on real-world benchmarks.
Contribution
UniRec proposes a hierarchical encoding architecture that combines modality-specific encoders with triplet representations to address the heterogeneity and nested structure of recommendation data.
Findings
UniRec outperforms state-of-the-art methods by up to 15% on real-world benchmarks.
The triplet representation separates schema from raw inputs, which improves semantic clarity.
A hierarchical Q-Former captures nested user interaction sequences, which improves recommendation accuracy.
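To make the schema-versus-raw-input separation concrete, here is a minimal Python sketch of how a nested user history might be flattened into triplets. The summary does not spell out the triplet fields, so the (name, modality, value) layout, the `Triplet` class, the `item_to_triplets` helper, and the sample data are all hypothetical illustrations rather than the authors' implementation. Only the four modality names come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# The four modalities named in the abstract.
MODALITIES = ("text", "image", "categorical", "numerical")


@dataclass(frozen=True)
class Triplet:
    """Hypothetical triplet: two schema fields plus one raw value."""
    name: str                    # schema: attribute name, e.g. "price"
    modality: str                # schema: one of MODALITIES
    value: Union[str, float]     # raw input, left untouched

    def __post_init__(self) -> None:
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality}")


def item_to_triplets(item: Dict[str, Union[str, float]],
                     schema: Dict[str, str]) -> List[Triplet]:
    """Turn one item's attributes into triplets using a name -> modality schema."""
    return [Triplet(name, schema[name], value) for name, value in item.items()]


# Assumed example schema: price and rating share a modality but keep distinct names.
schema = {
    "title": "text",
    "price": "numerical",
    "rating": "numerical",
    "category": "categorical",
}

# Nested structure: a user history is a sequence of items,
# and each item carries multiple attributes.
history = [
    {"title": "Trail shoes", "price": 89.0, "rating": 4.5, "category": "footwear"},
    {"title": "Rain jacket", "price": 120.0, "rating": 4.0, "category": "outerwear"},
]

encoded_history = [item_to_triplets(item, schema) for item in history]
```

In this sketch, price and rating are both numerical yet remain distinguishable through the `name` field, which is one plausible reading of how a triplet could resolve the intra-modality ambiguity described in the abstract. A downstream encoder would then consume the list of items, each a list of triplets, level by level.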
Abstract
Large language models have recently shown promise for multimodal recommendation, particularly with text and image inputs. Yet real-world recommendation signals extend far beyond these modalities. To reflect this, we formalize recommendation features into four modalities: text, images, categorical features, and numerical attributes, and highlight the unique challenges this heterogeneity poses for LLMs in understanding multimodal information. In particular, these challenges arise not only across modalities but also within them, as attributes such as price, rating, and time may all be numeric yet carry distinct semantic meanings. Beyond this intra-modality ambiguity, another major challenge is the nested structure of recommendation signals, where user histories are sequences of items, each associated with multiple attributes. To address these challenges, we propose UniRec, a unified…
