Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations
Yunjia Xi, Menghui Zhu, Jianghao Lin, Bo Chen, Ruiming Tang, Yong Yu, Weinan Zhang

TL;DR
This paper introduces MARC, a modular approach to compressing LLM representations for recommendation systems. It targets the Mid-layer Representation Advantage (MRA) phenomenon and aims to improve both efficiency and effectiveness.
Contribution
MARC explicitly controls LLM modularity through adjustment and task decoupling modules, optimizing representation compression for recommendation tasks.
Findings
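MARC addresses the Mid-layer Representation Advantage (MRA) phenomenon, in which representations from the middle layers of an LLM outperform those from the final layers in recommendation tasks.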
MARC achieves a 2.82% eCPM lift in a large-scale online test.
Extensive experiments validate MARC's effectiveness in producing efficient representations.
Abstract
Recently, large language models (LLMs) have advanced recommendation systems (RSs), and recent works have begun to explore how to integrate LLMs into industrial RSs. While most approaches deploy LLMs offline to generate and pre-cache augmented representations for RSs, high-dimensional representations from LLMs introduce substantial storage and computational costs. Thus, it is crucial to compress LLM representations effectively. However, we identify a counterintuitive phenomenon during representation compression: Mid-layer Representation Advantage (MRA), where representations from middle layers of LLMs outperform those from final layers in recommendation tasks. This degraded final layer renders existing compression methods, which typically compress on the final layer, suboptimal. We interpret this based on modularity theory that LLMs develop spontaneous internal functional modularity and…
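To make the storage concern in the abstract concrete, here is a minimal back-of-the-envelope sketch of the size of a pre-cached embedding table before and after compression. The item count, the dimensions, and the helper `cache_size_bytes` are hypothetical values chosen for illustration; they are not taken from the paper and do not describe how MARC performs compression.

```python
def cache_size_bytes(num_items: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to pre-cache one dense vector per item (float32 by default)."""
    return num_items * dim * bytes_per_value


# Hypothetical numbers, assumed for illustration only.
NUM_ITEMS = 100_000_000   # items in the catalogue
FULL_DIM = 4096           # hidden size of an uncompressed LLM representation
COMPRESSED_DIM = 64       # size after some compression step

full = cache_size_bytes(NUM_ITEMS, FULL_DIM)
small = cache_size_bytes(NUM_ITEMS, COMPRESSED_DIM)

print(f"uncompressed: {full / 1e12:.2f} TB")   # uncompressed: 1.64 TB
print(f"compressed:   {small / 1e9:.2f} GB")   # compressed:   25.60 GB
print(f"reduction:    {full // small}x")       # reduction:    64x
```

Under these assumed numbers the cache shrinks in direct proportion to the dimension, which is why the quality of the compressed representation, rather than the size reduction itself, is the hard part.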
