On-Device Large Language Models for Sequential Recommendation
Xin Xia, Hongzhi Yin, Shane Culpepper

TL;DR
This paper introduces OD-LLM, a task-adaptive compression framework that enables efficient, on-device deployment of large language models for sequential recommendation without sacrificing accuracy.
Contribution
The paper presents a novel compression framework combining low-rank SVD and tokenization normalization, along with a progressive alignment algorithm for on-device LLM deployment.
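The low-rank SVD idea can be illustrated with a minimal sketch. It assumes a plain truncated SVD applied to a single dense weight matrix; it is not the paper's task-adaptive algorithm, and it does not cover tokenization normalization or progressive alignment. The function names `low_rank_compress` and `rank_for_ratio`, the matrix shape, and the random weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def low_rank_compress(weight, rank):
    """Return factors (a, b) such that a @ b is the rank-`rank` truncated SVD of `weight`."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (m, rank): left vectors scaled by singular values
    b = vt[:rank, :]            # shape (rank, n): right singular vectors
    return a, b


def rank_for_ratio(m, n, keep_ratio):
    """Largest rank whose two factors store at most keep_ratio * m * n parameters."""
    # The factors hold rank * (m + n) values in total; solve for rank.
    return max(1, int(keep_ratio * m * n / (m + n)))


# Hypothetical stand-in for one weight matrix of a model (assumption: random values).
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512))

r = rank_for_ratio(*w.shape, keep_ratio=0.5)  # target: half the original parameters
a, b = low_rank_compress(w, r)

print("rank:", r)
print("parameters before:", w.size, "after:", a.size + b.size)
print("relative error:", np.linalg.norm(w - a @ b) / np.linalg.norm(w))
```

A random matrix has a nearly flat singular spectrum, so the reconstruction error printed here will be large. Trained weight matrices often compress better, and a task-adaptive method would presumably choose what to keep using task data rather than singular values alone.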
Findings
No loss in recommendation effectiveness at 50% model size reduction.
OD-LLM significantly reduces memory and computational requirements.
Scalable and practical for real-time on-device recommendation systems.
Abstract
On-device recommendation is critical for a number of real-world applications, especially in scenarios that have agreements on execution latency, user privacy, and robust functionality when internet connectivity is unstable or even impossible. While large language models (LLMs) can now provide exceptional capabilities that model user behavior for sequential recommendation tasks, their substantial memory footprint and computational overhead make the deployment on resource-constrained devices a high-risk proposition. In this paper, we propose OD-LLM, the first task-adaptive compression framework explicitly designed to provide efficient and accurate on-device deployment of LLMs for sequential recommendation tasks. OD-LLM uniquely integrates two complementary compression strategies: a low-rank structural compression algorithm which uses Singular Value Decomposition (SVD) to significantly…
Taxonomy
Topics: Recommender Systems and Techniques · Big Data and Digital Economy · Mobile Crowdsensing and Crowdsourcing
