Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis
Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, Yiyu Shi

TL;DR
This paper introduces a novel on-device LLM personalization framework that uses self-supervised data selection and synthesis to enable privacy-preserving, efficient, and effective user-specific model fine-tuning on edge devices.
Contribution
It presents the first framework for on-device LLM personalization that addresses data privacy, limited storage, and sparse annotations through self-supervised data selection and synthetic data generation.
Findings
Achieves higher user-specific content accuracy.
Improves fine-tuning speed and efficiency.
Outperforms baseline methods in personalization quality.
Abstract
After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real-time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is not preferred if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations have to be sparse to not affect user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most…
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Speech and dialogue systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
