Personalized LLM Decoding via Contrasting Personal Preference
Hyungjune Bu, Chanjoo Jung, Minjae Kang, Jaehyung Kim

TL;DR
This paper introduces CoPe, a decoding-time personalization method for LLMs. After parameter-efficient fine-tuning (PEFT) on a user's data, CoPe applies reward-guided decoding to improve user-specific text generation, raising ROUGE-L by 10.57% on average without an external reward model or additional training.
Contribution
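The paper presents CoPe (Contrasting Personal Preference), a novel decoding-time personalization approach that maximizes each user's implicit reward signal after PEFT. It targets decoding-time algorithms, a direction the authors describe as largely overlooked compared with prompt-based and training-based methods.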
Findings
Achieves a 10.57% average improvement in ROUGE-L.
Does not require external reward models or additional training.
Effective across five open-ended personalized text generation tasks.
Abstract
As large language models (LLMs) are progressively deployed in various real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization such as prompt-based and training-based methods have been actively explored, the development of effective decoding-time algorithms remains largely overlooked, despite their demonstrated potential. In this paper, we propose CoPe (Contrasting Personal Preference), a novel decoding-time approach applied after performing parameter-efficient fine-tuning (PEFT) on user-specific data. Our core idea is to leverage reward-guided decoding specifically for personalization by maximizing each user's implicit reward signal. We evaluate CoPe across five open-ended personalized text generation tasks. Our empirical results demonstrate that CoPe achieves strong performance, improving personalization by an…
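The idea of maximizing an implicit reward at decoding time can be illustrated with a toy sketch. The summary above does not give CoPe's exact scoring rule, so everything here is an assumption made for illustration: the implicit reward is taken to be the log-probability ratio between the PEFT-tuned (personal) model and the base model, the weight `alpha` and the plausibility cutoff `beta` are hypothetical parameters, and the function name `contrastive_next_token` is invented. It operates on plain lists of logits rather than a real model.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(personal_logits, base_logits, alpha=1.0, beta=0.1):
    """Pick the next token id by contrasting a personal model with a base model.

    Assumed scoring rule (not taken from the paper):
        score(t) = log p_personal(t) + alpha * (log p_personal(t) - log p_base(t))
    where the bracketed term plays the role of an implicit reward.
    Tokens whose personal probability is below beta * max probability are
    skipped, so that unlikely tokens cannot win on the reward term alone.
    """
    lp_personal = log_softmax(personal_logits)
    lp_base = log_softmax(base_logits)
    cutoff = max(lp_personal) + math.log(beta)

    best_token, best_score = None, -math.inf
    for token, (lp, lb) in enumerate(zip(lp_personal, lp_base)):
        if lp < cutoff:
            continue  # fails the plausibility check
        score = lp + alpha * (lp - lb)
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy vocabulary of three tokens: the base model strongly prefers token 0,
# while the personal model has shifted probability mass toward token 1.
personal = [2.0, 1.8, -3.0]
base = [2.0, 0.5, -6.0]

greedy_choice = contrastive_next_token(personal, base, alpha=0.0)
contrastive_choice = contrastive_next_token(personal, base, alpha=1.0)
```

With `alpha=0.0` the rule reduces to greedy decoding from the personal model, and larger `alpha` values favor tokens whose probability rose most after fine-tuning. The cutoff matters because a very unlikely token can still have a large log-ratio; in this toy setup token 2 is such a case.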
