PAD: Personalized Alignment of LLMs at Decoding-Time
Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, and Zuozhu Liu

TL;DR
PAD introduces a decoding-time framework for aligning large language models with personalized preferences without additional training, enabling real-time, scalable, and preference-specific text generation.
Contribution
This paper presents a novel decoding-time personalized alignment method that decouples preference modeling from training, allowing real-time, scalable, and generalizable alignment of LLMs.
Findings
Outperforms training-based alignment methods in preference alignment accuracy.
Demonstrates strong generalization to unseen preferences.
Scales effectively across different base models.
Abstract
Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In response, this paper presents Personalized Alignment at Decoding-time (PAD), a novel framework designed to align LLM outputs with diverse personalized preferences during the inference phase, eliminating the need for additional training. By introducing a unique personalized reward modeling strategy, this framework decouples the text generation process from personalized preferences, facilitating the generation of generalizable token-level personalized rewards. The PAD algorithm leverages these rewards to guide the decoding process, dynamically tailoring the base model's predictions to personalized preferences. Extensive experimental results demonstrate…
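The decoding idea in the abstract, where token-level rewards guide the base model's predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the additive scoring rule (base log-probability plus `beta` times a reward), the top-k candidate filter, the function name `guided_step`, and all numeric values are assumptions chosen for illustration, and the paper's exact formulation may differ.

```python
import math


def guided_step(base_logits, token_rewards, beta=1.0, top_k=3):
    """Choose the next token from base-model logits and per-token rewards.

    base_logits:   dict token -> raw logit from the base model (hypothetical values)
    token_rewards: dict token -> personalized reward for appending that token
    beta:          weight of the reward relative to the base log-probability
    top_k:         only the k most likely base-model tokens are considered
    """
    # Numerically stable log-softmax over the toy vocabulary.
    m = max(base_logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in base_logits.values()))
    log_probs = {tok: v - log_z for tok, v in base_logits.items()}

    # Restrict to the base model's top-k candidates so the reward cannot
    # promote tokens the base model considers very unlikely.
    candidates = sorted(log_probs, key=log_probs.get, reverse=True)[:top_k]

    # Assumed scoring rule: base log-probability plus weighted reward.
    scores = {tok: log_probs[tok] + beta * token_rewards.get(tok, 0.0)
              for tok in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy inputs (illustrative only): the user preference favors a casual tone.
base_logits = {"formal": 2.0, "casual": 1.5, "lol": 0.2, "zzz": -3.0}
rewards = {"formal": -1.0, "casual": 1.0, "lol": 2.0, "zzz": 5.0}

unguided, _ = guided_step(base_logits, rewards, beta=0.0)
guided, guided_scores = guided_step(base_logits, rewards, beta=1.0)
print(unguided, guided)
```

With `beta=0.0` the step reduces to greedy decoding from the base model; raising `beta` shifts the choice toward tokens the reward favors. The top-k filter keeps a token with a large reward but negligible base probability (`"zzz"` here) out of consideration.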
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper discusses that PAD requires only a single policy model aligned with general preferences, eliminating additional training. The algorithm therefore operates efficiently without creating multiple specialized models. The contrast with previous work in these respects is laid out well in Table 1.
2. Great presentation of the theoretical aspects of the algorithm. The experiment section is a bit lacking in details, though. Yet the analysis of different baselines…
1. Few details are given on how the model generalizes to preferences not seen during training. The $w_p$ depends on $p$ being specified, so new preferences would need some retraining.
2. While the experiment section compares the model's performance against the baselines, latency and cost are not compared, which has practical implications for real-world use.
- **Theoretical and empirical validation**: PAD combines theoretical innovation with empirical evidence, reinforcing its effectiveness. Experimental results align with theoretical assumptions, showcasing PAD’s strengths in accommodating personalized preferences.
- **Superior performance over existing personalized alignment methods**: Compared to other approaches, PAD performs better across multiple metrics and significantly reduces computational costs, making it more feasible for practical appli…
- **Unclear description of training process**: While the paper focuses on theoretical derivations, the description of the actual training process lacks clarity, omitting key implementation details. Adding specific explanations of the training process would improve reader comprehension.
- **Inconsistent notation and terminology**:
  - The definition of variable $a$ in line 287 is unclear, and the inconsistent use of subscript $t$ could lead to confusion.
  - The term PRM is commonly understood to…
1. The paper is well-written and easy to follow. The authors build up the concept of PAD clearly.
2. Decoupling personalized preferences from the Markov Decision Process and generating generalizable token-level personalized rewards is a unique idea.
3. Unlike many existing works, it requires neither further training of the policy nor training multiple reward models.
1. The paper lacks a Pareto front analysis for scenarios involving multiple rewards. For experiments that combine multiple objectives (like "Harmless and Humor" or "Expert, Informative, and Creative" in Figure 3), including such an analysis would help illustrate the trade-offs between different preferences.
2. Table 3 only displays PAD's performance. Including results from other baseline methods across different models would be nice.
3. During inference, the need to load and run an additional LL…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Balanced Selection · ALIGN
