PAD: Personalized Alignment of LLMs at Decoding-Time

Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, and Zuozhu Liu

arXiv:2410.04070·cs.CL·March 14, 2025

Open Access · 3 Reviews

TL;DR

PAD introduces a decoding-time framework for aligning large language models with personalized preferences without additional training, enabling real-time, scalable, and preference-specific text generation.

Contribution

This paper presents a novel decoding-time personalized alignment method that decouples preference modeling from training, allowing real-time, scalable, and generalizable alignment of LLMs.

Findings

1. Outperforms training-based alignment methods in preference alignment accuracy.
2. Demonstrates strong generalization to unseen preferences.
3. Scales effectively across different base models.

Abstract

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In response, this paper presents Personalized Alignment at Decoding-time (PAD), a novel framework designed to align LLM outputs with diverse personalized preferences during the inference phase, eliminating the need for additional training. By introducing a unique personalized reward modeling strategy, this framework decouples the text generation process from personalized preferences, facilitating the generation of generalizable token-level personalized rewards. The PAD algorithm leverages these rewards to guide the decoding process, dynamically tailoring the base model's predictions to personalized preferences. Extensive experimental results demonstrate…
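The decoding step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the top-k candidate restriction, the weight `beta`, and the lookup-table reward are all assumptions made for illustration. The only idea taken from the abstract is that a token-level personalized reward adjusts the base model's predictions at decoding time, with no update to the base model.

```python
import math


def guided_step(base_logprobs, reward_fn, preference, beta=1.0, top_k=3):
    """Choose the next token from base log-probs plus a weighted token-level reward.

    base_logprobs: dict mapping token -> log-probability under the base model.
    reward_fn: hypothetical callable (token, preference) -> float.
    beta: assumed weight trading off the base model against the reward.
    """
    # Keep only the top-k base candidates so the reward is queried sparingly.
    candidates = sorted(base_logprobs, key=base_logprobs.get, reverse=True)[:top_k]
    scores = {
        tok: base_logprobs[tok] + beta * reward_fn(tok, preference)
        for tok in candidates
    }
    return max(scores, key=scores.get), scores


def toy_reward(token, preference):
    # Hypothetical stand-in for a personalized reward model (a lookup table).
    table = {
        "humorous": {"joke": 2.0, "fact": 0.0},
        "expert": {"fact": 2.0, "joke": -1.0},
    }
    return table.get(preference, {}).get(token, 0.0)


# Toy next-token distribution from a frozen base model.
base = {"fact": math.log(0.5), "joke": math.log(0.3), "the": math.log(0.2)}

token_humor, _ = guided_step(base, toy_reward, "humorous")
token_expert, _ = guided_step(base, toy_reward, "expert")
```

In this sketch the same frozen `base` distribution yields different tokens for different preference strings, and a preference missing from the table falls back to the base model's top choice.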

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

1. The paper discusses that PAD requires only a single policy model aligned with general preferences, eliminating additional training. This means that the algorithm operates efficiently without needing to create multiple specialized models. The contrast with previous works in these aspects is detailed very well in Table 1.
2. Great presentation of the theoretical aspects of the algorithm. The experiment section is a bit lacking in details, though. Yet the analysis of different baselines…

Weaknesses

1. Few details are shared on how the model can generalize to preferences not seen during training. The w_p depends on p being specified, so new preferences would need some retraining.
2. While the experiment section compares the model's performance against the baselines, latency and cost are not compared, which has practical implications for real-world use.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- **Theoretical and empirical validation**: PAD combines theoretical innovation with empirical evidence, reinforcing its effectiveness. Experimental results align with theoretical assumptions, showcasing PAD’s strengths in accommodating personalized preferences.
- **Superior performance over existing personalized alignment methods**: Compared to other approaches, PAD performs better across multiple metrics and significantly reduces computational costs, making it more feasible for practical appli…

Weaknesses

- **Unclear description of training process**: While the paper focuses on theoretical derivations, the description of the actual training process lacks clarity, omitting key implementation details. Adding specific explanations of the training process would improve reader comprehension.
- **Inconsistent notation and terminology**:
  - The definition of variable $a$ in line 287 is unclear, and the inconsistent use of subscript $t$ could lead to confusion.
  - The term PRM is commonly understood to…

Reviewer 03 · Rating 8 · Confidence 3

Strengths

1. The paper is well-written and easy to follow. The authors build up the concept of PAD clearly.
2. Decoupling personalized preferences from the Markov Decision Process and generating generalizable token-level personalized rewards are unique ideas.
3. It does not need to further train the policy, nor does it need to train multiple reward models, unlike many existing works.

Weaknesses

1. The paper lacks a Pareto front analysis for scenarios involving multiple rewards. For experiments that combine multiple objectives (like "Harmless and Humor" or "Expert, Informative, and Creative" in Figure 3), including such an analysis would help illustrate the trade-offs between different preferences.
2. Table 3 only displays PAD's performance. Including results from other baseline methods across different models would be nice.
3. During inference, the need to load and run an additional LL…
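For readers unfamiliar with the Pareto front analysis requested in this review, the sketch below shows one way such a front could be computed from per-configuration scores on two objectives. The score pairs are hypothetical placeholders, not results from the paper, and the helper name `pareto_front` is an assumption for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (higher is better on every axis)."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (harmless, humor) scores, one pair per decoding configuration.
scores = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(scores)
```

Plotting `front` against the remaining points would show the trade-off curve between the two preferences. Points such as `(0.5, 0.5)` drop out because another configuration is at least as good on both axes.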

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Balanced Selection · ALIGN