P-Aligner: Enabling Pre-Alignment of Language Models via Principled Instruction Synthesis

Feifan Song; Bofei Gao; Yifan Song; Yi Liu; Weimin Xiong; Yuyang Song; Tianyu Liu; Guoyin Wang; Houfeng Wang

arXiv:2508.04626·cs.CL·August 7, 2025

TL;DR

P-Aligner is a lightweight module that rewrites flawed instructions into a more human-preferred form before the language model begins decoding. It preserves the original intent and improves safety and helpfulness without costly retraining.

Contribution

The paper introduces P-Aligner, a novel, efficient method for pre-aligning language models using a new dataset and principled instruction synthesis, outperforming existing approaches.

Findings

01. P-Aligner achieves significant win-rate improvements on benchmarks.

02. It outperforms strong baselines across various models.

03. The method is validated for efficiency and effectiveness.

Abstract

Large Language Models (LLMs) are expected to produce safe, helpful, and honest content during interaction with human users, but they frequently fail to align with such values when given flawed instructions, e.g., missing context, ambiguous directives, or inappropriate tone, leaving substantial room for improvement along multiple dimensions. A cost-effective yet high-impact way is to pre-align instructions before the model begins decoding. Existing approaches either rely on prohibitive test-time search costs or end-to-end model rewrite, which is powered by a customized training corpus with unclear objectives. In this work, we demonstrate that the goal of efficient and effective preference alignment can be achieved by P-Aligner, a lightweight module generating instructions that preserve the original intents while being expressed in a more human-preferred form. P-Aligner is trained on…
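The pre-alignment idea described in the abstract (rewrite the instruction first, then let the model decode on the rewritten form) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the function names `pre_align_then_generate`, `toy_rewrite`, and `toy_generate` are hypothetical stand-ins and are not the authors' code, model, or API.

```python
from typing import Callable


def pre_align_then_generate(
    instruction: str,
    rewrite: Callable[[str], str],
    generate: Callable[[str], str],
) -> str:
    """Rewrite the instruction, then pass the rewritten form to the model.

    `rewrite` stands in for a pre-alignment module and `generate` for the
    downstream language model; both are hypothetical callables.
    """
    aligned = rewrite(instruction)
    # Fall back to the original instruction if the rewriter returns nothing usable.
    if not aligned.strip():
        aligned = instruction
    return generate(aligned)


# Toy stand-ins so the sketch runs without any model (assumptions, not real components).
def toy_rewrite(instruction: str) -> str:
    return f"{instruction.strip()} Please answer clearly and safely."


def toy_generate(prompt: str) -> str:
    return f"[response to: {prompt}]"


if __name__ == "__main__":
    print(pre_align_then_generate("  Summarize this.  ", toy_rewrite, toy_generate))
```

In this arrangement the downstream model is untouched; only the instruction it receives changes, which is why the approach avoids retraining that model.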

Code & Models

Datasets

songff/UltraPrompt (dataset, 9 downloads)
