SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization

Yue Huang; Xiangqi Wang; Xiangliang Zhang

arXiv:2511.06222·cs.CL·November 11, 2025


TL;DR

This paper introduces Self-Priority Alignment (SPA), a novel unsupervised framework that enforces a strict trustworthiness-before-helpfulness order in LLMs, improving safety and helpfulness in high-stakes scenarios.

Contribution

SPA is the first unsupervised method to implement priority alignment, generating and refining responses to ensure safety before helpfulness in LLMs.

Findings

01 SPA outperforms strong baselines in helpfulness and safety.

02 SPA maintains general capabilities while improving alignment.

03 The framework is scalable and interpretable.

Abstract

In high-stakes scenarios, such as self-harm, legal, or medical queries, LLMs must be both trustworthy and helpful. However, these goals often conflict. We propose priority alignment, a new alignment paradigm that enforces a strict "trustworthy-before-helpful" ordering: optimization of helpfulness is conditioned on first meeting trustworthiness thresholds (e.g., harmlessness or honesty). To realize this, we introduce Self-Priority Alignment (SPA), a fully unsupervised framework in which the model generates diverse responses, self-evaluates and refines them, and applies dual-criterion denoising to remove inconsistency and control variance. From this, SPA constructs lexicographically ordered preference pairs and fine-tunes the model using an uncertainty-weighted alignment loss that emphasizes high-confidence, high-gap decisions. Experiments across multiple benchmarks show that SPA…
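
The "trustworthy-before-helpful" ordering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Scored` record, the 0-to-1 score scale, the `trust_threshold` value, and the tie-breaking rules are all assumptions made for illustration. The sketch only shows how a lexicographic comparison turns self-evaluated scores into (chosen, rejected) preference pairs; it omits the denoising step and the uncertainty-weighted loss.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Scored:
    """A candidate response with self-evaluated scores (hypothetical 0..1 scale)."""
    text: str
    trust: float    # trustworthiness score, e.g. harmlessness or honesty
    helpful: float  # helpfulness score


def priority_key(r: Scored, trust_threshold: float) -> tuple:
    """Lexicographic key: meeting the trust threshold always comes first.

    Assumed tie-breaking: among responses that meet the threshold,
    helpfulness decides; among those that do not, trust decides.
    """
    if r.trust >= trust_threshold:
        return (1, r.helpful, r.trust)
    return (0, r.trust, r.helpful)


def build_pairs(responses, trust_threshold: float = 0.7):
    """Return (chosen, rejected) text pairs under the priority ordering."""
    pairs = []
    for a, b in combinations(responses, 2):
        ka = priority_key(a, trust_threshold)
        kb = priority_key(b, trust_threshold)
        if ka == kb:
            continue  # no preference signal when the keys are identical
        chosen, rejected = (a, b) if ka > kb else (b, a)
        pairs.append((chosen.text, rejected.text))
    return pairs


if __name__ == "__main__":
    candidates = [
        Scored("A", trust=0.9, helpful=0.4),
        Scored("B", trust=0.5, helpful=0.95),  # most helpful, but below threshold
        Scored("C", trust=0.8, helpful=0.6),
    ]
    for chosen, rejected in build_pairs(candidates):
        print(f"prefer {chosen} over {rejected}")
```

In this toy setup, response B is never preferred over A or C despite its higher helpfulness score, because it fails the assumed trust threshold; helpfulness only separates A and C, which both pass it.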

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Semantic Web and Ontologies