Stackelberg Self-Annotation: A Robust Approach to Data-Efficient LLM Alignment

Xu Chu; Zhixin Zhang; Tianyu Jia; Yujie Jin

arXiv:2502.18099 · cs.LG · January 22, 2026

TL;DR

This paper introduces a robust, data-efficient framework for aligning large language models with human preferences using a Stackelberg game approach, significantly reducing the need for extensive human-labeled data.

Contribution

The paper proposes SGPO, a Stackelberg game-based alignment method, and SSAPO, a self-annotation technique that achieves strong performance with minimal human labels.

Findings

01. SSAPO uses only 2K seed preferences to outperform benchmarks.

02. SSAPO maintains robustness against noisy self-labels.

03. The approach reduces human annotation costs by over 95%.

Abstract

Aligning large language models (LLMs) with human preferences typically demands vast amounts of meticulously curated data, which is both expensive and prone to labeling noise. We propose Stackelberg Game Preference Optimization (SGPO), a robust alignment framework that models alignment as a two-player Stackelberg game between a policy (leader) and a worst-case preference distribution (follower). The proposed SGPO guarantees $O(\epsilon)$-bounded regret within an $\epsilon$-Wasserstein ball, offering formal robustness to (self-)annotation noise. We instantiate SGPO with Stackelberg Self-Annotated Preference Optimization (SSAPO), which uses minimal human-labeled "seed" preferences and iteratively self-annotates new prompts. In each iteration, SSAPO applies a distributionally robust reweighting of synthetic annotations, ensuring that noisy or biased self-labels do not derail…
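To make the idea of a worst-case reweighting concrete, the following is a minimal sketch and not the authors' SSAPO procedure. It assumes a logistic (DPO-style) preference loss on a scalar margin per preference pair, and it swaps the Wasserstein-ball follower described in the abstract for a simpler KL-regularized adversary (exponential tilting), since the page does not give the exact update. The names `preference_loss`, `worst_case_weights`, `robust_loss`, and the temperature `tau` are hypothetical.

```python
import math


def preference_loss(margin: float) -> float:
    """Logistic preference loss -log(sigmoid(margin)), computed stably.

    `margin` is an assumed scalar score gap between the preferred and
    the rejected response for one preference pair.
    """
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def worst_case_weights(losses, tau=1.0):
    """Follower step (simplified stand-in, NOT the Wasserstein follower).

    Exponentially tilts a uniform distribution toward high-loss pairs.
    This is the closed-form adversary for a KL-regularized problem;
    smaller `tau` gives a more aggressive adversary.
    """
    top = max(losses)  # subtract the max for numerical stability
    unnormalized = [math.exp((loss - top) / tau) for loss in losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def robust_loss(margins, tau=1.0):
    """Leader objective: the loss averaged under the follower's weights."""
    losses = [preference_loss(m) for m in margins]
    weights = worst_case_weights(losses, tau)
    value = sum(w * loss for w, loss in zip(weights, losses))
    return value, weights


# Three hypothetical pairs: confidently correct, borderline, and wrong.
robust, weights = robust_loss([2.0, 0.5, -1.0], tau=1.0)
```

In this sketch the pair with the negative margin receives the largest weight, so the leader (the policy) would be trained against the least favorable weighting of the data rather than the uniform average. A full leader-follower loop would alternate this reweighting with a policy update, which is omitted here.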

Videos

Stackelberg Self-Annotation: A Robust Approach to Data-Efficient LLM Alignment · SlidesLive

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization

Methods: Attention Is All You Need · Absolute Position Encodings · Dense Connections · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer