Uncovering Cross-Objective Interference in Multi-Objective Alignment

Yining Lu; Meng Jiang

arXiv:2602.06869 · cs.CL · May 7, 2026

TL;DR

This paper studies cross-objective interference in the multi-objective alignment of large language models. It provides a theoretical framework, an empirical analysis, and a mitigation method called Covariance Targeted Weight Adaptation (CTWA).

Contribution

It formalizes cross-objective interference, derives a covariance law explaining it, and introduces CTWA to mitigate interference in multi-objective LLM training.
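
To see why a covariance law of this kind is plausible, consider a simplified setting that is our own assumption and not necessarily the paper's derivation. Take a policy π over responses y, rewards r_1, …, r_K with weights w_k, and a scalarized score s(y) = Σ_k w_k r_k(y). Model one update step as an exponential tilting of π toward higher s with step size η (the symbols π_η, η and the tilting rule are illustrative notation). Differentiating the expected reward of objective j then gives a covariance directly:

```latex
\begin{aligned}
\pi_\eta(y) &= \frac{\pi(y)\, e^{\eta s(y)}}{\sum_{y'} \pi(y')\, e^{\eta s(y')}},
\qquad s(y) = \sum_{k=1}^{K} w_k\, r_k(y), \\
\frac{\partial \pi_\eta(y)}{\partial \eta} &= \pi_\eta(y)\bigl(s(y) - \mathbb{E}_{\pi_\eta}[s]\bigr), \\
\frac{d}{d\eta}\,\mathbb{E}_{\pi_\eta}[r_j] &= \sum_y r_j(y)\, \frac{\partial \pi_\eta(y)}{\partial \eta}
  = \operatorname{Cov}_{\pi_\eta}(r_j, s), \\
\mathbb{E}_{\pi_\eta}[r_j] &= \mathbb{E}_{\pi}[r_j] + \eta \operatorname{Cov}_{\pi}(r_j, s) + O(\eta^2), \\
\operatorname{Cov}_{\pi}(r_j, s) &= w_j \operatorname{Var}_{\pi}(r_j) + \sum_{k \neq j} w_k \operatorname{Cov}_{\pi}(r_j, r_k).
\end{aligned}
```

Under this assumption, objective j gains to first order exactly when its reward covaries positively with s. The last line shows one way interference can arise: negative cross-covariances Cov(r_j, r_k) with heavily weighted objectives can outweigh the objective's own weighted variance. This sketch does not cover the clipped surrogate objectives that the paper analyzes.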

Findings

1. Interference is pervasive across scalarization algorithms and strongly model-dependent.
2. An objective improves when its reward covaries positively with the scalarized score.
3. CTWA effectively reduces cross-objective interference.

Abstract

We study a persistent failure mode in multi-objective alignment for large language models (LLMs): training improves performance on only a subset of objectives while causing others to degrade. We formalize this phenomenon as cross-objective interference and conduct the first systematic study across scalarization algorithms, showing that interference is pervasive and exhibits strong model dependence. To explain this phenomenon, we derive a local covariance law showing that an objective improves when its reward exhibits positive covariance with the scalarized score. We extend this analysis to clipped surrogate objectives used in modern alignment, demonstrating that the covariance law remains valid under mild conditions despite clipping. Building on this analysis, we propose Covariance Targeted Weight Adaptation (CTWA), a plug-and-play method that maintains positive covariance between…
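
The covariance law can be checked numerically on a toy example. This is a minimal sketch under a simplifying assumption of ours, not the paper's clipped-surrogate setting or the CTWA code: the policy is a discrete distribution over four candidate responses, the update is modelled as exponential tilting toward the linearly scalarized score, and the helper names (tilt, expect, cov), reward values and weights are all hypothetical. The script compares each objective's covariance with the scalarized score against a finite-difference estimate of how its expected reward changes after a tiny update.

```python
import math


def tilt(probs, scores, eta):
    """Reweight a discrete policy by exp(eta * score) and renormalize (assumed update rule)."""
    weights = [p * math.exp(eta * s) for p, s in zip(probs, scores)]
    z = sum(weights)
    return [wt / z for wt in weights]


def expect(probs, values):
    """Expectation of `values` under the discrete distribution `probs`."""
    return sum(p * v for p, v in zip(probs, values))


def cov(probs, a, b):
    """Covariance of two per-response reward vectors under `probs`."""
    ea, eb = expect(probs, a), expect(probs, b)
    return sum(p * (x - ea) * (y - eb) for p, x, y in zip(probs, a, b))


# Hypothetical toy setup: 4 candidate responses under a uniform policy,
# with two objectives whose rewards pull in opposite directions.
probs = [0.25, 0.25, 0.25, 0.25]
rewards = {
    "objective_1": [1.0, 0.8, 0.2, 0.0],
    "objective_2": [0.0, 0.3, 0.6, 1.0],
}
weights = {"objective_1": 0.7, "objective_2": 0.3}

# Linear scalarization: s(y) = sum_k w_k * r_k(y).
scores = [
    sum(weights[k] * rewards[k][i] for k in rewards)
    for i in range(len(probs))
]

eta = 1e-5  # small step so the first-order (covariance) term dominates
tilted = tilt(probs, scores, eta)

results = {}
for name, r in rewards.items():
    predicted = cov(probs, r, scores)                        # covariance-law prediction
    observed = (expect(tilted, r) - expect(probs, r)) / eta  # finite-difference change
    results[name] = (predicted, observed)
    print(f"{name}: Cov(r, s) = {predicted:+.5f}, dE[r]/d(eta) ~ {observed:+.5f}")
```

With these illustrative numbers, objective_2 declines under the update even though it carries positive weight: its weighted own-variance term (0.3 × 0.136875) is smaller in magnitude than its weighted cross-covariance with objective_1 (0.7 × −0.1475), so Cov(r_2, s) is negative.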

Code & Models

Repository: yining610/ctwa (GitHub)