APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation

Dongliang Chen; Xinlin Zhuang; Junjie Xu; Luojian Xie; Zehui Wang; Jiaxi Zhuang; Haolin Yang; Liang Dou; Xiao He; Xingjiao Wu; Ying Qian

arXiv:2601.06574 · cs.CV · January 13, 2026

Open Access

TL;DR

APEX is a dynamic multi-objective alignment method for text-to-image generation. It normalizes heterogeneous rewards and adaptively schedules objectives, improving trade-offs across objectives while keeping training stable.

Contribution

The paper presents APEX, which pairs Dual-Stage Adaptive Normalization with P^3 Adaptive Priorities scheduling to address two failure modes of multi-objective training: variance hijacking and gradient conflicts.

Findings

01. Improved Pareto trade-offs on Stable Diffusion 3.5.

02. Balanced gains across multiple objectives.

03. Reduced instability in multi-objective alignment.

Abstract

Multi-objective alignment for text-to-image generation is commonly implemented via static linear scalarization, but fixed weights often fail under heterogeneous rewards, leading to optimization imbalance where models overfit high-variance, high-responsiveness objectives (e.g., OCR) while under-optimizing perceptual goals. We identify two mechanistic causes: variance hijacking, where reward dispersion induces implicit reweighting that dominates the normalized training signal, and gradient conflicts, where competing objectives produce opposing update directions and trigger seesaw-like oscillations. We propose APEX (Adaptive Priority-based Efficient X-objective Alignment), which stabilizes heterogeneous rewards with Dual-Stage Adaptive Normalization and dynamically schedules objectives via P^3 Adaptive Priorities that combine learning potential, conflict penalty, and progress need. On…
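
The variance hijacking effect named in the abstract can be illustrated with a small toy experiment. The sketch below is not the paper's Dual-Stage Adaptive Normalization; it only assumes a generic setup in which rewards are combined with fixed weights and then standardized within a batch. The reward distributions, the function names (`make_rewards`, `sum_then_normalize`, `normalize_then_sum`), and the equal weights are all hypothetical choices made for illustration.

```python
import random
import statistics


def zscore(values):
    """Standardize a list to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def correlation(xs, ys):
    """Pearson correlation, computed from standardized values."""
    zx, zy = zscore(xs), zscore(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def make_rewards(n=512, seed=0):
    """Hypothetical rewards: a binary OCR-style score and a tightly clustered aesthetic score."""
    rng = random.Random(seed)
    ocr = [float(rng.random() < 0.5) for _ in range(n)]          # high dispersion
    aesthetic = [5.0 + rng.gauss(0.0, 0.05) for _ in range(n)]   # low dispersion
    return ocr, aesthetic


def sum_then_normalize(ocr, aesthetic, w=(0.5, 0.5)):
    """Static linear scalarization followed by batch standardization."""
    combined = [w[0] * o + w[1] * a for o, a in zip(ocr, aesthetic)]
    return zscore(combined)


def normalize_then_sum(ocr, aesthetic, w=(0.5, 0.5)):
    """Standardize each objective separately before applying the weights."""
    z_ocr, z_aes = zscore(ocr), zscore(aesthetic)
    return [w[0] * o + w[1] * a for o, a in zip(z_ocr, z_aes)]


if __name__ == "__main__":
    ocr, aesthetic = make_rewards()
    for name, fn in [("sum_then_normalize", sum_then_normalize),
                     ("normalize_then_sum", normalize_then_sum)]:
        signal = fn(ocr, aesthetic)
        print(name,
              "corr with OCR:", round(correlation(signal, ocr), 3),
              "corr with aesthetic:", round(correlation(signal, aesthetic), 3))
```

With nominally equal weights, the sum-then-normalize signal tracks the high-dispersion objective almost exclusively, which is the implicit reweighting the abstract describes. Standardizing each objective first gives both objectives the same correlation with the final signal. This per-objective standardization is a common baseline remedy and is shown only for contrast; how APEX's own normalization differs is not recoverable from the abstract.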

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling