OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
Yuting Gao, Weihao Chen, Lan Wang, Ruihan Xu, Qingpei Guo

TL;DR
OrdMoE introduces a self-supervised preference alignment method for multimodal LLMs that leverages internal expert routing scores to rank response quality, eliminating the need for costly human preference data.
Contribution
This work presents OrdMoE, a novel framework that constructs internal preference hierarchies within Mixture-of-Experts models using intrinsic signals, enabling preference learning without the cost of external human annotation.
Findings
Significantly improves alignment and performance on multimodal benchmarks.
Achieves competitive results without external human preference annotations.
Effectively utilizes expert routing scores for response ranking.
Abstract
Preference learning has recently emerged as a pivotal strategy for post-training alignment of Multimodal Large Language Models (MLLMs). However, existing approaches predominantly rely on external human-annotated preference data, which is costly and labor-intensive to collect. In this work, we propose OrdMoE, a novel preference alignment framework that bypasses the reliance on external human preferences entirely by leveraging intrinsic signals within Mixture-of-Experts (MoE) architectures. Specifically, we observe that the router's expert selection scores implicitly encode a quality-aware ranking of responses (i.e., higher-scoring experts consistently generate higher-quality outputs). Building on this insight, OrdMoE constructs an internal preference hierarchy by grouping experts into ranked tiers based on their per-token routing scores and activating each tier separately to produce a…
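The tier-grouping step described in the abstract can be illustrated with a minimal sketch. It covers only the ranking of experts into tiers by per-token routing score; what is done with each tier's output is not shown, since the abstract is truncated at that point. The function name `rank_experts_into_tiers`, the fixed `tier_size`, the tie-breaking rule, and the example scores are all assumptions for illustration, not details taken from the paper.

```python
from typing import List


def rank_experts_into_tiers(routing_scores: List[float], tier_size: int) -> List[List[int]]:
    """Group expert indices into tiers, highest-scoring tier first.

    Hypothetical helper: the fixed tier size and the tie-breaking rule
    (lower expert index first) are assumptions, not the paper's method.
    """
    if tier_size <= 0:
        raise ValueError("tier_size must be positive")
    # Sort expert indices by routing score, highest first.
    # sorted() is stable, so tied experts keep their index order.
    order = sorted(range(len(routing_scores)), key=lambda i: -routing_scores[i])
    # Cut the sorted list into consecutive chunks of tier_size experts.
    return [order[i:i + tier_size] for i in range(0, len(order), tier_size)]


# Routing scores for one token over eight experts (made-up values).
scores = [0.05, 0.30, 0.10, 0.20, 0.02, 0.15, 0.08, 0.10]
tiers = rank_experts_into_tiers(scores, tier_size=2)
print(tiers)  # [[1, 3], [5, 2], [7, 6], [0, 4]]
```

In this sketch the first tier holds the experts the router scores highest for the token, and each later tier holds progressively lower-scoring experts, which matches the ranked-tier idea in the abstract at a conceptual level only.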
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Mobile Crowdsensing and Crowdsourcing
