OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs

Yuting Gao; Weihao Chen; Lan Wang; Ruihan Xu; Qingpei Guo

arXiv:2511.19023 · cs.LG · November 25, 2025

TL;DR

OrdMoE introduces a self-supervised preference alignment method for multimodal LLMs that leverages internal expert routing scores to rank response quality, eliminating the need for costly human preference data.

Contribution

This work presents OrdMoE, a framework that builds an internal preference hierarchy within Mixture-of-Experts models from the router's own expert selection scores, so preference learning requires no external human preference annotations.

Findings

01. Significantly improves alignment and performance on multimodal benchmarks.

02. Achieves competitive results without external human preference annotations.

03. Effectively utilizes expert routing scores for response ranking.

Abstract

Preference learning has recently emerged as a pivotal strategy for post-training alignment of Multimodal Large Language Models (MLLMs). However, existing approaches predominantly rely on external human-annotated preference data, which is costly and labor-intensive to collect. In this work, we propose OrdMoE, a novel preference alignment framework that bypasses the reliance on external human preferences entirely by leveraging intrinsic signals within Mixture-of-Experts (MoE) architectures. Specifically, we observe that the router's expert selection scores implicitly encode a quality-aware ranking of responses (i.e. higher-scoring experts consistently generate higher-quality outputs). Building on this insight, OrdMoE constructs an internal preference hierarchy by grouping experts into ranked tiers based on their per-token routing scores and activating each tier separately to produce a…
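
The tier-grouping step described in the abstract can be illustrated with a minimal sketch. It covers only the ranking of experts into tiers for a single token; the function name `rank_expert_tiers`, the equal-size tier split, the tie-breaking rule, and the example scores are all assumptions made for illustration and are not taken from the paper. How the per-tier outputs are used afterward is not shown, since the abstract is truncated at that point.

```python
from typing import List, Sequence


def rank_expert_tiers(routing_scores: Sequence[float], num_tiers: int) -> List[List[int]]:
    """Split expert indices into tiers, highest routing score first.

    Hypothetical helper: equal-size tiers are an assumption of this sketch,
    not a detail confirmed by the paper.
    """
    num_experts = len(routing_scores)
    if num_tiers <= 0 or num_experts % num_tiers != 0:
        raise ValueError("num_tiers must be positive and divide the number of experts")
    # Sort expert indices by descending score; the sort is stable,
    # so tied experts keep their original (lower-index-first) order.
    order = sorted(range(num_experts), key=lambda i: -routing_scores[i])
    tier_size = num_experts // num_tiers
    return [order[t * tier_size:(t + 1) * tier_size] for t in range(num_tiers)]


# Made-up routing scores for one token over 6 experts.
scores = [0.05, 0.30, 0.10, 0.25, 0.20, 0.10]
tiers = rank_expert_tiers(scores, num_tiers=3)
print(tiers)  # [[1, 3], [4, 2], [5, 0]]
```

Under the abstract's observation, the first tier (experts 1 and 3 here) would be expected to yield the highest-quality output and the last tier the lowest, which is what gives the tiers an ordinal preference structure.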

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Mobile Crowdsensing and Crowdsourcing