Omni-RRM: Advancing Omni Reward Modeling via Automatic Rubric-Grounded Preference Synthesis

Zicheng Kong; Dehua Ma; Zhenbo Xu; Alven Yang; Yiwei Ru; Haoran Wang; Zixuan Zhou; Fuqing Bie; Liuyu Xiang; Huijia Wu; Jian Zhao; and Zhaofeng He

arXiv:2602.00846·cs.CL·February 3, 2026

Open Access

TL;DR

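Omni-RRM is an open-source multimodal reward model that produces structured, rubric-grounded preference judgments across text, image, video, and audio. It is trained on automatically synthesized preference data and reports state-of-the-art accuracy on video and audio benchmarks while outperforming existing open-source reward models on image tasks.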

Contribution

The paper presents Omni-RRM, described as the first open-source, rubric-grounded reward model covering text, image, video, and audio. It is trained on preference data synthesized by a fully automated pipeline, with no human annotation.

Findings

01. Achieves state-of-the-art accuracy on video and audio benchmarks.

02. Outperforms existing open-source reward models on image tasks.

03. Enhances downstream performance and transfers to text-only preference benchmarks.

Abstract

Multimodal large language models (MLLMs) have shown remarkable capabilities, yet their performance is often capped by the coarse nature of existing alignment techniques. A critical bottleneck remains the lack of effective reward models (RMs): existing RMs are predominantly vision-centric, return opaque scalar scores, and rely on costly human annotations. We introduce Omni-RRM, the first open-source rubric-grounded reward model that produces structured, multi-dimension preference judgments with dimension-wise justifications across text, image, video, and audio. At the core of our approach is Omni-Preference, a large-scale dataset built via a fully automated pipeline: we synthesize candidate response pairs by contrasting models of different capabilities, and use strong teacher models to reconcile and filter preferences while providing a modality-aware…
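The "reconcile and filter" step in the abstract can be illustrated with a small sketch. The abstract does not state the actual rule, so the following is only one plausible reading under stated assumptions: each teacher picks a winner per rubric dimension, the overall preference is a majority over dimensions, and a pair is kept only when all teachers agree. The names `TeacherJudgment`, `overall`, and `reconcile`, and the dimension labels, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TeacherJudgment:
    """Hypothetical record of one teacher's rubric-grounded verdict on a pair (A, B)."""
    # Winner per rubric dimension, each value "A" or "B" (dimension names are assumed).
    dimension_winner: Dict[str, str]
    # Short free-text justification per dimension.
    justification: Dict[str, str] = field(default_factory=dict)

    def overall(self) -> Optional[str]:
        """Majority vote over dimensions; returns None on a tie."""
        a = sum(1 for w in self.dimension_winner.values() if w == "A")
        b = sum(1 for w in self.dimension_winner.values() if w == "B")
        if a == b:
            return None
        return "A" if a > b else "B"


def reconcile(judgments: List[TeacherJudgment]) -> Optional[str]:
    """Assumed filter: keep the pair only if every teacher reaches the same
    overall preference; otherwise return None so the pair is discarded."""
    verdicts = {j.overall() for j in judgments}
    if len(verdicts) == 1:
        return verdicts.pop()  # may itself be None if every teacher tied
    return None


# Usage: two teachers agree on "A", a third prefers "B".
j1 = TeacherJudgment({"accuracy": "A", "relevance": "A", "fluency": "B"})
j2 = TeacherJudgment({"accuracy": "A", "relevance": "B", "fluency": "A"})
j3 = TeacherJudgment({"accuracy": "B", "relevance": "B", "fluency": "A"})
kept = reconcile([j1, j2])        # teachers agree, pair is kept
dropped = reconcile([j1, j3])     # teachers disagree, pair is filtered out
```

Requiring unanimity trades dataset size for label quality; a softer rule such as a majority across teachers would be an equally valid assumption here.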

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis