SkillRater: Untangling Capabilities in Multimodal Data

Naveen Sahi; Jeremy Dohmann; Armen Aghajanyan; Akshat Shrivastava

arXiv:2602.11615 · cs.LG · February 13, 2026

TL;DR

SkillRater introduces a multidimensional data filtering framework that trains a specialized rater for each capability, improving model performance across vision-language tasks by preserving diverse, high-quality samples.

Contribution

The paper proposes a novel multi-capability data filtering method that uses meta-learned specialized raters and outperforms traditional scalar-scoring approaches.

Findings

1. Improves vision-language model performance by up to 5.63% on key capabilities.
2. Shows that the learned rater signals are nearly orthogonal, indicating independent quality dimensions.
3. Outperforms both unfiltered training and monolithic filtering baselines.
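One way to check a near-orthogonality claim like the second finding is to score the same sample pool with every rater and compute pairwise correlations between the resulting score vectors. Off-diagonal values near zero suggest the raters capture different quality dimensions. The sketch below assumes Pearson correlation (equivalently, the cosine similarity of mean-centered score vectors); the measure the paper actually uses is not stated here, and the rater names and scores are hypothetical.

```python
import math
from typing import Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation = cosine similarity of the mean-centered vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant score vector has undefined correlation")
    return cov / (sx * sy)


def rater_correlations(
    scores: Mapping[str, Sequence[float]],
) -> dict[tuple[str, str], float]:
    """Correlation for every unordered pair of raters scored on the same pool."""
    names = sorted(scores)
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            out[(a, b)] = pearson(scores[a], scores[b])
    return out


# Hypothetical scores from three raters over the same four samples.
example = {
    "rater_a": [1.0, 2.0, 3.0, 4.0],
    "rater_b": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with rater_a
    "rater_c": [2.0, 4.0, 6.0, 8.0],    # a rescaled copy of rater_a
}
corr = rater_correlations(example)
for pair, r in corr.items():
    print(pair, round(r, 3))
```

A correlation near 1 (as between `rater_a` and `rater_c`) would mean two raters are redundant, so a single scalar scorer could replace them; values near 0 are what would justify keeping separate raters.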

Abstract

Data curation methods typically assign samples a single quality score. We argue this scalar framing is fundamentally limited: when training requires multiple distinct capabilities, a monolithic scorer cannot maximize useful signals for all of them simultaneously. Quality is better understood as multidimensional, with each dimension corresponding to a capability the model must acquire. We introduce SkillRater, a framework that decomposes data filtering into specialized raters (one per capability, each trained via meta-learning on a disjoint validation objective) and composes their scores through a progressive selection rule: at each training stage, a sample is retained if any rater ranks it above a threshold that tightens over time, preserving diversity early while concentrating on high-value samples late. We validate this approach on vision-language models, decomposing quality into…
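The progressive selection rule can be sketched as a union of per-rater top-k sets whose k shrinks from stage to stage. This is a minimal sketch under stated assumptions: the linear keep-fraction schedule, the tie-breaking rule, and the names `keep_schedule`, `select_stage`, and `rank_positions` are hypothetical, and each rater is represented only by a precomputed list of scores rather than the meta-learned model the paper trains.

```python
from typing import Mapping, Sequence


def rank_positions(scores: Sequence[float]) -> list[int]:
    """Rank of each sample under one rater: 0 = lowest score, n-1 = highest.
    Ties are broken by sample index (a simplifying assumption)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    positions = [0] * len(scores)
    for pos, idx in enumerate(order):
        positions[idx] = pos
    return positions


def keep_schedule(num_stages: int, start: float = 1.0, end: float = 0.25) -> list[float]:
    """Hypothetical linear schedule: the fraction of the pool each rater may keep
    at each stage. A shrinking fraction means the rank threshold tightens."""
    if num_stages == 1:
        return [end]
    return [start + (end - start) * t / (num_stages - 1) for t in range(num_stages)]


def select_stage(rater_scores: Mapping[str, Sequence[float]], keep_fraction: float) -> list[int]:
    """Indices retained at one stage: a sample survives if ANY rater ranks it
    within that rater's top `keep_fraction` of the pool."""
    ranks = [rank_positions(s) for s in rater_scores.values()]
    n = len(ranks[0])
    k = max(1, round(keep_fraction * n))  # samples each rater may vouch for
    cutoff = n - k                        # minimum rank position needed to pass
    return [i for i in range(n) if any(r[i] >= cutoff for r in ranks)]


# Hypothetical scores: rater_a favors sample 0, rater_b favors sample 3.
scores = {
    "rater_a": [0.9, 0.1, 0.2, 0.3],
    "rater_b": [0.1, 0.2, 0.3, 0.8],
}
for stage, frac in enumerate(keep_schedule(4)):
    # stages keep [0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 3], then [0, 3]
    print(stage, frac, select_stage(scores, frac))
```

Because a sample needs only one rater's endorsement, a sample that is top-ranked by a single rater survives even if other raters score it poorly. Early stages therefore keep a broad, diverse pool, and late stages narrow to each rater's best samples.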

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning