SkillRater: Untangling Capabilities in Multimodal Data
Naveen Sahi, Jeremy Dohmann, Armen Aghajanyan, Akshat Shrivastava

TL;DR
SkillRater introduces a multidimensional data filtering framework that trains a specialized rater for each capability. It improves model performance across vision-language tasks by preserving diverse, high-quality samples.
Contribution
The paper proposes a novel multi-capability data filtering method built on meta-learned specialized raters. It outperforms traditional approaches that assign each sample a single scalar quality score.
Findings
Improves vision-language model performance by up to 5.63% on key capabilities.
Shows that the learned rater signals are nearly orthogonal, which indicates independent quality dimensions.
Outperforms unfiltered training and monolithic filtering baselines.
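One common way to check whether rater signals are "nearly orthogonal" is to correlate the raters' scores over a shared pool of samples. The Pearson correlation of two score vectors equals the cosine of the angle between them after centering, so values near zero mean the centered vectors are nearly orthogonal. The sketch below assumes this interpretation. The summary does not say which measure the authors used. The functions `rater_correlation` and `max_off_diagonal` and the synthetic scores are hypothetical.

```python
import numpy as np


def rater_correlation(scores: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlation between raters.

    scores: shape (num_raters, num_samples); row k holds the scores that
    rater k assigns to a shared pool of samples (hypothetical layout).
    """
    # Center each rater's scores; Pearson r is then the cosine between rows.
    centered = scores - scores.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return unit @ unit.T


def max_off_diagonal(corr: np.ndarray) -> float:
    """Largest |correlation| between two distinct raters (0 = fully orthogonal)."""
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[off_diag]).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for three independent capability raters.
    scores = rng.normal(size=(3, 10_000))
    corr = rater_correlation(scores)
    print(np.round(corr, 3))
    print("max |r| between raters:", max_off_diagonal(corr))
```

If the raters' score scales differ substantially, a rank correlation such as Spearman's may be a more robust choice. That substitution is also an assumption.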
Abstract
Data curation methods typically assign samples a single quality score. We argue this scalar framing is fundamentally limited: when training requires multiple distinct capabilities, a monolithic scorer cannot maximize useful signals for all of them simultaneously. Quality is better understood as multidimensional, with each dimension corresponding to a capability the model must acquire. We introduce SkillRater, a framework that decomposes data filtering into specialized raters (one per capability, each trained via meta-learning on a disjoint validation objective) and composes their scores through a progressive selection rule. At each training stage, a sample is retained if any rater ranks it above a threshold that tightens over time, which preserves diversity early while concentrating on high-value samples late. We validate this approach on vision-language models, decomposing quality into…
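The progressive "any rater above a tightening threshold" rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each rater's threshold is a top-fraction cutoff on that rater's own ranking, and that the kept fraction shrinks linearly across training stages. The functions `keep_fraction_schedule`, `select_any_rater`, and `select_monolithic` and the `start`/`end` values are hypothetical. A mean-score baseline is included to contrast the any-rater rule with a single scalar score.

```python
import numpy as np


def keep_fraction_schedule(stage: int, num_stages: int,
                           start: float = 0.5, end: float = 0.1) -> float:
    """Hypothetical linear schedule: the per-rater keep fraction shrinks from
    `start` at stage 0 to `end` at the last stage (the threshold tightens)."""
    if num_stages == 1:
        return end
    t = stage / (num_stages - 1)
    return start + t * (end - start)


def _top_k(num_samples: int, keep_fraction: float) -> int:
    return max(1, int(round(keep_fraction * num_samples)))


def select_any_rater(scores: np.ndarray, keep_fraction: float) -> np.ndarray:
    """True for a sample if ANY rater ranks it inside that rater's own top
    `keep_fraction` of the pool. scores: shape (num_raters, num_samples)."""
    num_raters, num_samples = scores.shape
    k = _top_k(num_samples, keep_fraction)
    # Per-rater rank of every sample (0 = that rater's highest score).
    order = np.argsort(-scores, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(num_raters)[:, None], order] = np.arange(num_samples)
    # Union over raters: one rater's vote is enough to keep the sample.
    return (ranks < k).any(axis=0)


def select_monolithic(scores: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Baseline: collapse raters into one scalar (their mean), keep the top fraction."""
    num_samples = scores.shape[1]
    mask = np.zeros(num_samples, dtype=bool)
    mask[np.argsort(-scores.mean(axis=0))[:_top_k(num_samples, keep_fraction)]] = True
    return mask


if __name__ == "__main__":
    # Toy scores from two hypothetical raters over four samples: sample 0 is a
    # specialist for rater 0, sample 1 for rater 1, sample 2 is moderate for both.
    toy = np.array([[0.9, 0.1, 0.6, 0.2],
                    [0.1, 0.9, 0.6, 0.2]])
    print(select_any_rater(toy, 0.25))   # keeps both specialists
    print(select_monolithic(toy, 0.25))  # keeps only the moderate sample

    # Progressive selection over four hypothetical stages on random scores.
    rng = np.random.default_rng(0)
    scores = rng.random((3, 1_000))
    for stage in range(4):
        frac = keep_fraction_schedule(stage, 4)
        print(stage, round(frac, 3), int(select_any_rater(scores, frac).sum()))
```

A sample only needs to clear one rater's cutoff. As a result, a sample that is excellent for a single capability survives even when its average score is low, whereas the mean-score baseline can discard it in favor of a uniformly moderate sample. Because the same per-rater rankings are reused at every stage, shrinking the fraction yields nested, non-increasing selections.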
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
