Evaluating Object-Centric Models beyond Object Discovery
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

TL;DR
This paper proposes a comprehensive evaluation framework for object-centric learning models, using instruction-tuned vision-language models and unified metrics to better assess their usefulness for complex reasoning and localization tasks.
Contribution
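It introduces a unified evaluation task and metric that assess localization and representation usefulness together rather than with disjoint metrics, and it uses instruction-tuned VLMs as evaluators for scalable benchmarking across diverse VQA datasets.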
Findings
Unified evaluation improves assessment consistency
Instruction-tuned VLMs effectively measure representation usefulness
Baseline provides reference for multi-feature reconstruction
Abstract
Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution (OOD) data. However, OCL models are often not evaluated regarding these goals. Instead, most prior work focuses on evaluating OCL models solely through object discovery and simple reasoning tasks, such as probing the representation via image classification. We identify two limitations in existing benchmarks: (1) They provide limited insights on the representation usefulness of OCL models, and (2) localization and representation usefulness are assessed using disjoint metrics. To address (1), we use instruction-tuned VLMs as evaluators, enabling scalable benchmarking across diverse VQA datasets to measure how well VLMs leverage OCL representations for complex reasoning tasks. To address (2), we introduce a unified evaluation task and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning and Data Classification · Machine Learning and Algorithms
