SliceLens: Fine-Grained and Grounded Error Slice Discovery for Multi-Instance Vision Tasks

Wei Zhang; Chaoqun Wang; Zixuan Guan; Sam Kao; Pengfei Zhao; Peng Wu; Sifeng He

arXiv:2512.24592 · cs.CV · January 1, 2026



TL;DR

SliceLens introduces a novel grounded visual reasoning framework using LLMs and VLMs to discover fine-grained, interpretable error slices in multi-instance vision tasks, addressing limitations of existing methods and benchmarks.

Contribution

The paper presents SliceLens, a hypothesis-driven approach that uses language and vision models for error slice discovery, and introduces FeSD, the first benchmark for fine-grained error slice evaluation in instance-level vision tasks.

Findings

01. Achieves state-of-the-art Precision@10 on the FeSD benchmark.

02. Effectively identifies interpretable error slices for model improvement.

03. Improves error slice detection performance significantly over baselines.
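
For readers unfamiliar with the Precision@10 metric named in the findings, the sketch below shows the standard ranked-retrieval definition: the fraction of the top-k retrieved items that are relevant. How FeSD decides which instances count as relevant is not stated on this page, so the function name `precision_at_k`, the instance IDs, and the set of "true error" instances are all hypothetical and only illustrate the arithmetic.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(
    ranked_ids: Sequence[Hashable],
    relevant_ids: Set[Hashable],
    k: int = 10,
) -> float:
    """Fraction of the top-k ranked items that are in the relevant set.

    Standard precision-at-k; not necessarily the exact protocol used by FeSD.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    # Divide by k (not len(top_k)), so a short ranking is penalized.
    return hits / k


# Hypothetical example: instances ranked by a slice discovery method,
# checked against instances known to belong to a real error slice.
ranked = ["inst_07", "inst_02", "inst_11", "inst_05"]
true_errors = {"inst_07", "inst_11"}
score = precision_at_k(ranked, true_errors, k=2)
```

Here only `inst_07` among the top two is a true error, so one hit out of k = 2 is counted.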

Abstract

Systematic failures of computer vision models on subsets with coherent visual patterns, known as error slices, pose a critical challenge for robust model evaluation. Existing slice discovery methods are primarily developed for image classification, limiting their applicability to multi-instance tasks such as detection, segmentation, and pose estimation. In real-world scenarios, error slices often arise from corner cases involving complex visual relationships, where existing instance-level approaches lacking fine-grained reasoning struggle to yield meaningful insights. Moreover, current benchmarks are typically tailored to specific algorithms or biased toward image classification, with artificial ground truth that fails to reflect real model failures. To address these limitations, we propose SliceLens, a hypothesis-driven framework that leverages LLMs and VLMs to generate and verify…


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)