ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

Jinho Choi; Hyesu Lim; Steffen Schneider; Jaegul Choo

arXiv:2510.26186·cs.CV·October 31, 2025

TL;DR

ConceptScope is an automated framework that identifies and characterizes dataset biases by discovering human-interpretable visual concepts, aiding dataset auditing and robustness evaluation without requiring detailed annotations.

Contribution

ConceptScope introduces a scalable, automated method that uses Sparse Autoencoders trained on vision foundation model representations to discover, categorize, and analyze the visual concepts behind dataset bias.
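To make the Sparse Autoencoder idea concrete, here is a minimal NumPy sketch of a forward pass and training objective. It is an illustration under stated assumptions, not the paper's implementation: the ReLU encoder, linear decoder, L1 sparsity penalty, layer sizes, and the names `sae_forward` and `sae_loss` are all hypothetical, and random vectors stand in for real foundation-model embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for image embeddings from a vision foundation model.
# Random values and all sizes below are illustrative assumptions.
n_images, d_model, n_concepts = 64, 16, 48
embeddings = rng.normal(size=(n_images, d_model))

# Hypothetical SAE parameters: an overcomplete dictionary (n_concepts > d_model),
# so each latent unit can specialize on one candidate "concept".
W_enc = rng.normal(scale=0.1, size=(d_model, n_concepts))
b_enc = np.zeros(n_concepts)
W_dec = rng.normal(scale=0.1, size=(n_concepts, d_model))
b_dec = np.zeros(d_model)


def sae_forward(x):
    """Return (concept activations, reconstruction) for a batch of embeddings."""
    # ReLU keeps activations non-negative; many of them end up exactly zero.
    z = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    x_hat = z @ W_dec + b_dec
    return z, x_hat


def sae_loss(x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on activations."""
    z, x_hat = sae_forward(x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity


activations, reconstruction = sae_forward(embeddings)
loss = sae_loss(embeddings)
```

After training (omitted here), each column of `activations` would be read as the strength of one candidate concept per image, which is the quantity a concept-level dataset analysis would work with.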

Findings

01 Effectively captures a wide range of visual concepts including objects, textures, and attributes.

02 Successfully detects known biases and uncovers new, unannotated biases in datasets.

03 Produces spatial attributions aligned with meaningful image regions.

Abstract

Dataset bias, where data points are skewed to certain concepts, is ubiquitous in machine learning datasets. Yet, systematically identifying these biases is challenging without costly, fine-grained attribute annotations. We present ConceptScope, a scalable and automated framework for analyzing visual datasets by discovering and quantifying human-interpretable concepts using Sparse Autoencoders trained on representations from vision foundation models. ConceptScope categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels, enabling class-level dataset characterization, bias identification, and robustness evaluation through concept-based subgrouping. We validate that ConceptScope captures a wide range of visual concepts, including objects, textures, backgrounds, facial attributes, emotions, and actions, through…
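The categorization step described in the abstract can be sketched as a simple two-threshold rule. This is a hedged illustration, not the paper's procedure: the scoring functions, the thresholds `rel_thresh` and `corr_thresh`, the decision rule itself, and the toy data are all assumptions. Here a concept with high semantic relevance to a class is called a target, a low-relevance concept that still co-occurs strongly with the class is called a bias, and the rest are context.

```python
import numpy as np


def class_correlation(activations, labels, cls):
    """Firing rate of each concept inside the class minus its rate outside it.

    A hypothetical stand-in for "statistical correlation to class labels".
    """
    active = activations > 0
    in_cls = labels == cls
    return active[in_cls].mean(axis=0) - active[~in_cls].mean(axis=0)


def categorize_concepts(relevance, correlation, rel_thresh=0.5, corr_thresh=0.3):
    """Assign each concept to 'target', 'bias', or 'context' (assumed rule)."""
    categories = []
    for rel, corr in zip(relevance, correlation):
        if rel >= rel_thresh:
            categories.append("target")   # semantically tied to the class
        elif corr >= corr_thresh:
            categories.append("bias")     # not essential, yet skewed toward the class
        else:
            categories.append("context")  # not essential and not skewed
    return categories


# Toy data: 6 images, 3 concepts, 2 classes.
activations = np.array([
    [1.0, 0.8, 0.0],
    [0.9, 0.7, 0.2],
    [1.1, 0.9, 0.0],
    [0.0, 0.0, 0.3],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Hypothetical semantic relevance of each concept to class 0, in [0, 1].
relevance = np.array([0.9, 0.1, 0.2])

correlation = class_correlation(activations, labels, cls=0)
categories = categorize_concepts(relevance, correlation)
print(categories)  # ['target', 'bias', 'context']
```

In this toy example the second concept fires in every class-0 image but has low semantic relevance, so the rule flags it as a bias; images can then be grouped by whether that concept is active to compare model behavior across subgroups.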

Videos

ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts· slideslive