Does the dataset meet your expectations? Explaining sample representation in image data
Dhasarathy Parthasarathy, Anton Johansson

TL;DR
This paper introduces a method to explain sample diversity deficiencies in image datasets by comparing actual annotation distributions with expected ones, using simulation to identify representation gaps affecting neural network performance.
Contribution
It proposes a novel approach that leverages annotation-based summaries and simulation to explain dataset sample representation and diversity issues.
Findings
Identified sample representation gaps in geometric shape datasets.
Demonstrated the method's ability to explain diversity in terms of size, position, and brightness.
Showed that annotation mismatch can reveal dataset deficiencies affecting model behavior.
Abstract
Since the behavior of a neural network model is adversely affected by a lack of diversity in training data, we present a method that identifies and explains such deficiencies. When a dataset is labeled, we note that annotations alone are capable of providing a human interpretable summary of sample diversity. This allows explaining any lack of diversity as the mismatch found when comparing the actual distribution of annotations in the dataset with an expected distribution of annotations, specified manually to capture essential label diversity. While, in many practical cases, labeling (samples → annotations) is expensive, its inverse, simulation (annotations → samples) can be cheaper. By mapping the expected distribution of annotations into test samples using parametric simulation, we present a method that explains sample representation using the…
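The comparison of actual and expected annotation distributions described in the abstract can be illustrated with a minimal sketch. It covers only the mismatch step for a single discrete annotation, not the simulation step, and it is not the authors' implementation. The function name `representation_gaps`, the `tolerance` threshold, the size bins, and the sample counts are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def representation_gaps(actual_annotations, expected_probs, tolerance=0.05):
    """Return bins whose actual share falls short of the expected share.

    actual_annotations: list of discrete annotation values found in the dataset.
    expected_probs: dict mapping each bin to its manually specified probability.
    tolerance: hypothetical slack; shortfalls at or below it are ignored.
    """
    total = len(actual_annotations)
    counts = Counter(actual_annotations)
    gaps = {}
    for bin_label, expected in expected_probs.items():
        # Share of samples actually carrying this annotation value.
        actual = counts.get(bin_label, 0) / total if total else 0.0
        shortfall = expected - actual
        if shortfall > tolerance:
            gaps[bin_label] = round(shortfall, 3)
    return gaps


# Hypothetical data: shape sizes binned as small / medium / large,
# with a uniform expected distribution over the three bins.
actual = ["small"] * 70 + ["medium"] * 28 + ["large"] * 2
expected = {"small": 1 / 3, "medium": 1 / 3, "large": 1 / 3}
gaps = representation_gaps(actual, expected)
print(gaps)
```

In this toy case the "medium" and "large" bins are reported as under-represented, while "small" is not. A fuller treatment would presumably handle continuous annotations such as position and brightness, which this sketch leaves out.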
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications
