Prototype-based Dataset Comparison
Nanne van Noord

TL;DR
This paper introduces a self-supervised prototype-based method for comparing datasets, enabling richer inspection beyond prominent concepts, demonstrated through two case studies.
Contribution
It presents a novel module that learns concept-level prototypes across datasets using self-supervised learning, enhancing dataset comparison capabilities.
Findings
Dataset comparison extends dataset inspection.
Prototype learning uncovers diverse visual concepts.
Method benefits demonstrated in two case studies.
Abstract
Dataset summarisation is a fruitful approach to dataset inspection. However, when applied to a single dataset, the discovery of visual concepts is restricted to those most prominent. We argue that a comparative approach can expand upon this paradigm to enable richer forms of dataset inspection that go beyond the most prominent concepts. To enable dataset comparison, we present a module that learns concept-level prototypes across datasets. We leverage self-supervised learning to discover these prototypes without supervision, and we demonstrate the benefits of our approach in two case studies. Our findings show that dataset comparison extends dataset inspection, and we hope to encourage more work in this direction. Code and usage instructions are available at https://github.com/Nanne/ProtoSim
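The comparative idea in the abstract can be illustrated with a minimal sketch. This is not the paper's module and does not use the ProtoSim code: it assumes image embeddings and a set of prototype vectors are already available (here they are toy arrays), assigns each embedding to its nearest prototype by cosine similarity, and compares how often each prototype is used in two datasets. The function names `prototype_usage` and `compare_datasets` are hypothetical and introduced only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def prototype_usage(embeddings, prototypes):
    """Fraction of embeddings whose nearest prototype (by cosine) is each prototype."""
    sims = l2_normalize(embeddings) @ l2_normalize(prototypes).T
    assignments = sims.argmax(axis=1)
    counts = np.bincount(assignments, minlength=prototypes.shape[0])
    return counts / counts.sum()


def compare_datasets(emb_a, emb_b, prototypes):
    """Per-prototype usage difference: positive means more typical of dataset A."""
    return prototype_usage(emb_a, prototypes) - prototype_usage(emb_b, prototypes)


# Toy data (assumed, not from the paper): three prototypes in a 3-d embedding space.
prototypes = np.eye(3)
emb_a = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]])
emb_b = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 0.8]])

diff = compare_datasets(emb_a, emb_b, prototypes)
print(diff)
```

In this toy setting, a prototype with a large positive or negative difference marks a concept that is characteristic of one dataset, while a value near zero marks a concept that both datasets share. In the paper the prototypes are learned with self-supervision rather than fixed in advance.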
Taxonomy
Topics: Digital Media Forensic Detection · Cell Image Analysis Techniques · Machine Learning and Data Classification
