Prototype-based Dataset Comparison

Nanne van Noord

arXiv:2309.02401 · cs.CV · September 6, 2023 · 1 citation

TL;DR

This paper introduces a self-supervised prototype-based method for comparing datasets, enabling richer inspection beyond prominent concepts, demonstrated through two case studies.

Contribution

It presents a novel module that learns concept-level prototypes across datasets using self-supervised learning, enhancing dataset comparison capabilities.

Findings

01. Dataset comparison extends dataset inspection.

02. Prototype learning uncovers diverse visual concepts.

03. Method benefits demonstrated in two case studies.

Abstract

Dataset summarisation is a fruitful approach to dataset inspection. However, when applied to a single dataset, the discovery of visual concepts is restricted to those most prominent. We argue that a comparative approach can expand upon this paradigm to enable richer forms of dataset inspection that go beyond the most prominent concepts. To enable dataset comparison, we present a module that learns concept-level prototypes across datasets. We leverage self-supervised learning to discover these prototypes without supervision, and we demonstrate the benefits of our approach in two case studies. Our findings show that dataset comparison extends dataset inspection, and we hope to encourage more work in this direction. Code and usage instructions are available at https://github.com/Nanne/ProtoSim
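
To make the idea of comparing datasets through shared prototypes more concrete, the following is a minimal illustrative sketch, not the ProtoSim implementation. It assumes image embeddings and prototypes are already available as plain vectors (in the paper the prototypes are learned with self-supervision; here they are fixed by hand), assigns each embedding to its most similar prototype by cosine similarity, and compares how often each prototype is used in two datasets. All function names and toy values are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(embedding, prototypes):
    """Index of the prototype most similar to the embedding (hard assignment)."""
    return max(range(len(prototypes)), key=lambda k: cosine(embedding, prototypes[k]))


def prototype_usage(embeddings, prototypes):
    """Fraction of a dataset's embeddings assigned to each prototype."""
    counts = Counter(assign(e, prototypes) for e in embeddings)
    return [counts[k] / len(embeddings) for k in range(len(prototypes))]


def compare(usage_a, usage_b):
    """Per-prototype usage difference: positive means more typical of dataset A."""
    return [a - b for a, b in zip(usage_a, usage_b)]


# Toy data (assumption): hand-picked 2-D prototypes and embeddings.
prototypes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dataset_a = [[0.9, 0.1], [1.0, 0.0], [0.7, 0.8]]
dataset_b = [[0.1, 0.9], [0.0, 1.0], [0.6, 0.7], [0.2, 1.0]]

usage_a = prototype_usage(dataset_a, prototypes)
usage_b = prototype_usage(dataset_b, prototypes)
diff = compare(usage_a, usage_b)

for k, d in enumerate(diff):
    print(f"prototype {k}: A={usage_a[k]:.2f} B={usage_b[k]:.2f} diff={d:+.2f}")
```

In this toy setup, a prototype with a large positive or negative difference corresponds to a concept that is specific to one dataset, while a difference near zero suggests a concept shared by both.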

Code & Models

Repositories

nanne/protosim (PyTorch, official)

Taxonomy

Topics: Digital Media Forensic Detection · Cell Image Analysis Techniques · Machine Learning and Data Classification