Dataset Summarization by K Principal Concepts
Niv Cohen, Yedid Hoshen

TL;DR
This paper introduces a novel task of identifying K principal human-interpretable concepts that best summarize datasets, using image-language embeddings and an optimization approach to select the most explanatory concepts from a large candidate set.
Contribution
The paper proposes a new method for dataset summarization by selecting K key concepts via a facility location formulation and scalable optimization, enhancing interpretability over traditional image-based summaries.
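One way to write the selection objective, as a sketch rather than the paper's exact formulation: the symbols below (image embeddings x_i, concept-bank C, selected set S, distance d) are notation assumed here for illustration, and d could be, for example, cosine distance in the shared image-language embedding space.

```latex
% Assumed notation: x_1, ..., x_N are image embeddings, C is the concept-bank,
% S is the selected subset of K concepts, d(., .) is a distance in the shared space.
S^{*} = \arg\min_{S \subseteq C,\; |S| = K} \; \sum_{i=1}^{N} \min_{c \in S} d(x_i, c)
```

Each image is charged the distance to its nearest selected concept, so a good set S places at least one concept close to every image.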
Findings
Effective concept selection from large candidate sets.
Improved dataset interpretability through explicit concept summaries.
Potential for dataset classification using identified concepts.
Abstract
We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level human interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we denote the concept-bank. The concept-bank may be taken from a generic dictionary or constructed by task-specific prior knowledge. An image-language embedding method (e.g. CLIP) is used to map the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate our problem as a K-uncapacitated facility location problem. An efficient optimization technique is used to scale the local search algorithm to very large concept-banks. The output of our method is a set of K…
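The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a precomputed image-to-concept distance matrix (here a hand-made toy matrix instead of real CLIP embeddings), uses a plain greedy initialization followed by single-swap local search, and leaves out the efficient optimization the paper uses for very large concept-banks. The function names `cosine_distance`, `assignment_cost`, and `select_concepts`, and the toy concept names, are hypothetical.

```python
import numpy as np


def cosine_distance(images, concepts):
    """Pairwise cosine distance between image and concept embeddings (rows)."""
    a = images / np.linalg.norm(images, axis=1, keepdims=True)
    b = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def assignment_cost(dist, selected):
    """Sum over images of the distance to the nearest selected concept."""
    return dist[:, selected].min(axis=1).sum()


def select_concepts(dist, k, max_iters=100):
    """Pick k concept indices by greedy init plus single-swap local search.

    dist has shape (n_images, n_concepts). This is a naive baseline, not the
    scalable procedure from the paper.
    """
    n_concepts = dist.shape[1]

    # Greedy initialization: repeatedly add the concept that lowers cost most.
    selected = []
    for _ in range(k):
        best_c, best_cost = None, np.inf
        for c in range(n_concepts):
            if c in selected:
                continue
            cost = assignment_cost(dist, selected + [c])
            if cost < best_cost:
                best_c, best_cost = c, cost
        selected.append(best_c)
    current = assignment_cost(dist, selected)

    # Local search: swap one selected concept for one unselected concept
    # whenever the swap strictly lowers the total cost.
    for _ in range(max_iters):
        improved = False
        for pos in range(k):
            for c in range(n_concepts):
                if c in selected:
                    continue
                trial = selected.copy()
                trial[pos] = c
                cost = assignment_cost(dist, trial)
                if cost < current - 1e-12:
                    selected, current, improved = trial, cost, True
        if not improved:
            break
    return sorted(selected), current


# Toy example (made-up distances): 4 images, 3 candidate concepts.
concept_names = ["tiger", "outdoors", "kayaking"]  # hypothetical concept-bank
toy_dist = np.array([
    [0.1, 0.4, 0.9],
    [0.1, 0.4, 0.9],
    [0.9, 0.4, 0.1],
    [0.9, 0.4, 0.1],
])
chosen, total = select_concepts(toy_dist, k=2)
print([concept_names[c] for c in chosen], total)
```

In this toy matrix the generic middle concept is the best single choice, so the greedy pass picks it first; the swap step then replaces it, and the final selection is the two specific concepts. With real data, `toy_dist` would be replaced by `cosine_distance(image_embeddings, concept_embeddings)` computed from an image-language model.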
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Domain Adaptation and Few-Shot Learning
