DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng

TL;DR
DendroMap introduces a treemap-based visualization tool for interactively exploring large-scale image datasets in machine learning, enabling hierarchical organization, detailed inspection, and better understanding of dataset diversity and model performance.
Contribution
This paper presents DendroMap, a novel treemap-based visualization method that improves the exploration of large image datasets by organizing images hierarchically based on high-dimensional features.
Findings
Users can discover dataset insights and model errors effectively.
DendroMap outperforms t-SNE grid visualizations in user preference and task efficiency.
Participants preferred DendroMap for grouping and searching tasks.
Abstract
In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our…
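The abstract's core idea, building a cluster hierarchy over image features and drawing it as a treemap, can be illustrated with a minimal sketch. This is not the authors' implementation: the centroid-based agglomerative merging, the slice-and-dice layout, and every name here (`Node`, `build_hierarchy`, `treemap`, the toy `features`) are assumptions chosen for brevity, and the 2-D points stand in for real high-dimensional image embeddings.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Node:
    members: list                      # indices of the images in this cluster
    children: list = field(default_factory=list)


def centroid(points, members):
    """Mean feature vector of the given member indices."""
    dim = len(points[0])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]


def build_hierarchy(points):
    """Naive agglomerative clustering (assumed centroid linkage):
    repeatedly merge the two clusters whose centroids are closest."""
    clusters = [Node([i]) for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(centroid(points, clusters[a].members),
                         centroid(points, clusters[b].members))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = Node(clusters[a].members + clusters[b].members,
                      [clusters[a], clusters[b]])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


def treemap(node, x, y, w, h, depth, max_depth, out):
    """Slice-and-dice treemap: give each child a share of the rectangle
    proportional to its image count, alternating the split direction."""
    if depth == max_depth or not node.children:
        out.append((sorted(node.members), (x, y, w, h)))
        return
    total = len(node.members)
    offset = 0.0
    for child in node.children:
        frac = len(child.members) / total
        if depth % 2 == 0:             # even depth: split along the x axis
            treemap(child, x + offset * w, y, w * frac, h, depth + 1, max_depth, out)
        else:                          # odd depth: split along the y axis
            treemap(child, x, y + offset * h, w, h * frac, depth + 1, max_depth, out)
        offset += frac


# Toy stand-in for image embeddings (hypothetical data).
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
root = build_hierarchy(features)

rects = []
treemap(root, 0.0, 0.0, 100.0, 60.0, depth=0, max_depth=1, out=rects)
for members, rect in rects:
    print(members, rect)
```

Raising `max_depth` in this sketch plays the role of zooming to a finer level of abstraction: each rectangle is subdivided into its sub-clusters while the total area stays fixed. The naive merge loop is cubic or worse in the number of images, so a real large-scale tool would need a far more efficient clustering step.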
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics · Cell Image Analysis Techniques
