TL;DR
This paper introduces efficient MapReduce and Streaming algorithms for diversity maximization in metric spaces with bounded doubling dimension, achieving near-optimal approximation ratios and scaling to billion-point datasets.
Contribution
It develops space- and pass-efficient algorithms with provable approximation guarantees for diversity maximization in metric spaces of bounded doubling dimension, improving over existing methods.
Findings
Achieves $(\beta+\epsilon)$-approximation ratios, surpassing previous algorithms.
Scales effectively to datasets with over a billion points.
Provides extensive experimental validation on real and synthetic data.
Abstract
Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the…
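To make the problem statement concrete, here is a minimal Python sketch of the two example objectives the abstract mentions (minimum and average pairwise distance), with an exhaustive search over all subsets of size k. It is not the paper's algorithm: the function names are hypothetical, Euclidean distance stands in for a general metric, and the exhaustive search takes exponential time, so it is only usable on toy inputs.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def min_pairwise_distance(points):
    """Diversity as the minimum distance between any two points in the subset."""
    return min(dist(p, q) for p, q in combinations(points, 2))


def avg_pairwise_distance(points):
    """Diversity as the average distance over all pairs in the subset."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def brute_force_diverse_subset(points, k, objective):
    """Return the size-k subset maximizing `objective` by trying every subset.

    Illustration only: the number of candidates grows as C(n, k).
    Requires k >= 2 so that each subset has at least one pair.
    """
    return max(combinations(points, k), key=objective)


if __name__ == "__main__":
    # Hypothetical toy dataset: a tight cluster plus two distant points.
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0), (9.0, 0.0)]
    for objective in (min_pairwise_distance, avg_pairwise_distance):
        subset = brute_force_diverse_subset(data, 3, objective)
        print(objective.__name__, subset, objective(subset))
```

The two objectives can select different subsets on the same data. Because the exact search is infeasible at scale, approximation algorithms and small core-sets are the practical route for large inputs.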
