An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation
Jingyi Zhang, Cheng Meng, Jun Yu, Mengrui Zhang, Wenxuan Zhong, and Ping Ma

TL;DR
This paper introduces a novel model-free subsampling method based on optimal transport, which efficiently selects representative samples for accurate kernel density estimation, overcoming computational and theoretical limitations of existing methods.
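The following is a minimal one-dimensional sketch of optimal-transport-style subsampling followed by kernel density estimation. It is not the authors' algorithm, which is described as adaptive and is not detailed on this page. The helper names `ot_subsample_1d` and `gaussian_kde_1d`, the block-mean selection rule, and the bandwidth rule are hypothetical choices made for illustration. The sketch relies on one fact: in one dimension, the optimal coupling between two sorted samples is monotone. Because of that, the sorted sample can be split into k consecutive blocks of nearly equal mass, and each block is represented by the data point nearest its mean.

```python
import numpy as np


def ot_subsample_1d(x, k):
    """Hypothetical helper: pick k points of x whose uniform empirical measure
    is close to that of x in 2-Wasserstein distance (1-D data only).

    In 1-D the optimal coupling between sorted samples is monotone, so split the
    sorted sample into k consecutive blocks of (nearly) equal size and keep, from
    each block, the data point nearest the block mean.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    if not 1 <= k <= len(xs):
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    picks = [block[np.argmin(np.abs(block - block.mean()))]
             for block in np.array_split(xs, k)]
    return np.array(picks)


def gaussian_kde_1d(points, grid, bandwidth=None):
    """Gaussian kernel density estimate built on `points`, evaluated at `grid`."""
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Assumed normal-reference rule of thumb, not a choice taken from the paper.
        bandwidth = 1.06 * points.std(ddof=1) * len(points) ** (-1 / 5)
    z = (grid[:, None] - points[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(points) * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
sub = ot_subsample_1d(x, 100)            # 100 representative points
grid = np.linspace(-4.0, 4.0, 401)
f_full = gaussian_kde_1d(x, grid)        # 10,000 kernel terms per grid point
f_sub = gaussian_kde_1d(sub, grid)       # 100 kernel terms per grid point
print("max |f_full - f_sub| on grid:", np.max(np.abs(f_full - f_sub)))
```

The efficiency gain appears in the last two calls: each density evaluation sums k kernel terms instead of n. When k divides n, the blocks all have size n/k. In that case, keeping the point nearest each block mean minimizes the 2-Wasserstein distance among equal-weight subsamples of size k. For other values of k, the block sizes differ by one and the rule is only approximate.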
Contribution
The paper proposes an optimal transport-based subsampling approach with an adaptive algorithm and theoretical guarantees for density estimation convergence.
Findings
Superior performance on synthetic datasets
Effective in real-world data applications
Theoretical convergence rate established
Abstract
Subsampling methods aim to select a subsample as a surrogate for the observed sample. Such methods have been used pervasively in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades. Rather than model-based methods, in this paper we study model-free subsampling methods, which aim to identify a subsample that is not confined by model assumptions. Existing model-free subsampling methods are usually built upon clustering techniques or kernel tricks. Most of these methods suffer from either a large computational burden or a theoretical weakness. In particular, the theoretical weakness is that the empirical distribution of the selected subsample may not necessarily converge to the population distribution. Such computational and theoretical limitations hinder the broad applicability of model-free subsampling methods in practice. We propose a novel…
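The abstract's concern about whether a subsample's empirical distribution tracks the full sample can be checked numerically. The sketch below assumes one-dimensional data and equal point weights. In that setting, the 2-Wasserstein distance between two empirical distributions reduces to integrating the squared gap between their quantile functions, so it can be computed exactly. The helper name `wasserstein2_1d` and the simple-random-subsample baseline are illustrative assumptions, not part of the paper.

```python
import numpy as np


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between the uniform empirical measures on x and y.

    Uses the 1-D identity W2^2 = integral over u in (0, 1) of (Fx^{-1}(u) - Fy^{-1}(u))^2,
    where both quantile functions are step functions.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    ys = np.sort(np.asarray(y, dtype=float))
    n, m = len(xs), len(ys)
    # Every point in (0, 1] where either quantile function can jump.
    u = np.union1d(np.arange(1, n + 1) / n, np.arange(1, m + 1) / m)
    du = np.diff(u, prepend=0.0)
    mid = u - du / 2                       # both quantile functions are constant around here
    qx = xs[np.minimum((mid * n).astype(int), n - 1)]
    qy = ys[np.minimum((mid * m).astype(int), m - 1)]
    return float(np.sqrt(np.sum(du * (qx - qy) ** 2)))


rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
for k in (10, 100, 1_000):
    # Simple random subsample without replacement, used only as a baseline.
    sub = rng.choice(x, size=k, replace=False)
    print(k, wasserstein2_1d(x, sub))
```

A distance that shrinks toward zero as k grows is the behavior one wants from a representative subsample. The same function can score any subsampling rule on one-dimensional data.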
Taxonomy
Topics: Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms · Target Tracking and Data Fusion in Sensor Networks
