Support Estimation with Sampling Artifacts and Errors
Eli Chien, Olgica Milenkovic, Angelia Nedich

TL;DR
This paper introduces a support estimation method that accounts for sampling artifacts and errors, particularly in biological data. It models sampling with Poisson repeat channels, builds the estimator from regularized weighted Chebyshev approximations, and reports significant improvements over existing noiseless methods.
Contribution
It presents the first support estimation approach that handles sampling artifacts and errors with Poisson repeat channels, employing regularized Chebyshev approximations and semi-infinite programming.
Findings
Significant improvement over existing noiseless support estimation methods.
Effective in biological data, especially SARS-CoV-2 mutational support estimation.
Validated on synthetic, textual, and biological datasets.
Abstract
The problem of estimating the support of a distribution is of great importance in many areas of machine learning, computer science, physics and biology. Most of the existing work in this domain has focused on settings that assume perfectly accurate sampling approaches, which is seldom true in practical data science. Here we introduce the first known approach to support estimation in the presence of sampling artifacts and errors where each sample is assumed to arise from a Poisson repeat channel which simultaneously captures repetitions and deletions of samples. The proposed estimator is based on regularized weighted Chebyshev approximations, with weights governed by evaluations of so-called Touchard (Bell) polynomials. The supports in the presence of sampling artifacts are calculated using discretized semi-infinite programming methods. The estimation approach is tested on synthetic and…
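The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes a Poisson repeat channel in which each sample is independently emitted K ~ Poisson(rate) times, so K = 0 is a deletion and K > 1 is a repetition. It also evaluates the Touchard polynomial T_n(x) = sum_k S(n, k) x^k, where S(n, k) are Stirling numbers of the second kind. The function names and the `rate` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's method (suited to small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson_repeat_channel(samples, rate, rng):
    """Assumed channel model: emit each sample K ~ Poisson(rate) times (K = 0 deletes it)."""
    out = []
    for s in samples:
        out.extend([s] * poisson_draw(rate, rng))
    return out


def touchard(n, x):
    """Evaluate T_n(x) = sum_k S(n, k) x^k using the Stirling recurrence
    S(m, k) = k * S(m - 1, k) + S(m - 1, k - 1)."""
    row = [1]  # row[k] holds S(m, k); start from S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            prev_k = row[k] if k < len(row) else 0
            new[k] = k * prev_k + row[k - 1]
        row = new
    return sum(s * x**k for k, s in enumerate(row))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = ["A", "C", "G", "T", "A", "A"]
    noisy = poisson_repeat_channel(clean, rate=1.0, rng=rng)
    print("observed:", noisy)
    print("T_3(1) =", touchard(3, 1))  # Bell number B_3
```

T_n(x) equals the n-th moment of a Poisson random variable with mean x, and T_n(1) is the n-th Bell number, which is one way to see why these polynomials arise naturally when repeat counts are Poisson distributed.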
