Conformal Frequency Estimation with Sketched Data
Matteo Sesia, Stefano Favaro

TL;DR
This paper introduces a flexible, data-adaptive conformal inference method that constructs valid confidence intervals for the frequencies of queried objects in very large data sets from a much smaller sketch, without requiring knowledge of the data distribution.
Contribution
It develops a conformal inference approach that provides provably valid frequentist confidence intervals for frequency estimates from sketched data, assuming only that the data are exchangeable.
Findings
The method achieves valid confidence intervals in simulations.
Its performance is comparable or superior to that of Bayesian methods.
It is effective on SARS-CoV-2 DNA sequences and classic English literature.
Abstract
A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under the sole assumption of data exchangeability. Although our solution is broadly applicable, this paper focuses on applications involving the count-min sketch algorithm and a non-linear variation thereof. The performance is compared to that of frequentist and Bayesian alternatives through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
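For readers unfamiliar with the count-min sketch mentioned in the abstract, the following is a minimal illustrative sketch of the standard data structure, not the authors' code. The class name `CountMinSketch`, the parameters `width` and `depth`, and the use of BLAKE2 for hashing are assumptions made for this example. It shows only the property the paper builds on: a query never underestimates the true frequency, so it acts as a deterministic upper bound. The conformal calibration that turns this into a two-sided confidence interval is not reproduced here.

```python
import hashlib


class CountMinSketch:
    """Standard count-min sketch: a depth x width table of counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, obtained by salting with the row number.
        # (Hypothetical choice; any family of independent hashes would do.)
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true frequency of `item`.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Toy stream: three repeated items plus 200 distinct "noise" items.
stream = ["a"] * 50 + ["b"] * 20 + ["c"] * 5 + [f"x{i}" for i in range(200)]
cms = CountMinSketch(width=64, depth=4)
for item in stream:
    cms.update(item)

# The estimate for "a" is at least its true count of 50.
print(cms.query("a") >= 50)
```

The table uses `width * depth` counters regardless of how many distinct objects appear in the stream, which is why the sketch can be much smaller than the data. The gap between `query` and the true count depends on collisions, and that gap is the uncertainty the paper's confidence intervals are meant to quantify.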
Taxonomy
Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models · Speech Recognition and Synthesis
