Asymmetric scale functions for $t$-digests
Joseph Ross

TL;DR
This paper introduces an asymmetric scale function variant for $t$-digests, enhancing accuracy near distribution tails with adjustable resource-accuracy tradeoffs, especially for skewed data.
Contribution
It develops a new asymmetric scale function for $t$-digests, preserving key properties and enabling more efficient, skew-aware quantile approximation.
Findings
Asymmetric $t$-digest maintains accuracy on one tail with less memory.
Tangent line construction preserves online and mergeable properties.
Empirical results show improved efficiency for skewed distributions.
Abstract
The $t$-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a $t$-digest variant with accuracy asymmetric about the median, thereby making possible alternative tradeoffs between computational resources and accuracy which may be of particular interest for distributions with significant skew. After establishing some theoretical properties of scale functions for $t$-digests, we show that a tangent line construction on the familiar scale functions preserves the crucial properties that allow $t$-digests to operate online and be mergeable. We conclude with an empirical study demonstrating the asymmetric variant preserves accuracy on one side of the distribution with a much smaller memory footprint.
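As a rough illustration of the tangent line construction mentioned in the abstract, the sketch below glues a tangent line onto one side of a symmetric scale function. It is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the commonly used arcsine scale function $k_1(q) = \frac{\delta}{2\pi}\arcsin(2q-1)$ as the "familiar" starting point, it replaces the function to the left of a tangent point $p$, and the names `k1`, `k1_prime`, `k1_glued`, and the default `p=0.5` are hypothetical choices made for this example.

```python
import math


def k1(q, delta):
    """Symmetric arcsine scale function (assumed starting point)."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k1_prime(q, delta):
    """Derivative of k1 with respect to q, valid for 0 < q < 1."""
    return delta / (2 * math.pi) * 2 / math.sqrt(1 - (2 * q - 1) ** 2)


def k1_glued(q, delta, p=0.5):
    """Asymmetric variant (hypothetical name): the tangent line to k1 at p
    for q < p, and k1 itself for q >= p. The tangent point p is an
    assumption chosen for illustration."""
    if q < p:
        return k1(p, delta) + k1_prime(p, delta) * (q - p)
    return k1(q, delta)


if __name__ == "__main__":
    delta = 100
    for q in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(q, round(k1(q, delta), 3), round(k1_glued(q, delta), 3))
```

With `delta = 100` and `p = 0.5`, the glued function agrees with `k1` on the upper half and is continuous at `p`, while its value at `q = 0` rises from $-25$ to $-50/\pi \approx -15.9$. The total range of the scale function shrinks accordingly, which is consistent with spending fewer centroids on the lower tail while keeping the upper tail unchanged.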
Taxonomy
Topics: Numerical Methods and Algorithms · Machine Learning and Data Classification · Machine Learning and Algorithms
