Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth
Xinyu Yuan, Yan Qiao, Meng Li, Zhenchun Wei, Cuiying Feng, Zonghui Wang, Wenzhi Chen

TL;DR
UCL-sketch is a novel online learning-based sketching method for frequency estimation in data streams that requires no ground truth, offers high accuracy, scalability, and significantly faster decoding compared to existing approaches.
Contribution
We introduce UCL-sketch, a scalable, online learning-based frequency estimation method that operates without ground truth and achieves superior accuracy and speed.
Findings
Outperforms previous methods in accuracy and distribution.
Achieves near-oracle quality under tight memory constraints.
Decoding speed is nearly 500 times faster than existing equation-based sketches.
Abstract
Estimating the frequency of items in high-volume, fast data streams has been extensively studied in many areas, such as databases and network measurement. Traditional sketches provide only coarse estimates under strict memory constraints. Although some learning-augmented methods have emerged recently, they typically rely on offline training with real frequencies and/or labels, which are often unavailable. Moreover, these methods suffer from slow update speeds, which limits their suitability for real-time processing, while offering only marginal accuracy improvements. To overcome these challenges, we propose UCL-sketch, a practical learning-based paradigm for per-key frequency estimation. Our design introduces two key innovations: (i) an online training mechanism based on equivalent learning that requires no ground truth (GT), and (ii) a highly scalable architecture leveraging logically…
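To make the "traditional sketches" baseline concrete, the following is a minimal Count-Min sketch in Python. It is an illustration chosen here, not UCL-sketch and not code from the paper; the class name, the `width` and `depth` parameters, and the salted BLAKE2b hashing are all assumptions made for the example. It shows why estimates are coarse under tight memory: keys that collide in a counter inflate each other's counts, so a query can overestimate but never underestimate.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Illustrative Count-Min sketch (hypothetical helper, not the paper's method)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width    # counters per row (memory budget per hash function)
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Salt the hash with the row number so each row behaves like a separate hash function.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: str, count: int = 1) -> None:
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def query(self, key: str) -> int:
        # The minimum over rows is the least collision-inflated counter.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


# A tiny synthetic stream; real streams are far larger than the sketch.
stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 5 + ["d"] * 2
sketch = CountMinSketch(width=4, depth=3)
for item in stream:
    sketch.update(item)

true_counts = Counter(stream)
for key, true_count in sorted(true_counts.items()):
    print(key, "true:", true_count, "estimate:", sketch.query(key))
```

With only four counters per row, low-frequency keys are likely to share counters with heavy ones, which is the kind of coarse estimate the abstract refers to. Widening the rows reduces collisions at the cost of memory.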
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Anomaly Detection Techniques and Applications
