Sketching Linear Classifiers over Data Streams
Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

TL;DR
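This paper presents the Weight-Median Sketch, a memory-efficient data structure for learning and analyzing linear classifiers over data streams. It enables recovery of the most discriminative features and supports streaming analyses such as online feature selection, data explanation, relative deltoid detection, and pointwise mutual information estimation.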
Contribution
The paper introduces the Weight-Median Sketch, a novel sub-linear space sketch that captures discriminative features for streaming linear classifiers, with theoretical guarantees and empirical advantages.
Findings
Supports efficient recovery of large-magnitude weights.
Outperforms count-based sketches and feature hashing in memory-accuracy trade-offs.
Enables multiple streaming statistical analyses with limited memory.
Abstract
We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that…
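The abstract's core mechanism (a Count-Sketch-shaped table that stores sketched gradient updates instead of counts, with individual weights recovered by a median over rows) can be illustrated with a minimal Python sketch. This is a hedged illustration, not the authors' implementation: the logistic loss, constant step size, small L2 term, 1/sqrt(depth) scaling, the (a*x + b) mod p hash family, and every name here (`WeightMedianSketch`, `estimate`, `demo_two_streams`) are assumptions made for the example.

```python
import math
import random
import statistics

_PRIME = 2_147_483_647  # 2^31 - 1, modulus for (a*x + b) mod p hashing


class WeightMedianSketch:
    """Illustrative Weight-Median-style sketch (not the authors' reference code).

    Assumptions: logistic loss, L2 regularization, a constant step size,
    integer feature ids, and (a*x + b) mod p hashing for buckets and signs.
    """

    def __init__(self, depth=5, width=64, lr=0.1, l2=1e-5, seed=0):
        rng = random.Random(seed)
        self.depth, self.width, self.lr, self.l2 = depth, width, lr, l2
        # Per-row hash parameters: one pair for the bucket, one for the sign.
        self.bucket_ab = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(depth)]
        self.sign_ab = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(depth)]
        # depth x width table: the same layout a Count-Sketch uses for counts.
        self.z = [[0.0] * width for _ in range(depth)]
        self.scale = 1.0 / math.sqrt(depth)  # projection R = A / sqrt(depth)

    def _bucket(self, j, i):
        a, b = self.bucket_ab[j]
        return ((a * i + b) % _PRIME) % self.width

    def _sign(self, j, i):
        a, b = self.sign_ab[j]
        return 1.0 if ((a * i + b) % _PRIME) % 2 == 0 else -1.0

    def margin(self, x):
        """Sketched prediction z^T R x for a sparse input {feature_id: value}."""
        total = 0.0
        for i, v in x.items():
            for j in range(self.depth):
                total += self._sign(j, i) * self.z[j][self._bucket(j, i)] * v
        return self.scale * total

    def update(self, x, y):
        """One SGD step on log(1 + exp(-y * margin)) for a label y in {-1, +1}."""
        ym = y * self.margin(x)
        # Derivative of the loss w.r.t. the margin, computed without overflow.
        if ym > 0:
            e = math.exp(-ym)
            g = -y * e / (1.0 + e)
        else:
            g = -y / (1.0 + math.exp(ym))
        # L2 shrinkage over the whole table (dense, O(depth * width) per step).
        decay = 1.0 - self.lr * self.l2
        for row in self.z:
            for c in range(self.width):
                row[c] *= decay
        # Gradient step on z: the gradient of the loss w.r.t. z is g * R x.
        for i, v in x.items():
            for j in range(self.depth):
                self.z[j][self._bucket(j, i)] -= self.lr * g * self.scale * self._sign(j, i) * v

    def estimate(self, i):
        """Recover weight i as sqrt(depth) * median_j of sign_j(i) * z[j][h_j(i)]."""
        vals = [self._sign(j, i) * self.z[j][self._bucket(j, i)] for j in range(self.depth)]
        return math.sqrt(self.depth) * statistics.median(vals)


def demo_two_streams(steps=2000, seed=1):
    """Hypothetical demo: items from stream A get label +1, stream B gets -1."""
    rng = random.Random(seed)
    sketch = WeightMedianSketch(seed=42)
    for t in range(steps):
        y = 1 if t % 2 == 0 else -1
        x = {0: 1.0}  # feature 0: present in every item, not discriminative
        if y == 1 and rng.random() < 0.8:
            x[7] = 1.0  # feature 7 is over-represented in stream A
        if y == -1 and rng.random() < 0.8:
            x[3] = 1.0  # feature 3 is over-represented in stream B
        x[rng.randrange(100, 1000)] = 1.0  # shared background feature
        sketch.update(x, y)
    return sketch


if __name__ == "__main__":
    s = demo_two_streams()
    for feature in (0, 3, 7):
        print(feature, round(s.estimate(feature), 2))
```

In this hypothetical two-stream demo, feature 0 is the most frequent item in both streams, so a count-based sketch would rank it first; the learned weights instead single out features 7 and 3, the ones that distinguish stream A from stream B. The dense shrinkage loop is chosen for readability; a more efficient variant could keep a single global scale factor for the table instead of touching every cell on each update.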
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Advanced Bandit Algorithms Research
