Prediction-Oriented Subsampling from Data Streams
Benedetta Lavinia Mussati, Freddie Bickford Smith, Tom Rainforth, Stephen Roberts

TL;DR
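This paper proposes a prediction-oriented, information-theoretic method for subsampling data streams, centred on reducing uncertainty in downstream predictions of interest. It outperforms a previously proposed information-theoretic technique on two widely studied problems, though reliably strong performance requires careful model design.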
Contribution
It argues for an information-theoretic approach to subsampling data streams for offline learning, centred on reducing uncertainty in downstream predictions of interest.
Findings
Outperforms a previously proposed information-theoretic subsampling technique
Demonstrated empirically on two widely studied problems
Reliably achieving strong performance requires careful model design
Abstract
Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
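The abstract does not state the selection objective, so the following is only an illustrative sketch of what "reducing uncertainty in downstream predictions of interest" could look like for classification. It assumes, as choices made here and not taken from the paper, that the model provides K posterior samples of class probabilities (for example from an ensemble), and that a candidate observation is scored by the mutual information between its label and the labels at a set of target inputs. The function names `prediction_oriented_score` and `select_from_batch` are hypothetical.

```python
import numpy as np


def prediction_oriented_score(cand_probs, targ_probs, eps=1e-12):
    """Mean mutual information between a candidate's label and target labels.

    Assumed inputs (illustrative, not from the paper):
      cand_probs: (K, C) class probabilities for one candidate input,
                  one row per posterior sample of the model.
      targ_probs: (K, M, C) class probabilities at M target inputs,
                  using the same K posterior samples.
    """
    n_samples = cand_probs.shape[0]
    # Joint predictive p(y, y*) per target: average over posterior samples
    # of the product of per-sample predictives. Shape (M, C, C).
    joint = np.einsum("kc,kmd->mcd", cand_probs, targ_probs) / n_samples
    # Marginal predictives for the candidate and for each target.
    marg_cand = cand_probs.mean(axis=0)  # (C,)
    marg_targ = targ_probs.mean(axis=0)  # (M, C)
    indep = marg_cand[None, :, None] * marg_targ[:, None, :]  # (M, C, C)
    # KL divergence between the joint and the product of marginals.
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(1, 2))
    return float(mi.mean())


def select_from_batch(batch_probs, targ_probs, n_keep):
    """Keep the n_keep highest-scoring candidates from one stream batch.

    batch_probs: (K, N, C) class probabilities for N incoming candidates.
    Returns (indices of kept candidates, all scores).
    """
    n_candidates = batch_probs.shape[1]
    scores = np.array([
        prediction_oriented_score(batch_probs[:, i], targ_probs)
        for i in range(n_candidates)
    ])
    keep = np.argsort(scores)[::-1][:n_keep]
    return keep, scores
```

Under these assumptions, a candidate scores near zero when the posterior samples agree about its label or when its label carries no information about the target labels, and scores higher when learning its label would change the predictions at the targets. How the model is built determines the quality of the posterior samples, which is one place where the careful model design mentioned in the abstract would matter.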
