Prediction-Oriented Subsampling from Data Streams

Benedetta Lavinia Mussati; Freddie Bickford Smith; Tom Rainforth; Stephen Roberts

arXiv:2508.03868 · cs.LG · December 23, 2025

TL;DR

This paper proposes a prediction-oriented method for subsampling data streams that aims to reduce uncertainty in downstream predictions of interest. The method outperforms a previously proposed information-theoretic technique, although achieving strong performance reliably requires careful model design.

Contribution

It introduces an information-theoretic approach to subsampling data streams for offline learning, centred on reducing uncertainty in downstream predictions of interest.

Findings

1. Outperforms a previously proposed information-theoretic subsampling technique.
2. Effective on two widely studied problems.
3. Requires careful model design to achieve strong performance reliably.

Abstract

Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
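The abstract describes the method only at a high level and does not give the scoring rule. With that caveat, one common way to make "reducing uncertainty in downstream predictions" concrete is the expected predictive information gain (EPIG): the mutual information between the label at a candidate input and the label at a target input where predictions are needed. The sketch below is an assumption, not the authors' implementation. It estimates EPIG from posterior samples of a classifier, contrasts it with BALD (a standard parameter-oriented score, included only for comparison), and applies a hypothetical windowed top-k rule to a stream. The names `epig`, `bald`, `subsample_stream`, `window` and `keep`, and the toy probabilities, are all illustrative.

```python
import math
from typing import Callable, Iterable, List, Sequence

# probs[k][c] = p(y = c | x, theta_k) for posterior sample theta_k (e.g. one ensemble member).
Probs = List[List[float]]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def mean_probs(probs: Probs) -> List[float]:
    """Posterior predictive p(y | x): average class probabilities over samples."""
    n_samples = len(probs)
    return [sum(row[c] for row in probs) / n_samples for c in range(len(probs[0]))]


def bald(probs: Probs) -> float:
    """Parameter-oriented score: estimated I(y; theta | x)."""
    avg_entropy = sum(entropy(row) for row in probs) / len(probs)
    return entropy(mean_probs(probs)) - avg_entropy


def epig(probs_x: Probs, probs_targets: List[Probs]) -> float:
    """Prediction-oriented score: I(y; y* | x, x*) averaged over target inputs x*.

    probs_x and every entry of probs_targets must use the same posterior samples,
    in the same order, so the joint p(y, y*) can be formed sample by sample.
    """
    n_samples = len(probs_x)
    p_y = mean_probs(probs_x)
    total = 0.0
    for probs_t in probs_targets:
        p_ystar = mean_probs(probs_t)
        mi = 0.0
        for a in range(len(p_y)):
            for b in range(len(p_ystar)):
                # p(y=a, y*=b) = E_theta[p(y=a | x, theta) * p(y*=b | x*, theta)]
                joint = sum(probs_x[k][a] * probs_t[k][b] for k in range(n_samples)) / n_samples
                if joint > 0.0:
                    mi += joint * math.log(joint / (p_y[a] * p_ystar[b]))
        total += mi
    # Mutual information is non-negative; clamp away floating-point noise.
    return max(0.0, total / len(probs_targets))


def subsample_stream(stream: Iterable, score: Callable[[object], float],
                     window: int, keep: int) -> list:
    """Hypothetical budgeted rule: keep the `keep` highest-scoring items per window.

    The model is held fixed here; a real pipeline would typically update it
    as items are retained.
    """
    kept, buffer = [], []
    for item in stream:
        buffer.append(item)
        if len(buffer) == window:
            kept.extend(sorted(buffer, key=score, reverse=True)[:keep])
            buffer = []
    if buffer:  # flush a final, partial window
        kept.extend(sorted(buffer, key=score, reverse=True)[:keep])
    return kept


# Toy check: four posterior samples indexed by two binary latent factors (s1, s2),
# ordered (0,0), (0,1), (1,0), (1,1).
HI, LO = [0.9, 0.1], [0.1, 0.9]
relevant = [HI, HI, LO, LO]    # candidate's prediction depends on s1
irrelevant = [HI, LO, HI, LO]  # candidate's prediction depends on s2
target = [HI, HI, LO, LO]      # target prediction depends on s1 only

scores = {"relevant": epig(relevant, [target]), "irrelevant": epig(irrelevant, [target])}
print(round(bald(relevant), 3), round(bald(irrelevant), 3))  # equal BALD scores
print(round(scores["relevant"], 3), round(scores["irrelevant"], 3))
print(subsample_stream(["irrelevant", "relevant", "irrelevant"], lambda n: scores[n], window=2, keep=1))
```

In the toy check, both candidates receive the same BALD score (about 0.368 nats) because each splits the posterior samples evenly. Only `relevant` is informative about the target prediction, so its EPIG is about 0.22 nats while that of `irrelevant` is zero, and the EPIG-driven windowed rule keeps `relevant` from the first window. Retaining a fixed number of items per window is just one simple way to bound cost; score thresholds or reservoir-style rules are alternatives.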
