Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets

Aaryan Gupta; Rishi Saket; Aravindan Raghuveer

arXiv:2512.00536 · cs.LG · December 2, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel dataset distillation algorithm for supervised regression and offline reinforcement learning, providing theoretical guarantees on its efficiency and extending it to leverage reward and state information without policy optimization.

Contribution

The paper develops a new dataset distillation method with provable guarantees for supervised regression and extends it to offline RL, incorporating Bellman loss without policy training.
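The sketch below makes the offline-RL extension concrete. It measures how closely a synthetic transition set matches the training transitions under a fixed set of randomly sampled linear Q-functions, using a squared Bellman error as the loss. Everything here is an illustrative assumption, not the paper's formulation: the one-hot-action feature map `phi`, the Gaussian sampling of Q-function weights, the Q-learning-style max-over-actions target, and all helper names (`phi`, `q_value`, `bellman_loss`, `loss_matching_gap`) are hypothetical. A distillation procedure would minimize this gap over the synthetic transitions. No policy is trained at any point.

```python
import random


def phi(state, action, num_actions):
    """Hypothetical feature map: place the state vector in the block for `action`, zeros elsewhere."""
    d = len(state)
    features = [0.0] * (d * num_actions)
    features[action * d:(action + 1) * d] = state
    return features


def q_value(theta, state, action, num_actions):
    """Linear Q-function Q_theta(s, a) = theta . phi(s, a)."""
    return sum(t * f for t, f in zip(theta, phi(state, action, num_actions)))


def bellman_loss(theta, transitions, num_actions, gamma=0.99):
    """Mean squared Bellman error of Q_theta on (s, a, r, s_next, done) tuples (assumed max-over-actions target)."""
    total = 0.0
    for s, a, r, s_next, done in transitions:
        target = r
        if not done:
            target += gamma * max(q_value(theta, s_next, b, num_actions) for b in range(num_actions))
        total += (q_value(theta, s, a, num_actions) - target) ** 2
    return total / len(transitions)


def loss_matching_gap(train, synthetic, num_actions, state_dim, num_q=50, gamma=0.99, seed=0):
    """Mean squared difference of Bellman losses on the two transition sets over randomly sampled Q-functions."""
    rng = random.Random(seed)  # fixed seed -> the same Q-functions on every call
    thetas = [[rng.gauss(0.0, 1.0) for _ in range(state_dim * num_actions)] for _ in range(num_q)]
    return sum((bellman_loss(th, synthetic, num_actions, gamma)
                - bellman_loss(th, train, num_actions, gamma)) ** 2 for th in thetas) / num_q


# Toy usage with random transitions (state_dim = 3, two discrete actions).
rng = random.Random(3)
STATE_DIM, NUM_ACTIONS = 3, 2
train = [([rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)], rng.randrange(NUM_ACTIONS), rng.gauss(0.0, 1.0),
          [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)], rng.random() < 0.1) for _ in range(100)]
synthetic = rng.sample(train, 8)  # a naive subsample as the candidate synthetic set
print(loss_matching_gap(train, train, NUM_ACTIONS, STATE_DIM))  # 0.0: a dataset matches itself
print(loss_matching_gap(train, synthetic, NUM_ACTIONS, STATE_DIM))
```

Because the Q-function weights come from a fixed seed, every candidate synthetic set is scored against the same sampled models. This mirrors the "fixed set of randomly sampled" models idea in the supervised setting.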

Findings

1. The algorithm needs only $\tilde{O}(d^{2})$ sampled regressors for near-optimal performance.

2. It proves a matching lower bound of $\Omega(d^{2})$, showing that this bound is tight.

3. Experimental results validate the theoretical guarantees and show performance improvements.
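Here is one way to see why the number of sampled regressors scales with $d^{2}$. This is our own illustration, not the paper's proof. The MSE of a linear model is a quadratic function of its weights, so it is determined by a fixed number of coefficients:

```latex
% Training set {(x_i, y_i)}_{i=1}^{n}, linear model w \in \mathbb{R}^{d}
L(w) = \frac{1}{n}\sum_{i=1}^{n}\left(w^{\top}x_i - y_i\right)^{2}
     = w^{\top}A\,w - 2\,b^{\top}w + c,
\qquad
A = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^{\top},\quad
b = \frac{1}{n}\sum_{i=1}^{n} y_i x_i,\quad
c = \frac{1}{n}\sum_{i=1}^{n} y_i^{2}.

% Free coefficients of this quadratic (symmetric A, vector b, scalar c):
\frac{d(d+1)}{2} + d + 1 = \frac{(d+1)(d+2)}{2} = \Theta(d^{2}).
```

The synthetic loss $L_S(w)$ has the same form, so the difference $L_S(w) - L(w)$ is a polynomial of degree at most 2 in $w$. If that difference vanishes at $(d+1)(d+2)/2$ regressors drawn from a continuous distribution (in general position with probability 1), then $L_S \equiv L$ for every $w$. The paper's $\tilde{O}(d^{2})$ bound addresses the harder approximate setting with randomly sampled regressors and bounded models. This count only suggests why $d^{2}$ is the natural scale.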

Abstract

Given a training dataset, the goal of dataset distillation is to derive a synthetic dataset such that models trained on the latter perform as well as those trained on the training dataset. In this work, we develop and analyze an efficient dataset distillation algorithm for supervised learning, specifically regression in $\mathbb{R}^{d}$, based on matching the losses on the training and synthetic datasets with respect to a fixed set of randomly sampled regressors without any model training. Our first key contribution is a novel performance guarantee proving that our algorithm needs only $\tilde{O}(d^{2})$ sampled regressors to derive a synthetic dataset on which the MSE loss of any bounded linear model is nearly the same as its MSE loss on the given training data. In particular, the model optimized on the synthetic data has close to minimum loss on the training data, thus performing nearly…
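The loss-matching idea in the abstract can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation:

- Regressors are drawn from a standard Gaussian.
- The synthetic set is initialized from a random subsample of the training data.
- The mean squared gap between synthetic-set and training-set losses is minimized by gradient descent with backtracking.

The paper's actual sampling distribution, objective and optimizer may differ. The function names (`mse`, `matching_objective`, `gradient`, `distill`) are hypothetical.

```python
import random


def mse(w, X, y):
    """Mean squared error of the linear model x -> w . x on the dataset (X, y)."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - yi) ** 2
               for x, yi in zip(X, y)) / len(X)


def matching_objective(W, targets, Xs, ys):
    """Mean squared gap between synthetic-set losses and training-set losses over the sampled regressors."""
    return sum((mse(w, Xs, ys) - t) ** 2 for w, t in zip(W, targets)) / len(W)


def gradient(W, targets, Xs, ys):
    """Analytic gradient of matching_objective w.r.t. every synthetic feature vector and label."""
    m, d, K = len(Xs), len(Xs[0]), len(W)
    gX = [[0.0] * d for _ in range(m)]
    gy = [0.0] * m
    for w, t in zip(W, targets):
        gap = mse(w, Xs, ys) - t
        for j in range(m):
            r = sum(wi * xi for wi, xi in zip(w, Xs[j])) - ys[j]  # residual of synthetic point j
            coef = (2.0 / K) * gap * (2.0 / m) * r
            for i in range(d):
                gX[j][i] += coef * w[i]
            gy[j] -= coef
    return gX, gy


def distill(X, y, m, num_regressors, steps=300, lr=0.05, seed=0):
    """Hypothetical loss-matching distillation: returns (Xs, ys, W, objective history)."""
    rng = random.Random(seed)
    d = len(X[0])
    # Fixed set of randomly sampled regressors (assumed standard Gaussian); no model is trained.
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_regressors)]
    targets = [mse(w, X, y) for w in W]  # training-set losses to be matched
    # Initialize the synthetic set with a random subsample of the training data.
    idx = rng.sample(range(len(X)), m)
    Xs, ys = [list(X[i]) for i in idx], [y[i] for i in idx]
    obj = matching_objective(W, targets, Xs, ys)
    history = [obj]
    for _ in range(steps):
        gX, gy = gradient(W, targets, Xs, ys)
        step = lr
        # Backtracking: halve the step until the matching objective strictly decreases.
        while step > 1e-10:
            cand_X = [[v - step * g for v, g in zip(row, grow)] for row, grow in zip(Xs, gX)]
            cand_y = [v - step * g for v, g in zip(ys, gy)]
            cand_obj = matching_objective(W, targets, cand_X, cand_y)
            if cand_obj < obj:
                Xs, ys, obj = cand_X, cand_y, cand_obj
                break
            step /= 2
        history.append(obj)
    return Xs, ys, W, history


# Toy usage: d = 2, n = 200 training points, m = 10 synthetic points, 20 sampled regressors.
rng = random.Random(1)
w_star = [1.5, -2.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
y = [sum(a * b for a, b in zip(w_star, x)) + 0.1 * rng.gauss(0.0, 1.0) for x in X]
Xs, ys, W, history = distill(X, y, m=10, num_regressors=20)
print(f"matching objective: {history[0]:.4f} -> {history[-1]:.4f}")
w_new = [rng.gauss(0.0, 1.0) for _ in range(2)]  # a regressor not used during distillation
print(f"held-out regressor: train MSE {mse(w_new, X, y):.3f}, synthetic MSE {mse(w_new, Xs, ys):.3f}")
```

The sketch stays dependency-free for readability; a practical version would vectorize the loops with NumPy or JAX. Comparing train and synthetic MSE on the held-out regressor is a quick check that the match generalizes beyond the sampled models. Per the findings, the number of sampled regressors should grow roughly like $d^{2}$.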

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating: 4 · Confidence: 3

Strengths

- The loss-matching method based on randomly sampled linear regressors/Q-value predictors is a particularly important tool in the RL literature.
- The two results for each setting look comprehensive as contributions to the learning community.
- Experiments demonstrate that small synthetic sets and few sampled models can perform competitively on standard regression datasets and classic offline RL benchmarks.

Weaknesses

- Even though experiments demonstrate that small synthetic sets match the performance of training on the entire training dataset, some issues arise:
  - This phenomenon cannot be explained by the current theory. The distillation problem is only interesting when $n \gg m$, which is also the focus of the literature cited by this work (Light et al. 25, Lei et al. 24).
  - Lemma C.4 does provide net arguments for the choice of synthetic dataset size, but it seems that both $m$ and $n$ are proport…

Reviewer 02 · Rating: 6 · Confidence: 2

Strengths

1. The paper provides theoretical guarantees for dataset distillation, an area where such analysis is often lacking. The upper and matching lower bounds for the supervised case are particularly compelling.
2. The experiments adequately support the theoretical claims, showing that the method works well in practice, even with non-linear neural networks, and outperforms baseline approaches like random subsampling.

Weaknesses

1. The empirical evaluation is limited to relatively small-scale datasets and standard RL benchmarks. A more extensive evaluation on larger-scale or more complex datasets would strengthen the claims of practical efficacy.

Reviewer 03 · Rating: 6 · Confidence: 1

Strengths

- The paper is well structured and covers the details comprehensively.
- Extensive theoretical proofs concretely support the main claim.
- Experiments demonstrate that the proposed method outperforms other baselines by a clear margin, recovering near-optimal, or even better, performance.

Weaknesses

- The experimental section appears relatively narrow, focusing on small regression settings and Gym control tasks. Comparisons with other baselines for generating synthetic datasets (Lei et al. 2024, Light et al. 2024) would improve the soundness of the evaluation of the proposed method.

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification