State Diversity Matters in Offline Behavior Distillation
Shiye Lei, Zhihao Cheng, Dacheng Tao

TL;DR
By analyzing the roles of state quality and state diversity, this paper shows that emphasizing state diversity in Offline Behavior Distillation (OBD) improves policy performance, especially when the original dataset has limited state diversity.
Contribution
It uncovers the importance of state diversity in OBD, provides theoretical insights into error reduction, and introduces a simple density-weighted algorithm to enhance distillation.
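The density-weighting idea can be illustrated with a minimal sketch. This summary does not give the paper's actual formulation, so everything below is an assumption made for illustration: the Gaussian kernel density estimate, the hypothetical function names `inverse_density_weights` and `weighted_bc_loss`, and the hypothetical parameters `bandwidth` and `tau`. The sketch assigns larger weights to states in sparsely populated regions, so a behavior-cloning-style loss pays more attention to rare states.

```python
import numpy as np


def inverse_density_weights(states, bandwidth=1.0, tau=1.0):
    """Hypothetical helper: weight each state by the inverse of its estimated density.

    Density is estimated with a Gaussian kernel over all pairwise distances
    (an assumption, not the paper's estimator). `tau` controls how strongly
    rare states are emphasized; tau=0 gives uniform weights.
    """
    states = np.asarray(states, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = states[:, None, :] - states[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Kernel density estimate at each state (unnormalized, includes the self term).
    density = np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1)
    weights = density ** (-tau)
    # Normalize so the weights average to 1 and the loss scale stays comparable.
    return weights / weights.mean()


def weighted_bc_loss(pred_actions, target_actions, weights):
    """Hypothetical weighted squared-error loss between predicted and target actions."""
    pred_actions = np.asarray(pred_actions, dtype=float)
    target_actions = np.asarray(target_actions, dtype=float)
    per_sample = np.sum((pred_actions - target_actions) ** 2, axis=-1)
    return float(np.mean(weights * per_sample))


# Toy data: 50 states clustered near the origin plus one isolated state.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
outlier = np.array([[5.0, 5.0]])
states = np.vstack([cluster, outlier])

weights = inverse_density_weights(states, bandwidth=1.0, tau=1.0)
# The isolated state receives a much larger weight than any clustered state.
print(weights[-1], weights[:-1].max())
```

In this toy setup the isolated state dominates the weighted loss, which is the intended effect of emphasizing diversity. Setting the hypothetical `tau` to 0 recovers the unweighted objective.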
Findings
When training loss is high, as is often the case in OBD, datasets with greater state diversity yield better policy performance than datasets with higher state quality.
The proposed state density-weighted (SDW) algorithm improves distillation results on datasets with limited state diversity.
Theoretical analysis highlights the dominance of surrounding error over pivotal error in certain scenarios.
Abstract
Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets, observing that a high-quality original dataset does not necessarily yield a superior synthetic dataset. Through an empirical analysis of policy performance under varying levels of training loss, we show that datasets with greater state diversity outperform those with higher state quality when training loss is substantial, as is often the case in OBD, whereas the relationship reverses under minimal loss, which contributes to the misalignment. By associating state quality and diversity in reducing pivotal and surrounding error, respectively, our theoretical analysis…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Machine Learning and Data Classification
