Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets
Basavasagar Patil, Sydney Belt, Jayjun Lee, Nima Fazeli, Bernadette Bucher

TL;DR
This paper introduces a simple temperature sampling strategy to address dataset imbalance in robot learning, improving generalization and performance on low-resource tasks while maintaining high-resource task performance.
Contribution
The authors propose a novel, easy-to-implement sampling method that mitigates dataset imbalance in robotic policy training, enhancing multi-task learning capabilities.
Findings
Significant improvements on low-resource tasks compared to prior methods.
No degradation in performance on high-resource tasks.
Validated effectiveness on real-world robot experiments.
Abstract
Increasingly large datasets of robot actions and sensory observations are being collected to train ever-larger neural networks. These datasets are collected based on tasks and while these tasks may be distinct in their descriptions, many involve very similar physical action sequences (e.g., 'pick up an apple' versus 'pick up an orange'). As a result, many datasets of robotic tasks are substantially imbalanced in terms of the physical robotic actions they represent. In this work, we propose a simple sampling strategy for policy training that mitigates this imbalance. Our method requires only a few lines of code to integrate into existing codebases and improves generalization. We evaluate our method in both pre-training small models and fine-tuning large foundational models. Our results show substantial improvements on low-resource tasks compared to prior state-of-the-art methods, without…
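A minimal sketch of what such a sampling step could look like, assuming the per-task weight takes the form |D_i|^(1/t) that the reviews below attribute to the method. The function names, the example dataset sizes, and the fixed temperature are illustrative assumptions rather than the authors' code; any temperature schedule used in the paper is not reproduced here.

```python
import math
import random


def temperature_sampling_probs(sizes, t):
    """Return per-task sampling probabilities p_i proportional to |D_i|^(1/t).

    sizes: number of demonstrations per task (hypothetical values).
    t: temperature; t = 1 samples proportionally to size, larger t flattens
       the distribution toward uniform.
    """
    if t <= 0:
        raise ValueError("temperature must be positive")
    if any(n <= 0 for n in sizes):
        raise ValueError("every task needs at least one sample")
    # Work in log-space so very large datasets do not overflow.
    log_w = [math.log(n) / t for n in sizes]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def sample_task_indices(sizes, t, k, seed=0):
    """Draw k task indices according to the temperature-scaled probabilities."""
    rng = random.Random(seed)
    probs = temperature_sampling_probs(sizes, t)
    return rng.choices(range(len(sizes)), weights=probs, k=k)


if __name__ == "__main__":
    sizes = [1000, 100, 10]  # one high-resource and two lower-resource tasks
    for t in (1.0, 5.0, 100.0):
        print(t, [round(p, 3) for p in temperature_sampling_probs(sizes, t)])
    print(sample_task_indices(sizes, t=5.0, k=8))
```

In a training loop, the sampled task index would select which task's data the next example or batch is drawn from. With the illustrative sizes above, raising t from 1 to 5 moves the smallest task's share from under 1% to roughly 20%.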
Peer Reviews
Decision: Submitted to ICLR 2026
(1) Picking sub-dataset sampling frequencies is an important problem in robotics, especially as more recent work ventures to train large robot policies on more heterogeneous data sources.
(2) The related works section of the paper is thorough.
(3) The overall method makes sense, and experiments show better downstream task performance for their proposed sampling method in both real and sim.
(4) The authors compare against relevant baselines, including Re-Mix and the simpl
(1) The particular problem statement itself is not thoroughly justified. The problem statement in question is how to pick sampling frequencies when there exist subsets, each with different amounts of data for different action primitives. While action primitive decomposition is indeed one way of splitting data, it is not the only one, and there are many other axes upon which there exist low-resource and high-resource splits, such as language instructions, camera viewpoints, lighting, environment
- The method they propose is simple to understand and easy to implement compared to other works, and is shown to work well in practice in two simulated robotic environments and one real-robot setup.
- The authors provide good ablation studies to show how the method works with different model sizes, different datasets, and different annealing processes.
- It is unclear to me how the authors arrived at this particular form of temperature sampling. For example, instead of using |D|^(1/t), why not exp(|D|)^(1/t)? What about other forms? It would be nice to see how different forms of temperature sampling impact policy performance.
- It would be nice to see whether this method works with larger and more realistic datasets. Currently, the authors hand-pick a subset of Robocasa & LIBERO for their experiments. It is unclear in the paper h
- The problem of action-primitive imbalance is critical for the community as it moves toward large-scale datasets.
- The method is simple, efficient, and easy to implement. It also appears more stable than complex baselines like ReMix.
- The evaluation is comprehensive, covering a toy task, simulation (training from scratch and fine-tuning), and real-world hardware validation, strongly supporting the claims.
- Solid ablation studies validate the impact of schedules, model sizes, and imbalance ra
- The paper claims no performance degradation on high-resource tasks (HRTs), but Figure 4 shows a significant drop for the "Pick/Place" HRT (0.21 to 0.12). This trade-off is not discussed.
- The method requires pre-segmented tasks with known counts ($|D_i|$), making it hard to apply directly to unsegmented "in-the-wild" datasets.
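For readers weighing the reviewer question about |D|^(1/t) versus exp(|D|)^(1/t), the two forms can be written side by side. This is an algebraic observation under the assumption that the sampling probability is the normalized weight; it is not a result taken from the paper.

```latex
% Power-law form: a softmax over log dataset sizes
p_i(t) = \frac{|D_i|^{1/t}}{\sum_j |D_j|^{1/t}}
       = \frac{\exp\!\left(\ln |D_i| / t\right)}{\sum_j \exp\!\left(\ln |D_j| / t\right)},
\qquad
\frac{p_i(t)}{p_j(t)} = \left(\frac{|D_i|}{|D_j|}\right)^{1/t}

% Exponential form raised by the reviewer: a softmax over raw dataset sizes
\tilde{p}_i(t) = \frac{\exp\!\left(|D_i| / t\right)}{\sum_j \exp\!\left(|D_j| / t\right)},
\qquad
\frac{\tilde{p}_i(t)}{\tilde{p}_j(t)} = \exp\!\left(\frac{|D_i| - |D_j|}{t}\right)
```

In the first form the odds between two tasks depend only on the ratio of their sizes, while in the second they depend on the absolute difference, so for datasets differing by hundreds of samples the exponential form stays sharply peaked unless t is of a similar magnitude.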
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
