BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
Max Sobol Mark, Jacky Liang, Maria Attarian, Chuyuan Fu, Debidatta Dwibedi, Dhruv Shah, Aviral Kumar

TL;DR
This paper introduces Big Picture Policies (BPP), a method that improves robot imitation learning by focusing on key history frames identified by a vision-language model, enhancing generalization in history-dependent tasks.
Contribution
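BPP conditions the policy on a small set of task-relevant keyframes instead of the full observation history. This addresses the limited coverage of possible histories during training, reducing distribution shift and improving success rates on complex, history-dependent tasks.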
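The idea of conditioning on keyframes can be illustrated with a minimal sketch. This is not the authors' implementation. Observations are stood in for by strings, and the vision-language model is replaced by a hypothetical predicate `is_keyframe`. The hypothetical helpers `select_keyframes` and `build_policy_context` show that two trajectories which differ only in incidental frames end up with the same policy input.

```python
from typing import Callable, List, Sequence, Tuple


def select_keyframes(history: Sequence[str],
                     is_keyframe: Callable[[str], bool],
                     max_keyframes: int = 4) -> List[str]:
    """Keep only observations flagged as task-relevant (the most recent max_keyframes)."""
    if max_keyframes <= 0:
        return []
    flagged = [obs for obs in history if is_keyframe(obs)]
    return flagged[-max_keyframes:]


def build_policy_context(history: Sequence[str], current: str,
                         is_keyframe: Callable[[str], bool],
                         max_keyframes: int = 4) -> Tuple[str, ...]:
    """Policy input = selected keyframes followed by the current observation."""
    return tuple(select_keyframes(history, is_keyframe, max_keyframes)) + (current,)


# Hypothetical stand-in for a VLM query such as "did the robot just finish searching a location?"
def is_search_event(obs: str) -> bool:
    return obs.startswith("searched")


traj_a = ["moving", "searched drawer", "moving", "paused", "searched shelf"]
traj_b = ["searched drawer", "moving", "moving", "moving", "searched shelf", "moving"]

ctx_a = build_policy_context(traj_a, "at table", is_search_event)
ctx_b = build_policy_context(traj_b, "at table", is_search_event)
print(ctx_a)
print(ctx_a == ctx_b)
```

Both raw histories differ in length and in their filler frames. Both reduce to the same context, `('searched drawer', 'searched shelf', 'at table')`, so the policy cannot latch onto the incidental frames.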
Findings
BPP achieves 70% higher success rates than the best comparison method in real-world evaluations.
Focusing on keyframes reduces distribution shift between training and deployment.
BPP outperforms existing methods across multiple manipulation tasks.
Abstract
Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distribution trajectories upon deployment. We analyze why policies latch onto these spurious correlations and find that this problem stems from limited coverage over the space of possible histories during training, which grows exponentially with horizon. Existing regularization techniques provide inconsistent benefits across tasks, as they do not fundamentally address this coverage problem. Motivated by…
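The coverage argument in the abstract can be made concrete with a toy counting sketch. It assumes that each timestep shows one of `num_obs_types` distinguishable observations and that each demonstration covers at most one distinct full-length history. Under those assumptions, the number of possible histories is `num_obs_types ** horizon`. A keyframe summary such as "which of `m` locations have been searched" has only `2 ** m` values. The function names are hypothetical, and this is an illustration of exponential growth rather than the paper's analysis.

```python
def num_histories(num_obs_types: int, horizon: int) -> int:
    """Distinct length-`horizon` histories when each step shows one of `num_obs_types` observations."""
    return num_obs_types ** horizon


def num_keyframe_summaries(num_locations: int) -> int:
    """Hypothetical summary: the set of locations already searched (one bit per location)."""
    return 2 ** num_locations


def coverage_upper_bound(num_demos: int, space_size: int) -> float:
    """Upper bound on the fraction of a space covered if each demo contributes one distinct element."""
    return min(1.0, num_demos / space_size)


demos = 100
for horizon in (5, 20):
    size = num_histories(3, horizon)
    print(horizon, size, coverage_upper_bound(demos, size))
print("keyframes", num_keyframe_summaries(4), coverage_upper_bound(demos, num_keyframe_summaries(4)))
```

With 100 demonstrations and 3 observation types, at most about 41% of the 243 length-5 histories can be covered. At horizon 20, coverage drops to roughly 3e-8 of the 3,486,784,401 histories. The 16 possible keyframe summaries over 4 locations can, in principle, all be covered.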
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
