Learning Long-Context Diffusion Policies via Past-Token Prediction
Marcel Torne, Andy Tang, Yuejiang Liu, Chelsea Finn

TL;DR
This paper introduces Past-Token Prediction (PTP), a regularization technique for long-context diffusion policies that enhances temporal modeling, reduces memory costs, and improves performance in robotic tasks.
Contribution
The paper proposes PTP, an auxiliary task that helps the policy learn dependencies between past and future actions, together with a multistage training strategy and a self-verification step at inference time.
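As a rough illustration of these ideas, the Python sketch below shows one way a past-token target and a self-verification step could be wired together. It is not the authors' implementation. The function names (`ptp_target`, `past_error`, `select_candidate`), the mean-squared-error score, and the assumption that each sampled candidate contains predicted past actions followed by future actions are all hypothetical choices made for this example.

```python
from typing import List, Sequence

Action = Sequence[float]


def ptp_target(past: List[Action], future: List[Action]) -> List[List[float]]:
    """Build a training target that covers past AND future actions.

    Assumption: the auxiliary task supervises the policy on the past
    tokens in addition to the usual future-action target.
    """
    return [list(a) for a in past] + [list(a) for a in future]


def past_error(pred_past: List[Action], executed_past: List[Action]) -> float:
    """Mean squared error between predicted past tokens and executed actions."""
    total, count = 0.0, 0
    for pred, executed in zip(pred_past, executed_past):
        for p, e in zip(pred, executed):
            total += (p - e) ** 2
            count += 1
    return total / count if count else 0.0


def select_candidate(
    candidates: List[List[Action]], executed_past: List[Action]
) -> List[Action]:
    """Pick the candidate whose predicted past best matches what was executed.

    Each candidate is assumed to be [past tokens..., future tokens...].
    Only the future part of the winning candidate is returned for execution.
    """
    n_past = len(executed_past)
    best = min(candidates, key=lambda c: past_error(c[:n_past], executed_past))
    return best[n_past:]


if __name__ == "__main__":
    executed = [[0.0, 0.0], [1.0, 0.0]]  # actions already taken
    cand_a = [[0.0, 0.1], [1.0, 0.0], [2.0, 0.0]]  # consistent with the past
    cand_b = [[0.5, 0.5], [0.0, 1.0], [9.0, 9.0]]  # inconsistent with the past
    print(select_candidate([cand_b, cand_a], executed))
```

In this toy run the first candidate reconstructs the executed actions far more closely, so its future segment is the one selected. In a real policy the candidates would be samples drawn from the diffusion model rather than hand-written lists.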
Findings
Improves long-context diffusion policy performance by 3x.
Reduces training time by over 10x.
Improves temporal modeling in the policy with minimal reliance on visual representations.
Abstract
Reasoning over long sequences of observations and actions is essential for many robotic tasks. Yet, learning effective long-context policies from demonstrations remains challenging. As context length increases, training becomes increasingly expensive due to rising memory demands, and policy performance often degrades as a result of spurious correlations. Recent methods typically sidestep these issues by truncating context length, discarding historical information that may be critical for subsequent decisions. In this paper, we propose an alternative approach that explicitly regularizes the retention of past information. We first revisit the copycat problem in imitation learning and identify an opposite challenge in recent diffusion policies: rather than over-relying on prior actions, they often fail to capture essential dependencies between past and future actions. To address this, we…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Diffusion
