TL;DR
MTIL uses State Space Models to encode the entire observation history efficiently, enabling long-horizon imitation learning in robotics beyond the limits of traditional Markovian methods.
Contribution
This paper presents MTIL, a new framework combining World Model and Dynamical System concepts with linear-time recurrent dynamics for efficient, full-history encoding in imitation learning.
Findings
MTIL outperforms state-of-the-art (SOTA) methods on simulated benchmarks.
MTIL effectively resolves long-term temporal ambiguities.
The approach is computationally feasible for high-dimensional, long-horizon tasks.
Abstract
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, which falters in long-horizon tasks where history is crucial for resolving perceptual ambiguity. This limitation stems not only from a conceptual gap but also from a fundamental computational barrier: prevailing architectures like Transformers are often constrained by quadratic complexity, rendering the processing of long, high-dimensional observation sequences infeasible. To overcome this dual challenge, we introduce Mamba Temporal Imitation Learning (MTIL). Our approach represents a new paradigm for robotic learning, which we frame as a practical synthesis of World Model and Dynamical System concepts. By leveraging the linear-time recurrent dynamics of State Space Models (SSMs), MTIL learns an implicit, action-oriented world model that efficiently encodes…
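To make the idea of linear-time, full-history encoding concrete, here is a minimal sketch of a generic linear state-space recurrence, h_t = A h_{t-1} + B x_t. It is not the authors' implementation and not Mamba's selective scan; the names `encode_history`, `A`, `B`, and all dimensions are assumptions chosen purely for illustration. It shows that one fixed-size state can summarize the whole observation sequence with a single update per step, and that two histories ending in the same observation still yield different states.

```python
import numpy as np

def encode_history(observations, A, B):
    """Fold an observation sequence into one fixed-size state.

    Illustrative linear recurrence h_t = A h_{t-1} + B x_t (hypothetical,
    not MTIL's actual model). One update per step, so the cost grows
    linearly with the sequence length.
    """
    h = np.zeros(A.shape[0])
    for x in observations:
        h = A @ h + B @ x
    return h

rng = np.random.default_rng(0)
state_dim, obs_dim, T = 4, 3, 20           # assumed toy sizes
A = 0.9 * np.eye(state_dim)                # simple stable state transition
B = rng.normal(size=(state_dim, obs_dim))  # input projection
obs = rng.normal(size=(T, obs_dim))

h_full = encode_history(obs, A, B)

# Same final observation, different past: only the first step is changed.
obs_alt = obs.copy()
obs_alt[0] += 1.0
h_alt = encode_history(obs_alt, A, B)

print(h_full.shape)                # size of the state is independent of T
print(np.allclose(h_full, h_alt))  # the state still reflects the early step
```

A policy that sees only the latest observation cannot tell `obs` and `obs_alt` apart, whereas the recurrent state can. That is the kind of temporal ambiguity the abstract refers to.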
