LoLA: Long Horizon Latent Action Learning for General Robot Manipulation
Xiaofan Wang, Xingyu Gao, Jianlong Fu, Zuolei Li, Dean Fortier, Galen Mullins, Andrey Kolobov, Baining Guo

TL;DR
LoLA introduces a novel framework for long-horizon, language-guided robot manipulation that integrates multi-view observations and robot proprioception to improve multi-step reasoning and action generation.
Contribution
It proposes a state-aware latent re-representation module that grounds visual and language inputs in physical robot states, enhancing long-horizon manipulation capabilities.
Findings
LoLA outperforms prior methods on simulation benchmarks.
LoLA achieves significant improvements on real-world robotic tasks.
The approach effectively integrates multi-view observations and proprioception.
Abstract
The capability of performing long-horizon, language-guided robotic manipulation tasks critically relies on leveraging historical information and generating coherent action sequences. However, such capabilities are often overlooked by existing Vision-Language-Action (VLA) models. To address this challenge, we propose LoLA (Long Horizon Latent Action Learning), a framework designed for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduce a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
