LoLA: Long Horizon Latent Action Learning for General Robot Manipulation

Xiaofan Wang; Xingyu Gao; Jianlong Fu; Zuolei Li; Dean Fortier; Galen Mullins; Andrey Kolobov; Baining Guo

arXiv:2512.20166 · cs.RO · December 24, 2025

TL;DR

LoLA introduces a novel framework for long-horizon, language-guided robot manipulation that integrates multi-view observations and robot proprioception to improve multi-step reasoning and action generation.

Contribution

It proposes a state-aware latent re-representation module that grounds visual and language inputs in physical robot states, enhancing long-horizon manipulation capabilities.

Findings

01. LoLA outperforms prior methods on simulation benchmarks.

02. LoLA achieves significant improvements on real-world robotic tasks.

03. The approach effectively integrates multi-view observations and proprioception.

Abstract

The ability to perform long-horizon, language-guided robotic manipulation tasks relies critically on leveraging historical information and generating coherent action sequences. However, such capabilities are often overlooked by existing Vision-Language-Action (VLA) models. To address this gap, we propose LoLA (Long Horizon Latent Action Learning), a framework designed for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduce a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into an actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI