Learning Additively Compositional Latent Actions for Embodied AI

Hangxing Wei; Xiaoyu Chen; Chuheng Zhang; Tim Pearce; Jianyu Chen; Alex Lamb; Li Zhao; Jiang Bian

arXiv:2604.03340·cs.CV·April 7, 2026

TL;DR

This paper introduces AC-LAM, a model that enforces additive compositional structure in latent actions, improving motion representation and policy learning in embodied AI from visual data.

Contribution

AC-LAM is the first to impose additive compositional priors on latent actions, leading to more structured and calibrated motion representations for embodied AI.

Findings

01 AC-LAM outperforms state-of-the-art LAMs in simulated and real-world tabletop tasks.

02 AC-LAM learns more structured, motion-specific, and displacement-calibrated latent actions.

03 Enforcing additive structure improves downstream policy learning.

Abstract

Latent action learning infers pseudo-action labels from visual transitions, providing an approach to leverage internet-scale video for embodied AI. However, most methods learn latent actions without structural priors that encode the additive, compositional structure of physical motion. As a result, latents often entangle irrelevant scene details or information about future observations with true state changes, and they miscalibrate motion magnitude. We introduce the Additively Compositional Latent Action Model (AC-LAM), which enforces scene-wise additive composition structure over short horizons on the latent action space. These AC constraints encourage simple algebraic structure in the latent action space (identity, inverse, cycle consistency) and suppress information that does not compose additively. Empirically, AC-LAM learns more structured, motion-specific, and displacement-calibrated latent…
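
The algebraic properties named in the abstract (identity, inverse, cycle consistency) can be illustrated with a small numerical sketch. This is not the authors' implementation: the abstract does not give the loss definitions, so the residual forms below, the function name `ac_residuals`, the `lam(o_i, o_j)` interface, and both toy latent action functions are assumptions made purely for illustration. The sketch measures how far a given latent action function is from satisfying each identity on a triple of observations.

```python
import numpy as np

def ac_residuals(lam, o_a, o_b, o_c):
    """Mean squared violation of three additive-structure identities.

    `lam(o_i, o_j)` is any function returning a latent action vector for
    the transition o_i -> o_j (hypothetical interface, not the paper's API).
    """
    z_ab, z_bc, z_ca = lam(o_a, o_b), lam(o_b, o_c), lam(o_c, o_a)
    return {
        # identity: a transition from a state to itself should map to zero
        "identity": float(np.mean(lam(o_a, o_a) ** 2)),
        # inverse: z(a -> b) + z(b -> a) should be zero
        "inverse": float(np.mean((z_ab + lam(o_b, o_a)) ** 2)),
        # cycle: z(a -> b) + z(b -> c) + z(c -> a) should be zero
        "cycle": float(np.mean((z_ab + z_bc + z_ca) ** 2)),
    }

# Toy data: three 8-dimensional "observations" and a random linear map
# standing in for a learned encoder (assumed, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
o_a, o_b, o_c = rng.normal(size=(3, 8))

def additive_lam(o_i, o_j):
    # Depends only on the state change, so it composes additively.
    return (o_j - o_i) @ W

def leaky_lam(o_i, o_j):
    # Also encodes the future observation itself, which breaks additivity.
    return (o_j - o_i) @ W + 0.5 * (o_j @ W)

additive_res = ac_residuals(additive_lam, o_a, o_b, o_c)
leaky_res = ac_residuals(leaky_lam, o_a, o_b, o_c)
```

In this toy setting the difference-based function leaves all three residuals at or near zero, while the function that leaks the future observation violates each of them. Penalising such residuals during training is one plausible way to suppress information that does not compose additively, though the paper's actual constraints may differ.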
