Object-Centric Latent Action Learning

Albina Klepach; Alexander Nikulin; Ilya Zisman; Denis Tarasov; Alexander Derevyagin; Andrei Polubarov; Nikita Lyubaykin; Igor Kiselev; Vladislav Kurenkov

arXiv:2502.09680 · cs.CV · January 21, 2026

TL;DR

This paper introduces an object-centric latent action learning framework that improves the robustness of proxy action labels in visually complex environments by disentangling object movements from distractors, enhancing imitation learning.

Contribution

The proposed method uses self-supervised object-centric pretraining to improve latent action inference in the presence of distractors, outperforming pixel-based latent action learning (LAPO) on visually complex tasks.
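To make "latent action inference" concrete: in LAPO-style training, an inverse dynamics model (IDM) maps two consecutive observations to a latent action, and a forward dynamics model (FDM) must reconstruct the next observation from the current one plus that latent action. The low-dimensional latent is therefore pushed to encode what changed between frames. The sketch below computes this objective on object-centric inputs (a set of per-object slot vectors per frame) instead of pixels. It is a minimal illustration under stated assumptions: the linear IDM/FDM, the slot shapes, and the names `idm`, `fdm`, and `latent_action_loss` are hypothetical and not the paper's architecture, and training itself (gradient updates, any latent bottleneck such as quantization) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 object slots of dimension 8, 2-dimensional latent action.
NUM_SLOTS, SLOT_DIM, LATENT_DIM = 4, 8, 2
OBS_DIM = NUM_SLOTS * SLOT_DIM

# Hypothetical linear weights; a real model would use learned neural networks.
W_idm = rng.normal(scale=0.1, size=(LATENT_DIM, 2 * OBS_DIM))
W_fdm = rng.normal(scale=0.1, size=(OBS_DIM, OBS_DIM + LATENT_DIM))


def flatten_slots(slots: np.ndarray) -> np.ndarray:
    """(batch, NUM_SLOTS, SLOT_DIM) -> (batch, OBS_DIM)."""
    return slots.reshape(slots.shape[0], -1)


def idm(s_t: np.ndarray, s_next: np.ndarray) -> np.ndarray:
    """Inverse dynamics: infer a latent action from two consecutive frames."""
    return np.concatenate([s_t, s_next], axis=1) @ W_idm.T


def fdm(s_t: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Forward dynamics: predict the next frame from the current frame and a latent action."""
    return np.concatenate([s_t, z], axis=1) @ W_fdm.T


def latent_action_loss(slots_t: np.ndarray, slots_next: np.ndarray) -> float:
    """Mean squared error of reconstructing the next frame through the latent action."""
    s_t, s_next = flatten_slots(slots_t), flatten_slots(slots_next)
    z = idm(s_t, s_next)   # latent (proxy) action, shape (batch, LATENT_DIM)
    pred = fdm(s_t, z)     # predicted next frame, shape (batch, OBS_DIM)
    return float(np.mean((pred - s_next) ** 2))


# Usage on random "slot" data: the next frame is a small perturbation of the current one.
slots_t = rng.normal(size=(16, NUM_SLOTS, SLOT_DIM))
slots_next = slots_t + 0.05 * rng.normal(size=slots_t.shape)
print(latent_action_loss(slots_t, slots_next))
```

Intuitively, on raw pixels any moving distractor also competes for capacity in the small latent, which is the degradation the abstract describes; an object-level input at least gives the model a representation in which agent and background dynamics are separated.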

Findings

01 Object-centric pretraining reduces the negative effect of distractors by 50%.

02 Task performance improves in visually complex environments.

03 Imitation learning improves, and the agent adapts efficiently with only a few action-labeled trajectories.

Abstract

Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although the recent latent action policy optimization (LAPO) approach has shown promise in inferring proxy action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that operates on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle the movement of the agent from distracting background dynamics. This allows LAPO to focus on task-relevant interactions, which yields more robust proxy-action labels and, in turn, enables better imitation learning and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluated our…
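The final adaptation step, in which an agent trained on latent actions is grounded in real actions using only a few action-labeled trajectories, can be illustrated by fitting a small decoder from latent actions to ground-truth actions. The sketch below assumes a linear decoder with a bias term, fit by least squares on synthetic data; the decoder form, the dimensions, the 100-step "labeled set", and the names `fit_action_decoder` and `decode` are hypothetical, and the paper's actual fine-tuning procedure may differ.

```python
import numpy as np


def fit_action_decoder(latents: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Least-squares fit of a linear map (with bias) from latent actions to real actions.

    latents: (n, latent_dim), actions: (n, action_dim).
    Returns W with shape (latent_dim + 1, action_dim); the last row is the bias.
    """
    X = np.hstack([latents, np.ones((latents.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W


def decode(W: np.ndarray, latents: np.ndarray) -> np.ndarray:
    """Map latent actions to real actions with the fitted linear decoder."""
    X = np.hstack([latents, np.ones((latents.shape[0], 1))])
    return X @ W


rng = np.random.default_rng(0)
latent_dim, action_dim = 8, 3

# Synthetic small labeled set (hypothetical): 100 transitions whose real actions
# are a noisy linear function of the latent actions.
true_map = rng.normal(size=(latent_dim, action_dim))
latents = rng.normal(size=(100, latent_dim))
actions = latents @ true_map + 0.01 * rng.normal(size=(100, action_dim))

W = fit_action_decoder(latents, actions)
pred = decode(W, latents)
print("train MSE:", np.mean((pred - actions) ** 2))
```

The point of the design is that such a decoder can stay tiny only if the latent actions already carry the agent's motion; latents contaminated by distractor motion would leave no small decoder able to recover the true actions, which is why the quality of the proxy labels matters for few-label adaptation.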
