Object-Centric Latent Action Learning
Albina Klepach, Alexander Nikulin, Ilya Zisman, Denis Tarasov, Alexander Derevyagin, Andrei Polubarov, Nikita Lyubaykin, Igor Kiselev, Vladislav Kurenkov

TL;DR
This paper introduces an object-centric latent action learning framework that disentangles object movements from distractors, which makes proxy action labels more robust in visually complex environments and improves downstream imitation learning.
Contribution
The proposed method uses self-supervised object-centric pretraining to make latent action inference robust to distractors, outperforming previous approaches on visually complex tasks.
Findings
Object-centric pretraining reduces distractor effects by 50%.
Task performance improves in visually complex environments with distractors.
The agent adapts efficiently through imitation learning with only a few action-labeled trajectories.
Abstract
Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although recent latent action policy optimization (LAPO) has shown promise in inferring proxy action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that centers on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle the movement of the agent from distracting background dynamics. This allows LAPO to focus on task-relevant interactions, resulting in more robust proxy action labels, better imitation learning, and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluated our…
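As a rough illustration of the idea in the abstract, the sketch below computes a latent action objective over object slots instead of pixels. It is not the authors' implementation. The slot tensors stand in for the output of a pretrained object-centric encoder, the linear maps `W_idm` and `W_fdm` replace neural networks, and the fixed `keep` indices are a hypothetical way of marking which slots are task-relevant. An inverse dynamics model infers a latent action from two consecutive states, and a forward dynamics model must predict the next state from the current state and that latent action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K object slots of dimension D, latent actions of dimension Z.
K, D, Z = 4, 8, 2


def select_slots(slots, keep):
    """Keep only the slots assumed to be task-relevant and flatten them."""
    return slots[:, keep, :].reshape(slots.shape[0], -1)


def idm(x_t, x_next, W_idm):
    """Inverse dynamics: infer a latent action from two consecutive states."""
    return np.tanh(np.concatenate([x_t, x_next], axis=1) @ W_idm)


def fdm(x_t, z, W_fdm):
    """Forward dynamics: predict the next state from the state and latent action."""
    return np.concatenate([x_t, z], axis=1) @ W_fdm


def latent_action_loss(slots_t, slots_next, keep, W_idm, W_fdm):
    """Next-state prediction error on the selected slots, plus the latent actions."""
    x_t = select_slots(slots_t, keep)
    x_next = select_slots(slots_next, keep)
    z = idm(x_t, x_next, W_idm)
    pred = fdm(x_t, z, W_fdm)
    return float(np.mean((pred - x_next) ** 2)), z


# Toy batch: random arrays standing in for slots from a pretrained encoder.
B = 16
slots_t = rng.normal(size=(B, K, D))
slots_next = slots_t + 0.1 * rng.normal(size=(B, K, D))
keep = [0, 1]  # assumed agent-related slots; slots 2 and 3 play the role of distractors

F = len(keep) * D  # flattened feature size of the selected slots
W_idm = 0.1 * rng.normal(size=(2 * F, Z))
W_fdm = 0.1 * rng.normal(size=(F + Z, F))

loss, z = latent_action_loss(slots_t, slots_next, keep, W_idm, W_fdm)
print(z.shape, loss)
```

Because the latent action is computed only from the selected slots, changing the distractor slots leaves `z` untouched in this toy setup. That is the intuition behind moving from pixels to objects, under the assumption that the relevant slots can be identified.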
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare
Methods: Focus
