Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning

Nikos Giannakakis; Argyris Manetas; Panagiotis P. Filntisis; Petros Maragos; George Retsinas

arXiv:2505.20962·cs.RO·May 28, 2025

Open Access

TL;DR

This paper introduces an object-centric encoder, built on Slot Attention and bootstrapped from pretrained models, that couples semantic segmentation with visual representation learning to improve robot visuo-motor policy learning and reduce the need for robot-specific datasets.

Contribution

The work presents a novel object-centric encoder that performs semantic segmentation and visual representation learning jointly rather than as separate stages, bootstrapped from pretrained models (SOLV) and fine-tuned on human action videos.

Findings

01. Models pretrained on out-of-domain datasets benefit robot learning.

02. Fine-tuning on human action videos improves performance.

03. Integrated segmentation and encoding enhance both reinforcement learning and imitation learning.

Abstract

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by psychological theories suggesting that humans process scenes in an object-based fashion, we propose an object-centric encoder that performs semantic segmentation and visual representation generation in a coupled manner, unlike other works, which treat these as separate processes. To achieve this, we leverage the Slot Attention mechanism and use the SOLV model, pretrained on large out-of-domain datasets, to bootstrap fine-tuning on human action video data. Through simulated robotic tasks, we demonstrate that visual representations can enhance reinforcement and imitation learning training, highlighting the effectiveness of our integrated approach for semantic…
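The abstract names Slot Attention as the mechanism that couples segmentation and representation. As a rough illustration only, the NumPy sketch below implements the generic Slot Attention iteration from Locatello et al. (2020). It is not the authors' SOLV-based encoder. The learned key/query/value projections are replaced by random matrices, and the GRU/MLP slot update of the original mechanism is swapped for a plain residual update. What it shows is how a single attention map can serve both roles: the per-location argmax over slots acts as a segmentation mask, while the slot vectors act as object-level representations. The function name `slot_attention`, the feature sizes, and the 8x8 grid are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Parameter-free layer normalization over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def slot_attention(inputs, num_slots=4, dim=32, iters=3, seed=0):
    """Simplified Slot Attention (hypothetical sketch, untrained weights).

    inputs: (N, D_in) array, e.g. flattened patch features from a frozen backbone.
    Returns slots of shape (K, dim) and attention of shape (K, N),
    where each column of the attention sums to 1 over the K slots.
    """
    rng = np.random.default_rng(seed)
    n, d_in = inputs.shape
    # Assumption: random projections stand in for learned weights.
    w_k = rng.normal(scale=d_in ** -0.5, size=(d_in, dim))
    w_v = rng.normal(scale=d_in ** -0.5, size=(d_in, dim))
    w_q = rng.normal(scale=dim ** -0.5, size=(dim, dim))

    x = layer_norm(inputs)
    k, v = x @ w_k, x @ w_v                      # (N, dim) each
    slots = rng.normal(size=(num_slots, dim))    # random slot initialization

    for _ in range(iters):
        q = layer_norm(slots) @ w_q              # (K, dim)
        logits = q @ k.T / np.sqrt(dim)          # (K, N)
        # Softmax over the SLOT axis: slots compete to explain each input location.
        attn = softmax(logits, axis=0)
        # Weighted mean of values per slot (renormalize each slot's weights).
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        updates = weights @ v                    # (K, dim)
        # Simplified residual update in place of the original GRU + MLP.
        slots = slots + updates
    return slots, attn

# Usage: an 8x8 grid of 16-d features (random stand-in for real image features).
feats = np.random.default_rng(1).normal(size=(64, 16))
slots, attn = slot_attention(feats)
masks = attn.argmax(axis=0).reshape(8, 8)        # per-location slot id = segmentation
print(slots.shape, attn.shape, masks.shape)      # (4, 32) (4, 64) (8, 8)
```

Normalizing the softmax over slots rather than over input locations is the design choice that makes slots compete for image regions, which is why the same attention map doubles as a segmentation.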

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Softmax · Attention Is All You Need