SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

Taisei Hanyu; Nhat Chung; Huy Le; Toan Nguyen; Yuki Ikebe; Anthony Gunderman; Duy Nguyen Ho Minh; Khoa Vo; Tung Kieu; Kashu Yamazaki; Chase Rainwater; Anh Nguyen; Ngan Le

arXiv:2511.06754·cs.RO·May 7, 2026

TL;DR

This paper introduces the LIBERO+ dataset and the SlotVLA framework for structured, interpretable object-relation reasoning in robotic manipulation, improving efficiency and generalization.

Contribution

It presents a new dataset with object-centric annotations and a slot-attention-based model for capturing object and relation representations in manipulation tasks.
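To make the slot-attention idea concrete, here is a minimal NumPy sketch of how a set of dense visual tokens can be compressed into a handful of slots. It is an illustration under stated assumptions, not the SlotVLA implementation: the random projection matrices stand in for learned weights, the slot update is a plain weighted mean rather than the learned recurrent update used in the standard slot-attention formulation, and the token count, feature size, and number of slots are arbitrary placeholders.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(tokens, num_slots=4, iters=3, seed=0, eps=1e-8):
    """Simplified slot attention: tokens (N, D) -> slots (num_slots, D).

    Assumptions (not from the paper): random projections replace learned
    query/key/value weights, and slots are updated by a weighted mean.
    """
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    k, v = tokens @ w_k, tokens @ w_v
    slots = rng.normal(size=(num_slots, d))
    for _ in range(iters):
        q = slots @ w_q
        logits = k @ q.T / np.sqrt(d)  # shape (N, num_slots)
        # Softmax over the slot axis: slots compete to explain each token.
        attn = softmax(logits, axis=1)
        # Normalize over tokens so each slot takes a weighted mean of values.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
        slots = weights.T @ v  # shape (num_slots, D)
    return slots, attn


# Hypothetical input: 196 patch tokens with 32 features each.
tokens = np.random.default_rng(1).normal(size=(196, 32))
slots, attn = slot_attention(tokens)
print(slots.shape, attn.shape)  # (4, 32) (196, 4)
```

In this toy setting, 196 tokens are summarized by 4 slot vectors, which is the sense in which a slot-based representation can reduce the number of visual tokens passed downstream.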

Findings

01 SlotVLA reduces visual tokens needed for manipulation

02 Object-relation slot representations improve generalization

03 LIBERO+ enables evaluation of object-relation reasoning

Abstract

Inspired by how humans reason over discrete objects and their relationships, we explore whether compact object-centric and object-relation representations can form a foundation for multitask robotic manipulation. Most existing robotic multitask models rely on dense embeddings that entangle both object and background cues, raising concerns about both efficiency and interpretability. In contrast, we study object-relation-centric representations as a pathway to more structured, efficient, and explainable visuomotor control. Our contributions are two-fold. First, we introduce LIBERO+, a fine-grained benchmark dataset designed to enable and evaluate object-relation reasoning in robotic manipulation. Unlike prior datasets, LIBERO+ provides object-centric annotations that enrich demonstrations with box- and mask-level labels as well as instance-level temporal tracking, supporting compact and…

