STORM: Slot-based Task-aware Object-centric Representation for robotic Manipulation
Alexandre Chapin (LIRIS), Emmanuel Dellandréa (LIRIS), Liming Chen (LIRIS)

TL;DR
STORM introduces a multi-phase training method to adapt frozen visual foundation models into task-aware, object-centric representations, improving robotic manipulation robustness and generalization without retraining large models.
Contribution
The paper presents STORM, a lightweight, multi-phase adaptation module that enhances frozen foundation models with semantic-aware slots for better robotic manipulation.
Findings
Improves generalization to visual distractors in manipulation tasks.
Enhances control performance over end-to-end trained models.
Maintains semantic consistency during adaptation.
Abstract
Visual foundation models provide strong perceptual features for robotics, but their dense representations lack explicit object-level structure, limiting robustness and controllability in manipulation tasks. We propose STORM (Slot-based Task-aware Object-centric Representation for robotic Manipulation), a lightweight object-centric adaptation module that augments frozen visual foundation models with a small set of semantic-aware slots for robotic manipulation. Rather than retraining large backbones, STORM employs a multi-phase training strategy: object-centric slots are first stabilized through visual-semantic pretraining using language embeddings, then jointly adapted with a downstream manipulation policy. This staged learning prevents degenerate slot formation and preserves semantic consistency while aligning perception with task objectives. Experiments on object discovery benchmarks…
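The staged strategy in the abstract can be read as a schedule of which components receive gradient updates in each phase. The sketch below is only an illustration of that reading, not the authors' implementation: the module names (`backbone`, `slots`, `policy`), the phase names, and the loss labels are hypothetical, and the abstract does not say whether the language-alignment objective stays active during joint adaptation.

```python
from dataclasses import dataclass

# Hypothetical component names; the abstract does not name modules this way.
MODULES = ("backbone", "slots", "policy")


@dataclass(frozen=True)
class Phase:
    name: str             # illustrative phase label
    trainable: frozenset  # components whose parameters would be updated
    losses: tuple         # illustrative labels for the active objectives


def storm_like_schedule():
    """Return the two phases described in the abstract, in order.

    The backbone is never listed as trainable because the abstract
    describes the visual foundation model as frozen.
    """
    return [
        # Phase 1: slots are stabilized with language embeddings.
        Phase(
            name="visual_semantic_pretraining",
            trainable=frozenset({"slots"}),
            losses=("slot_language_alignment",),
        ),
        # Phase 2: slots and policy are adapted jointly.
        # Assumption: only a policy objective is listed here; the abstract
        # does not state whether the alignment loss is kept.
        Phase(
            name="joint_policy_adaptation",
            trainable=frozenset({"slots", "policy"}),
            losses=("policy",),
        ),
    ]


def requires_grad(phase, module):
    """Report whether `module` would be updated during `phase`."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module!r}")
    return module in phase.trainable


if __name__ == "__main__":
    for phase in storm_like_schedule():
        flags = {m: requires_grad(phase, m) for m in MODULES}
        print(phase.name, flags)
```

In a real training loop, flags like these would typically decide which parameter groups are passed to the optimizer in each phase, so the large backbone never needs gradient storage.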
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
