Task-Oriented Hierarchical Object Decomposition for Visuomotor Control
Jianing Qian, Yunshuang Li, Bernadette Bucher, Dinesh Jayaraman

TL;DR
HODOR introduces a hierarchical, scene-entity-based representation that improves sample efficiency and generalization in visuomotor control tasks by selectively assembling task-specific features.
Contribution
The paper presents HODOR, a novel hierarchical object decomposition method that scales representations with scene complexity and enhances task-specific learning.
Findings
HODOR outperforms prior representations in imitation learning tasks.
HODOR's invariances enable robust zero-shot generalization.
HODOR scales with scene and task complexity.
Abstract
Good pre-trained visual representations could enable robots to learn visuomotor policies efficiently. Still, existing representations take a one-size-fits-all-tasks approach that comes with two important drawbacks: (1) being completely task-agnostic, these representations cannot effectively ignore task-irrelevant information in the scene, and (2) they often lack the representational capacity to handle unconstrained, complex real-world scenes. Instead, we propose to train a large combinatorial family of representations organized by scene entities: objects and object parts. This hierarchical object decomposition for task-oriented representations (HODOR) permits selectively assembling different representations specific to each task while scaling in representational capacity with the complexity of the scene and the task. In our experiments, we find that HODOR outperforms prior pre-trained…
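As a rough illustration of the "selectively assembling" idea in the abstract, the sketch below builds a small hierarchy of objects and object parts and concatenates the features of only the entities a task names. It is not the authors' implementation: the names `Entity`, `flatten`, and `assemble`, the `object/part` path convention, and the hand-written feature values are all hypothetical, and how HODOR actually computes features or chooses entities is not described in this excerpt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A scene entity: an object, or a part of an object (hypothetical structure)."""
    name: str
    feature: list[float]  # placeholder feature vector; a real system would learn this
    parts: list[Entity] = field(default_factory=list)


def flatten(entity: Entity, prefix: str = "") -> dict[str, list[float]]:
    """Map every entity in the hierarchy to a path such as 'mug/handle'."""
    path = f"{prefix}/{entity.name}" if prefix else entity.name
    table = {path: entity.feature}
    for part in entity.parts:
        table.update(flatten(part, path))
    return table


def assemble(scene: list[Entity], selected: list[str]) -> list[float]:
    """Concatenate the features of only the selected (task-relevant) entities."""
    table: dict[str, list[float]] = {}
    for obj in scene:
        table.update(flatten(obj))
    state: list[float] = []
    for path in selected:
        state.extend(table[path])
    return state


# Toy scene: two relevant objects (one with a part) and one distractor.
scene = [
    Entity("mug", [0.1, 0.2], parts=[Entity("handle", [0.3, 0.4])]),
    Entity("plate", [0.5, 0.6]),
    Entity("distractor", [9.0, 9.0]),
]

# Two tasks over the same scene pick different subsets of entities.
grasp_state = assemble(scene, ["mug/handle"])
place_state = assemble(scene, ["mug", "plate"])
```

In this toy setup the assembled input grows with the number of selected entities, and the distractor never reaches the policy input, which mirrors the two drawbacks the abstract says a task-agnostic representation cannot address.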
Peer Reviews
Decision · CoRL 2024
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition
