DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

Xinyu Xu; Shengcheng Luo; Yanchao Yang; Yong-Lu Li; Cewu Lu

arXiv:2407.14758 · cs.CV · July 23, 2024

TL;DR

DISCO is an embodied AI framework that combines differentiable scene semantics with dual-level control for navigation and interaction tasks, reporting a +8.6% success rate gain over previous methods in unseen scenes.

Contribution

The paper presents DISCO, an approach for embodied agents that combines differentiable scene representations of object and affordance semantics, learned on the fly, with dual-level coarse-to-fine action control for mobile manipulation and instruction following.

Findings

01 DISCO achieves a +8.6% success rate improvement in unseen scenes.

02 Its differentiable scene representation models rich object and affordance semantics to support navigation planning.

03 Its dual-level coarse-to-fine control improves task efficiency and accuracy.

Abstract

Building a general-purpose intelligent home-assistant agent skilled in diverse tasks by human commands is a long-term blueprint of embodied AI research, which poses requirements on task planning, environment modeling, and object interaction. In this work, we study primitive mobile manipulations for embodied agents, i.e., how to navigate and interact based on an instructed verb-noun pair. We propose DISCO, which features non-trivial advancements in contextualized scene modeling and efficient controls. In particular, DISCO incorporates differentiable scene representations of rich semantics in object and affordance, which are dynamically learned on the fly and facilitate navigation planning. In addition, we propose dual-level coarse-to-fine action controls that leverage both global and local cues to accomplish mobile manipulation tasks efficiently. DISCO easily integrates into embodied tasks such…
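The two ideas named in the abstract, a scene map learned on the fly and coarse-to-fine control, can be illustrated with a toy sketch. This is not the authors' implementation: the grid map, the single-logit gradient update, the breadth-first planner, and all names (`SemanticMap`, `coarse_plan`, `fine_adjust`) are assumptions chosen only to make the concepts concrete.

```python
import math
from collections import deque


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SemanticMap:
    """Toy grid map (hypothetical): one learnable logit per cell for 'target is here'."""

    def __init__(self, height, width):
        self.logits = [[0.0] * width for _ in range(height)]

    def update(self, cell, observed, lr=1.0):
        # One gradient-descent step on binary cross-entropy w.r.t. the logit:
        # d(BCE)/d(logit) = sigmoid(logit) - observed
        r, c = cell
        self.logits[r][c] -= lr * (sigmoid(self.logits[r][c]) - observed)

    def best_cell(self):
        # Cell with the highest current belief that the target is present.
        h, w = len(self.logits), len(self.logits[0])
        return max(((r, c) for r in range(h) for c in range(w)),
                   key=lambda rc: self.logits[rc[0]][rc[1]])


def coarse_plan(start, goal, blocked, height, width):
    """Global level (stand-in): BFS shortest path over free grid cells."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < height and 0 <= nxt[1] < width
                    and nxt not in blocked and nxt not in parents):
                parents[nxt] = cur
                queue.append(nxt)
    return None  # goal unreachable


def fine_adjust(offset, step=0.25, tol=0.05):
    """Local level (stand-in): shrink a signed 1-D offset in small clipped steps."""
    moves = []
    while abs(offset) > tol:
        delta = max(-step, min(step, -offset))
        offset += delta
        moves.append(delta)
    return moves
```

In use, the agent would call `update` as observations arrive, take `best_cell()` as the navigation goal, follow `coarse_plan` to get near it, and then call `fine_adjust` on the remaining local offset before interacting.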

Code & Models

Repositories

allenxuuu/disco (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications