Scene Graph for Embodied Exploration in Cluttered Scenario
Yuhong Deng, Qie Sima, Di Guo, Huaping Liu, Yi Wang, Fuchun Sun

TL;DR
This paper introduces a scene graph-based framework for embodied exploration in cluttered environments, enabling robots to understand and manipulate objects through active exploration and semantic reasoning, validated on manipulation question answering tasks.
Contribution
It presents a novel scene graph approach combined with imitation learning and VQA models for semantic understanding in cluttered scenarios, addressing a gap in robotic exploration and manipulation.
Findings
Effective in MQA tasks with cluttered environments
Demonstrates improved semantic understanding during exploration
Validates the approach's applicability to real-world robotic tasks
Abstract
The ability to handle objects in cluttered environments has long been anticipated by the robotics community. However, most existing work focuses merely on manipulation rather than on uncovering the semantic information hidden among cluttered objects. In this work, we introduce the scene graph for embodied exploration in cluttered scenarios to address this problem. To validate our method in cluttered scenarios, we adopt the Manipulation Question Answering (MQA) task as our test benchmark, which requires an embodied robot to combine active exploration with semantic understanding of vision and language. As a general solution framework for the task, we propose an imitation learning method to generate manipulations for exploration. Meanwhile, a VQA model based on a dynamic scene graph is adopted to comprehend a series of RGB frames from the wrist camera of the manipulator along with every step of manipulation is…
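To make the idea of a dynamic scene graph concrete, here is a minimal Python sketch. It is not the authors' implementation: the class `SceneGraph`, its methods, the object ids, and the relation names are all hypothetical, and the per-frame detections are assumed to come from some upstream perception module that is not shown. It only illustrates how a graph of objects (nodes) and spatial relations (edges) might be refreshed after each manipulation step so that a counting-style question can be answered from the accumulated graph.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy dynamic scene graph (hypothetical): objects are nodes, relations are edges."""
    nodes: dict = field(default_factory=dict)  # object id -> category label
    edges: set = field(default_factory=set)    # (subject id, relation, object id)

    def update(self, detections, relations):
        """Merge one frame's detections and relations into the graph."""
        for obj_id, label in detections:
            self.nodes[obj_id] = label
        # Relations touching a re-observed object are replaced by the new frame's.
        seen = {obj_id for obj_id, _ in detections}
        self.edges = {e for e in self.edges if e[0] not in seen and e[2] not in seen}
        self.edges.update(relations)

    def count(self, label):
        """Answer a counting question such as 'how many cups are there?'."""
        return sum(1 for node_label in self.nodes.values() if node_label == label)


graph = SceneGraph()
# Frame 1 (assumed detections): a box and one visible cup.
graph.update([("box_1", "box"), ("cup_1", "cup")], [("cup_1", "next_to", "box_1")])
# Frame 2, after a push: the box is seen again and a previously hidden cup appears.
graph.update([("box_1", "box"), ("cup_2", "cup")], [("cup_2", "next_to", "box_1")])
print(graph.count("cup"))
```

Nodes persist across frames, so an object that becomes occluded again after a manipulation still counts toward the answer; whether the paper's model handles occlusion this way is not stated in the abstract.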
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
