Knowledge-based Embodied Question Answering
Sinan Tan, Mengmeng Ge, Di Guo, Huaping Liu, Fuchun Sun

TL;DR
This paper introduces a Knowledge-based Embodied Question Answering task where an agent uses external knowledge and scene understanding to answer complex questions in a 3D environment, improving multi-turn reasoning and multi-agent applicability.
Contribution
It presents a novel K-EQA task and a neural program synthesis framework that combines external knowledge with scene graph reasoning for improved question answering.
Findings
The framework effectively answers complex, realistic questions in embodied environments.
The approach enhances multi-turn question answering efficiency using scene graph memory.
It is applicable to multi-agent scenarios, demonstrating versatility.
Abstract
In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions using external knowledge. Unlike existing EQA work, in which the question explicitly specifies the target object, the agent can resort to external knowledge to understand more complicated questions such as "Please tell me what are objects used to cut food in the room?", for which the agent must know facts such as "knife is used for cutting food". To address the K-EQA problem, we propose a novel framework based on neural program synthesis reasoning, in which joint reasoning over the external knowledge and a 3D scene graph is performed to realize navigation and question answering. In particular, the 3D scene graph provides memory to store the visual information of visited scenes, which significantly…
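The joint reasoning described in the abstract can be illustrated with a toy sketch. This is not the authors' framework: the triple store, the scene-graph dictionary, and every function name below are hypothetical, and the neural program synthesis step is replaced by a hand-written two-step program. It only shows the idea that a knowledge lookup ("which categories are used for cutting food?") is combined with a query over stored scene observations.

```python
# Toy illustration only: all data and names here are hypothetical,
# not taken from the K-EQA paper or its code.

# External knowledge as (head, relation, tail) triples.
KNOWLEDGE = {
    ("knife", "UsedFor", "cutting food"),
    ("scissors", "UsedFor", "cutting paper"),
    ("cup", "UsedFor", "drinking"),
}

# Scene graph memory: object id -> attributes observed so far.
SCENE_GRAPH = {
    "obj1": {"category": "knife", "room": "kitchen"},
    "obj2": {"category": "cup", "room": "kitchen"},
    "obj3": {"category": "sofa", "room": "living room"},
}


def categories_with(relation, tail):
    """Step 1: find object categories that satisfy a knowledge relation."""
    return {h for (h, r, t) in KNOWLEDGE if r == relation and t == tail}


def query_scene(categories):
    """Step 2: find stored scene objects whose category is in the given set."""
    return sorted(
        oid for oid, attrs in SCENE_GRAPH.items()
        if attrs["category"] in categories
    )


def answer(relation, tail):
    """Run the two-step program and return the matching categories in the scene."""
    object_ids = query_scene(categories_with(relation, tail))
    return sorted({SCENE_GRAPH[oid]["category"] for oid in object_ids})


print(answer("UsedFor", "cutting food"))
```

Because the scene graph persists between calls, a follow-up question such as `answer("UsedFor", "drinking")` can be served from the same stored observations without re-exploring, which is the role the abstract assigns to the scene graph as memory.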
