Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

Qingrong He; Kejun Lin; Shizhe Chen; Anwen Hu; Qin Jin

arXiv:2404.14705·cs.CV·April 24, 2024

TL;DR

This paper introduces LLM-TPC, a framework for 3D situated reasoning that uses the planning, tool-use, and reflection capabilities of large language models to improve accuracy and robustness on complex 3D question-answering tasks.

Contribution

The paper proposes an LLM-based framework built around a Think-Program-reCtify loop for 3D situated reasoning, targeting the data scarcity and limited generalization of existing end-to-end models.

Findings

01 Demonstrates superior performance on the SQA3D benchmark

02 Shows improved interpretability and robustness

03 Validates effectiveness through extensive experiments

Abstract

This work addresses the 3D situated reasoning task, which aims to answer questions given egocentric observations in a 3D environment. The task remains challenging as it requires comprehensive 3D perception and complex reasoning skills. End-to-end models trained on supervised data for 3D situated reasoning suffer from data scarcity and poor generalization ability. Inspired by the recent success of leveraging large language models (LLMs) for visual reasoning, we propose LLM-TPC, a novel framework that leverages the planning, tool usage, and reflection capabilities of LLMs through a Think-Program-reCtify loop. The Think phase first decomposes the compositional question into a sequence of steps, and then the Program phase grounds each step to a piece of code and calls carefully designed 3D visual perception modules. Finally, the Rectify phase adjusts the plan and code if the program fails to…
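
The three phases described in the abstract can be pictured as a retry loop around code generation and execution. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `toy_llm`, `find_objects`, `SCENE`, `run_program`, and the retry budget `MAX_ROUNDS` are all hypothetical names invented for illustration, the language model is replaced by a hard-coded stub, and a "failure" is assumed to mean that the generated program raises an exception (the abstract is truncated before it states the exact condition).

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_ROUNDS = 3  # hypothetical retry budget, not a value from the paper

# Toy stand-in for a 3D scene: just a list of object labels.
SCENE: List[str] = ["chair", "table", "chair"]

# Hypothetical perception tool exposed to the generated program.
TOOLS: Dict[str, Callable] = {
    "find_objects": lambda label: [obj for obj in SCENE if obj == label],
}


def run_program(code: str, tools: Dict[str, Callable]) -> Tuple[Optional[str], Optional[str]]:
    """Execute generated code and return (answer, error_message)."""
    scope = dict(tools)  # the program sees only the tools we pass in
    try:
        exec(code, scope)
        return str(scope["answer"]), None
    except Exception as exc:  # any failure becomes feedback for the next round
        return None, f"{type(exc).__name__}: {exc}"


def toy_llm(phase: str, text: str, feedback: Optional[str]) -> str:
    """Hard-coded stub standing in for an LLM call (assumption)."""
    if phase == "think":
        return "1. find all chairs; 2. count them"
    if feedback is None:
        # First attempt calls a tool that does not exist, so it fails.
        return "answer = len(detect('chair'))"
    # After seeing the error, the stub switches to the available tool.
    return "answer = len(find_objects('chair'))"


def think_program_rectify(question: str, llm: Callable, tools: Dict[str, Callable],
                          max_rounds: int = MAX_ROUNDS) -> Optional[str]:
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        plan = llm("think", question, feedback)    # Think: decompose the question
        code = llm("program", plan, feedback)      # Program: ground the plan in code
        answer, error = run_program(code, tools)
        if error is None:
            return answer
        feedback = error                           # Rectify: retry with the error message
    return None


print(think_program_rectify("How many chairs are in the room?", toy_llm, TOOLS))
```

In this toy run the first program fails with a `NameError`, the error text is fed back to the stub, and the second program succeeds. A real system would replace `toy_llm` with actual model calls and `find_objects` with 3D perception modules.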

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies