Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models
Yew Ken Chia, Qi Sun, Lidong Bing, Soujanya Poria

TL;DR
This paper introduces Can-Do, a challenging benchmark dataset for evaluating embodied planning in multimodal models, and proposes NeuroGround, a neuro-symbolic framework that improves planning by grounding plan generation in perceived environment states.
Contribution
The paper presents a new dataset for embodied planning and a neuro-symbolic framework that improves planning accuracy by grounding plan generation in the perceived environment.
Findings
State-of-the-art models struggle with perception and reasoning in embodied tasks.
NeuroGround improves planning performance over baseline models.
The dataset enables evaluation of commonsense, physical understanding, and safety in multimodal models.
Abstract
Large multimodal models have demonstrated impressive problem-solving abilities in vision and language tasks, and have the potential to encode extensive world knowledge. However, it remains an open challenge for these models to perceive, reason, plan, and act in realistic environments. In this work, we introduce Can-Do, a benchmark dataset designed to evaluate embodied planning abilities through more diverse and complex scenarios than previous datasets. Our dataset includes 400 multimodal samples, each consisting of natural language user instructions, visual images depicting the environment, state changes, and corresponding action plans. The data encompasses diverse aspects of commonsense knowledge, physical understanding, and safety awareness. Our fine-grained analysis reveals that state-of-the-art models, including GPT-4V, face bottlenecks in visual perception, comprehension, and…
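The abstract lists four components per sample: a user instruction, environment images, state changes, and an action plan. A minimal sketch of how one such record might be represented is given below. The class name, field names, and example values are hypothetical illustrations, not the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanDoSample:
    """Hypothetical record layout; field names are assumptions, not the real schema."""
    instruction: str          # natural language user instruction
    image_paths: List[str]    # visual images depicting the environment
    state_changes: List[str]  # state changes described for the scenario
    action_plan: List[str]    # corresponding action plan, one step per entry


def is_complete(sample: CanDoSample) -> bool:
    """Check that a sample has an instruction, at least one image, and a plan."""
    return (
        bool(sample.instruction.strip())
        and len(sample.image_paths) > 0
        and len(sample.action_plan) > 0
    )


# Illustrative values only; these are made up and not taken from the dataset.
example = CanDoSample(
    instruction="Put the cup on the shelf.",
    image_paths=["scene_before.png"],
    state_changes=["cup: on table -> on shelf"],
    action_plan=["pick up cup", "place cup on shelf"],
)
```

A check such as `is_complete(example)` could be run before evaluation to filter out malformed records, assuming the data were loaded into this shape.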
Taxonomy
Topics: AI-based Problem Solving and Planning
