Grounding Large Language Models In Embodied Environment With Imperfect World Models
Haolan Liu, Jishen Zhao

TL;DR
This paper introduces GLIMO, a method that grounds large language models in physical environments using imperfect world models and an automated data generation process, significantly improving their physical reasoning and robotics capabilities.
Contribution
The paper presents a novel framework that combines proxy world models with an LLM agent-based data generator to enhance physical reasoning in LLMs, outperforming existing models on multiple benchmarks.
Findings
Performance improved by up to 2.04 times on benchmark tasks.
LLaMA-3 models surpass larger models like GPT-4 in specific tasks.
Automated data generation enhances physical reasoning in LLMs.
Abstract
Despite widespread success in various applications, large language models (LLMs) often stumble when tackling basic physical reasoning or executing robotics tasks, due to a lack of direct experience with the physical nuances of the real world. To address these issues, we propose a Grounding Large language model with Imperfect world MOdel (GLIMO), which utilizes proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM agent-based data generator to automatically create high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves the performance of strong…
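The three generator components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the free-fall simulator, the function names (`simulate_drop`, `refine`, `make_instructions`, `retrieve`), the single instruction seed, and the word-overlap retrieval are all assumptions chosen only to show how a proxy world model, a consistency-checking refinement loop, instruction seeds, and retrieval over prior experiences might fit together.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    steps: list  # (time_s, height_m) pairs produced by the proxy simulator


def simulate_drop(h0, dt=0.1, g=9.8, n=5):
    """Hypothetical proxy world model: free fall from height h0, ignoring drag."""
    return Experience(
        [(round(i * dt, 2), max(0.0, h0 - 0.5 * g * (i * dt) ** 2)) for i in range(n)]
    )


def is_temporally_consistent(exp):
    """An experience is consistent here if its timestamps strictly increase."""
    times = [t for t, _ in exp.steps]
    return all(a < b for a, b in zip(times, times[1:]))


def refine(exp):
    """One refinement pass: sort by time and drop repeated timestamps."""
    seen, fixed = set(), []
    for t, h in sorted(exp.steps):
        if t not in seen:
            seen.add(t)
            fixed.append((t, h))
    return Experience(fixed)


def sample_experience(h0, max_rounds=3):
    """Sample from the simulator, refining until the trace is consistent."""
    exp = simulate_drop(h0)
    for _ in range(max_rounds):
        if is_temporally_consistent(exp):
            break
        exp = refine(exp)
    return exp


# Assumed question-answering seed; a real seed set would be far more diverse.
SEEDS = ["How high is the object at t={t}s when dropped from {h0}m?"]


def make_instructions(h0, exp):
    """Turn one experience into instruction/answer records, one per seed and step."""
    return [
        {"instruction": seed.format(t=t, h0=h0), "answer": f"{h:.2f} m"}
        for seed in SEEDS
        for t, h in exp.steps
    ]


def retrieve(memory, query, k=1):
    """Toy retrieval: rank stored records by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        memory, key=lambda r: -len(q & set(r["instruction"].lower().split()))
    )[:k]


if __name__ == "__main__":
    exp = sample_experience(10.0)
    memory = make_instructions(10.0, exp)
    print(retrieve(memory, "at t=0.2s"))
```

In a fuller pipeline the retrieved records would be passed back to an LLM as context when generating new instruction data; here they are simply printed.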
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Sparse Evolutionary Training · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding
