Grounding Large Language Models In Embodied Environment With Imperfect World Models

Haolan Liu; Jishen Zhao

arXiv:2410.02742·cs.CL·November 13, 2024

Open Access

TL;DR

This paper introduces GLIMO, a method that grounds large language models in physical environments using imperfect world models and an automated data generation process, significantly improving their physical reasoning and robotics capabilities.

Contribution

The paper presents a novel framework that combines proxy world models with an LLM agent-based data generator to enhance physical reasoning in LLMs, outperforming existing models on multiple benchmarks.

Findings

01. Performance improved by up to 2.04 times on benchmark tasks.

02. LLaMA-3 models surpass larger models like GPT-4 in specific tasks.

03. Automated data generation enhances physical reasoning in LLMs.

Abstract

Despite widespread success in various applications, large language models (LLMs) often stumble when tackling basic physical reasoning or executing robotics tasks, due to a lack of direct experience with the physical nuances of the real world. To address these issues, we propose Grounding Large language model with Imperfect world MOdel (GLIMO), which utilizes proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM agent-based data generator to automatically create high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves the performance of strong…
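
To make the data-generation idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a toy free-fall simulator as the "proxy world model", two hand-written question-answering seeds, and naive word-overlap retrieval in place of a retrieval-augmented generation module. The iterative self-refining module is omitted. All names (`toy_simulator`, `SEEDS`, `generate_dataset`, `retrieve`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experience:
    """One simulated rollout: drop height (m) and time to reach the ground (s)."""
    height: float
    fall_time: float


def toy_simulator(height: float, g: float = 9.81) -> Experience:
    # Hypothetical stand-in for a proxy world model: free fall, no air drag.
    return Experience(height=height, fall_time=(2 * height / g) ** 0.5)


# Hypothetical question-answering instruction seeds: (question template, answer function).
SEEDS = [
    ("How long does an object dropped from {h:.1f} m take to land?",
     lambda e: f"About {e.fall_time:.2f} s."),
    ("Does an object dropped from {h:.1f} m land in under one second?",
     lambda e: "Yes." if e.fall_time < 1.0 else "No."),
]


def generate_dataset(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Sample n simulated experiences and turn each into an instruction pair."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        exp = toy_simulator(rng.uniform(0.5, 20.0))
        question, answer_fn = rng.choice(SEEDS)
        data.append({
            "instruction": question.format(h=exp.height),
            "response": answer_fn(exp),
        })
    return data


def retrieve(query: str, memory: List[Dict[str, str]], k: int = 2) -> List[Dict[str, str]]:
    """Return the k stored pairs whose instructions share the most words with the query."""
    words = set(query.lower().split())

    def overlap(item: Dict[str, str]) -> int:
        return len(words & set(item["instruction"].lower().split()))

    return sorted(memory, key=overlap, reverse=True)[:k]
```

Calling `generate_dataset(100)` yields a list of instruction/response dictionaries that could feed a fine-tuning pipeline, and `retrieve("How long does a dropped object take to land?", dataset)` returns earlier pairs to include as context. The answers come from the simulator, not from the language model, which is the sense in which the data is grounded.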

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Sparse Evolutionary Training · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding