Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models

Bahram Mohammadi; Ehsan Abbasnejad; Yuankai Qi; Qi Wu; Anton Van Den Hengel; and Javen Qinfeng Shi

arXiv:2505.07500·cs.CV·May 13, 2025

TL;DR

This paper introduces PEAP-LLM, a parameter-efficient, two-module approach using large language models for goal-oriented navigation in complex indoor environments, significantly improving performance on the REVERIE task.

Contribution

The paper presents a novel two-stage fine-tuning method for LLMs and a modular action-planning framework for embodied navigation without pre-exploration.

Findings

01 PEAP-LLM outperforms previous state-of-the-art on REVERIE.

02 The two-stage fine-tuning improves instruction quality and environmental adaptability.

03 Modular LLM-based planning enhances navigation success in complex scenarios.

Abstract

The remote embodied referring expression (REVERIE) task requires an agent to navigate through complex indoor environments and localize a remote object specified by high-level instructions, such as "bring me a spoon", without pre-exploration. Hence, an efficient navigation plan is essential for the final success. This paper proposes a novel parameter-efficient action planner using large language models (PEAP-LLM) to generate a single-step instruction at each location. The proposed model consists of two modules, LLM goal planner (LGP) and LoRA action planner (LAP). Initially, LGP extracts the goal-oriented plan from REVERIE instructions, including the target object and room. Then, LAP generates a single-step instruction with the goal-oriented plan, high-level instruction, and current visual observation as input. PEAP-LLM enables the embodied agent to interact with LAP as the path planner…
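
The two-module data flow described in the abstract can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the function names, the `GoalPlan` type, the object-to-room table, and the rule-based logic are all hypothetical stand-ins for the LLM calls, and the visual observation is simplified to a list of text labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GoalPlan:
    """Goal-oriented plan: the target object and the room it is expected in."""
    target_object: str
    target_room: str


def llm_goal_planner(instruction: str) -> GoalPlan:
    """Hypothetical stand-in for LGP.

    A real system would query an LLM; a tiny keyword table keeps
    this sketch runnable without any model.
    """
    # Assumed object -> room associations, for illustration only.
    room_for = {"spoon": "kitchen", "pillow": "bedroom", "towel": "bathroom"}
    for word in instruction.lower().strip(".!?").split():
        if word in room_for:
            return GoalPlan(target_object=word, target_room=room_for[word])
    return GoalPlan(target_object="unknown", target_room="unknown")


def lora_action_planner(plan: GoalPlan, instruction: str,
                        observation: List[str]) -> str:
    """Hypothetical stand-in for LAP.

    Takes the same three inputs the abstract lists (goal-oriented plan,
    high-level instruction, current observation) and returns one
    single-step instruction. The high-level instruction is accepted but
    unused by these toy rules; a fine-tuned LLM would condition on it.
    """
    if plan.target_object in observation:
        return f"stop next to the {plan.target_object}"
    if plan.target_room in observation:
        return f"enter the {plan.target_room} and look for the {plan.target_object}"
    return f"keep exploring to find the {plan.target_room}"


def run_episode(instruction: str, observations: List[List[str]]) -> List[str]:
    """Extract the plan once, then ask for one step at each location."""
    plan = llm_goal_planner(instruction)
    return [lora_action_planner(plan, instruction, obs) for obs in observations]


if __name__ == "__main__":
    steps = run_episode(
        "bring me a spoon",
        [["hallway", "stairs"], ["hallway", "kitchen"], ["counter", "spoon"]],
    )
    for step in steps:
        print(step)
```

The point of the sketch is the separation of concerns: the goal plan is extracted once per instruction, while the single-step instruction is regenerated at every location from the current observation.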

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Social Robot Interaction and HRI

Methods: Shrink and Fine-Tune · Direct Preference Optimization