Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Zican Hu; Wei Liu; Xiaoye Qu; Xiangyu Yue; Chunlin Chen; Zhi Wang; Yu Cheng

arXiv:2505.19761 · cs.AI · May 27, 2025

TL;DR

GLIDER introduces a hierarchical framework that improves long-horizon decision-making in large language models by decomposing tasks into sub-tasks, enhancing exploration, learning, and adaptability in complex environments.

Contribution

The paper presents an offline hierarchical reinforcement learning framework for LLMs that enables efficient decision-making through task decomposition and transferable low-level skills.

Findings

01. Achieves consistent performance improvements on ScienceWorld and ALFWorld benchmarks.

02. Enhances generalization and adaptation to non-stationary environments.

03. Provides a parameter-efficient hierarchy for LLM policies.

Abstract

While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework **GLIDER** (**G**rounding **L**anguage Models as Eff**I**cient **D**ecision-Making Agents via Offline Hi**E**rarchical **R**einforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction to significantly enhance exploration and…
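
To make the two-level structure described in the abstract concrete, here is a minimal sketch of a hierarchical control loop in which a high-level policy emits sub-task instructions and a low-level controller turns each one into primitive actions. It is not the GLIDER implementation: every name (`ToyEnv`, `high_level_policy`, `low_level_policy`, `run_episode`) is hypothetical, both policies are hard-coded stand-ins for LLM calls, and the offline reinforcement learning used to train them is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative only: all names are hypothetical and do not come from the
# paper or its repository. The "policies" are rule-based stand-ins for LLMs.


@dataclass
class ToyEnv:
    """Toy text environment, solved when the required actions occur in order."""
    required: List[str]
    taken: List[str] = field(default_factory=list)

    def step(self, action: str) -> str:
        # Record the action and return a textual observation.
        self.taken.append(action)
        return f"did: {action}"

    def solved(self) -> bool:
        return self.taken == self.required


def high_level_policy(task: str) -> List[str]:
    """Stand-in planner: split the task description into sub-task instructions."""
    return [part.strip() for part in task.split(" then ")]


def low_level_policy(subtask: str, observation: str) -> List[str]:
    """Stand-in controller: map one sub-task to a short run of primitive actions."""
    verb, _, obj = subtask.partition(" ")
    return [f"go to {obj}", f"{verb} {obj}"]


def run_episode(task: str, env: ToyEnv) -> List[Tuple[str, str]]:
    """Run the hierarchy once and return (sub-task, action) pairs."""
    observation = "start"
    trace = []
    for subtask in high_level_policy(task):
        # The low level only sees the current sub-task, not the full task.
        for action in low_level_policy(subtask, observation):
            observation = env.step(action)
            trace.append((subtask, action))
    return trace


if __name__ == "__main__":
    env = ToyEnv(required=["go to mug", "take mug", "go to mug", "heat mug"])
    for subtask, action in run_episode("take mug then heat mug", env):
        print(f"{subtask!r} -> {action!r}")
    print("solved:", env.solved())
```

In this sketch the low-level controller is conditioned only on the current sub-task, which is one simple way to picture the temporal abstraction the abstract refers to. Planning all sub-tasks up front is a simplification made here for brevity.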

Code & Models

Repositories

nju-rl/glider (PyTorch, official)
