TL;DR
GLIDER introduces a hierarchical framework that improves long-horizon decision-making in large language models by decomposing tasks into sub-tasks, enhancing exploration, learning, and adaptability in complex environments.
Contribution
The paper presents a novel offline hierarchical reinforcement learning framework for LLMs that enables efficient decision-making through task decomposition and transferable low-level skills.
Findings
Achieves consistent performance improvements on the ScienceWorld and ALFWorld benchmarks.
Enhances generalization and adaptation to non-stationary environments.
Provides a parameter-efficient hierarchy for LLM policies.
Abstract
While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework **GLIDER** (**G**rounding **L**anguage Models as Eff**I**cient **D**ecision-Making Agents via Offline Hi**E**rarchical **R**einforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction to significantly enhance exploration and…
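The two-level control flow described in the abstract can be illustrated with a small sketch. This is not GLIDER's implementation: it involves no LLM, no training, and no offline reinforcement learning. The names `toy_high_level`, `toy_low_level`, `run_hierarchy`, and `Trace` are hypothetical, and both policies are plain Python functions standing in for the high-level planner and the low-level controller. It shows only how a task is split into sub-tasks, each of which is then carried out as primitive actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins: in the paper both levels are LLM policies.
# Here they are plain functions so the control flow can be run directly.
HighLevelPolicy = Callable[[str], List[str]]  # task -> ordered sub-tasks
LowLevelPolicy = Callable[[str], List[str]]   # sub-task -> primitive actions


@dataclass
class Trace:
    """Records the sub-tasks chosen and the primitive actions executed."""
    subtasks: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def toy_high_level(task: str) -> List[str]:
    # Toy decomposition (an assumption for illustration):
    # split the task description on the word "then".
    return [part.strip() for part in task.split(" then ")]


def toy_low_level(subtask: str) -> List[str]:
    # Toy controller (an assumption for illustration):
    # walk to the object, then apply the verb to it.
    verb, _, obj = subtask.partition(" ")
    return [f"go to {obj}", f"{verb} {obj}"]


def run_hierarchy(task: str, high: HighLevelPolicy, low: LowLevelPolicy) -> Trace:
    """The high level proposes sub-tasks; the low level executes each one."""
    trace = Trace()
    for subtask in high(task):
        trace.subtasks.append(subtask)
        trace.actions.extend(low(subtask))
    return trace


trace = run_hierarchy("open fridge then take apple", toy_high_level, toy_low_level)
print(trace.subtasks)
print(trace.actions)
```

Because the low-level function depends only on the sub-task string, the same controller can be reused under a different high-level planner, which loosely mirrors the skill-transfer idea mentioned in the summary.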
