# LLMs augmented hierarchical reinforcement learning with action primitives for long-horizon manipulation tasks

**Authors:** Ning Zhang, Yongjia Zhao, Minghao Yang, Shuling Dai

PMC · DOI: 10.1038/s41598-025-20653-y · Scientific Reports · 2025-10-21

## TL;DR

This paper combines large language models (LLMs) with hierarchical reinforcement learning to improve sample efficiency and performance on long-horizon manipulation tasks.

## Contribution

A hierarchical agent, LARAP, in which LLMs guide a high-level policy over parameterized action primitives, making learning on long-horizon manipulation tasks more sample-efficient.

## Key findings

- LARAP significantly outperforms baseline methods in simulated manipulation tasks.
- The approach improves sample efficiency by using LLMs to guide high-level policy learning.
- Combining hierarchical decomposition with LLM guidance targets two known weaknesses of hierarchical RL: inefficient training and poor transferability.

## Abstract

Deep reinforcement learning methods have shown promising results in learning specific tasks, but struggle to cope with the challenges of long-horizon manipulation tasks. As task complexity increases, the large state space and sparse reward make it difficult to collect effective samples through random exploration. Hierarchical reinforcement learning decomposes complex tasks into subtasks, which can reduce the difficulty of skill learning, but still suffers from limitations such as inefficient training and poor transferability. Recently, large language models (LLMs) have demonstrated the ability to encode vast amounts of knowledge about the world and to excel in context-based learning and reasoning tasks. However, applying LLMs to real-world tasks remains challenging due to their lack of grounding in specific task contexts. In this paper, we leverage the planning capabilities of LLMs alongside reinforcement learning (RL) to facilitate learning from the environment. The proposed approach yields a hierarchical agent that combines LLMs with parameterized action primitives (LARAP) to address long-horizon manipulation tasks. Rather than relying solely on LLMs, the agent uses them to guide a high-level policy, improving sample efficiency during training. Experimental results show that LARAP significantly outperforms baseline methods across various simulated manipulation tasks. The source code is available at: https://github.com/ningzhang-buaa/LARAP-code.
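
To make the idea of an LLM guiding a high-level policy more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a discrete set of action primitives and biases a learned policy's logits with the log of a prior over primitives. The primitive names, `llm_prior` (a keyword-matching stand-in for a real LLM call), `guided_policy`, and the `weight` parameter are all hypothetical and do not come from the paper or its repository.

```python
import math
import random

# Hypothetical primitive names; the paper's primitive library may differ.
PRIMITIVES = ["reach", "grasp", "push", "release"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def llm_prior(task_description, primitives):
    """Stand-in for an LLM suggestion: a probability for each primitive.

    A real system would query a language model; here a simple keyword
    match plays that role so the example stays self-contained.
    """
    scores = [2.0 if name in task_description else 0.0 for name in primitives]
    return softmax(scores)


def guided_policy(policy_logits, prior, weight):
    """Bias learned logits with the log-prior (assumed combination rule).

    weight = 0 ignores the prior; larger values trust it more.
    """
    combined = [l + weight * math.log(p) for l, p in zip(policy_logits, prior)]
    return softmax(combined)


def select_primitive(probs, rng):
    """Sample a primitive index from the guided distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    prior = llm_prior("grasp the cube", PRIMITIVES)
    # An untrained policy has flat logits, so the prior dominates early on.
    probs = guided_policy([0.0, 0.0, 0.0, 0.0], prior, weight=1.0)
    print(PRIMITIVES[select_primitive(probs, rng)])
```

In a sketch like this, the prior steers early exploration toward plausible primitives while the learned logits can still override it as training progresses. The continuous parameters of each primitive (for example a target pose) would be produced by a separate head, which is omitted here.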

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12540720/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12540720/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12540720/full.md

---
Source: https://tomesphere.com/paper/PMC12540720