# Enhanced exploration in reinforcement learning using graph neural network based intrinsic reward mechanism

**Authors:** J. Arun Pandian, Ramkumar Thirunavukarasu, Rajganesh Nagarajan

PMC · DOI: 10.1038/s41598-025-23769-3 · Scientific Reports · 2025-11-14

## TL;DR

This paper introduces a reinforcement learning framework that uses graph neural networks to improve exploration efficiency in environments with discrete action spaces.

## Contribution

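A novel intrinsic reward mechanism that uses graph neural networks to model relationships between states. It rewards the agent for visiting novel or under-visited states, which it identifies through centrality and inverse-degree measures on the state graph.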
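This summary does not describe the GNN architecture itself. As a rough illustration of how a GNN can "model state relationships", here is a minimal sketch of one generic mean-aggregation message-passing layer (GraphSAGE-style) over a state graph, written in plain Python. The layer form, the weight matrices `W_self` and `W_neigh`, and the function names are assumptions for illustration, not the authors' method.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_aggregate_layer(adj, h, W_self, W_neigh):
    """One hypothetical message-passing step over the state graph.

    h'_v = ReLU(W_self @ h_v + W_neigh @ mean_{u in N(v)} h_u)
    adj maps each state to the set of its neighbours; h maps each state
    to its feature list (e.g. visit statistics; an assumption here).
    """
    out = {}
    for v, nbrs in adj.items():
        dim = len(h[v])
        if nbrs:
            # Average the neighbours' features component by component.
            msg = [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            msg = [0.0] * dim  # isolated state: no incoming messages
        self_part = matvec(W_self, h[v])
        neigh_part = matvec(W_neigh, msg)
        out[v] = relu([a + b for a, b in zip(self_part, neigh_part)])
    return out
```

With identity weights on the path graph `{0: {1}, 1: {0, 2}, 2: {1}}` and features `{0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}`, the middle state's new embedding is `[1.0, 1.5]`: its own features plus the mean of its two neighbours. Stacking such layers lets each state's embedding reflect progressively larger neighbourhoods of the transition graph.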

## Key findings

- GNN-IRL outperforms state-of-the-art extrinsic and intrinsic exploration strategies in convergence rate and cumulative reward.
- The method improves exploration efficiency and state coverage in environments with discrete action spaces.
- Results show improved sample efficiency and faster policy learning on four benchmarks: CartPole-v1, MountainCar-v0, Taxi-v3, and LunarLander-v3.
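The summary reports state coverage as a metric but does not define it. One common way to measure it on a discretized state space is the cumulative fraction of grid cells visited. The sketch below assumes that definition, and the function name `coverage_curve` is hypothetical; the paper's exact metric may differ.

```python
from math import prod

def coverage_curve(episodes, bins):
    """Cumulative fraction of discretized grid cells visited after each episode.

    Assumes each state is a hashable tuple of bin indices and that coverage
    is measured against the full grid (an assumption, not the paper's definition).
    """
    total_cells = prod(bins)          # size of the discretized state grid
    seen = set()
    curve = []
    for episode_states in episodes:
        seen.update(episode_states)   # record every state visited in this episode
        curve.append(len(seen) / total_cells)
    return curve
```

For example, `coverage_curve([[(0, 0), (0, 1)], [(0, 1), (1, 1)], [(0, 0)]], bins=(2, 2))` returns `[0.5, 0.75, 0.75]`: the third episode revisits a known state and adds no coverage.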

## Abstract

We propose a Graph Neural Network-based Intrinsic Reward Learning (GNN-IRL) framework to address the exploration–exploitation trade-off in Reinforcement Learning (RL). This approach leverages the structural modeling capabilities of Graph Neural Networks (GNNs) to represent the transitions and relationships between states in the environment. Intrinsic rewards are computed based on centrality measures and inverse degree analysis within the state graph, enabling the agent to identify and explore novel or under-visited states. The effectiveness of GNN-IRL was validated in four benchmark environments with discrete action spaces: CartPole-v1, MountainCar-v0, Taxi-v3, and LunarLander-v3. Continuous state variables were discretized to construct state graphs, which facilitates the implementation of GNN-IRL but may limit scalability to very high-dimensional continuous spaces. The experimental results show that GNN-IRL outperforms state-of-the-art extrinsic and intrinsic exploration strategies in terms of convergence rate, cumulative reward, exploration efficiency, and state coverage. These findings demonstrate that GNN-IRL effectively balances exploration and exploitation, thereby improving sample efficiency and accelerating policy learning in discretized RL domains.
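The abstract describes three steps: discretize continuous observations, connect observed transitions into a state graph, and score states using centrality and inverse degree. The sketch below wires those steps together in plain Python. It is not the authors' implementation. The uniform binning, the edge rule, the mixing weights `alpha` and `beta`, the bounds, and all function names are assumptions for illustration. It also uses plain graph statistics where the paper uses a GNN.

```python
def discretize(obs, lows, highs, bins):
    """Map a continuous observation to a tuple of uniform bin indices (values are clipped)."""
    idx = []
    for x, lo, hi, b in zip(obs, lows, highs, bins):
        x = min(max(x, lo), hi)              # clip into [lo, hi]
        i = int((x - lo) / (hi - lo) * b)
        idx.append(min(i, b - 1))            # x == hi would otherwise fall into bin b
    return tuple(idx)

def build_state_graph(states):
    """Undirected adjacency sets; an edge joins consecutive distinct states of a trajectory."""
    adj = {}
    for s in states:
        adj.setdefault(s, set())
    for s, s_next in zip(states, states[1:]):
        if s != s_next:                      # ignore self-transitions
            adj[s].add(s_next)
            adj[s_next].add(s)
    return adj

def intrinsic_reward(adj, s, beta=0.1, alpha=0.5):
    """Hypothetical bonus mixing inverse degree and (1 - degree centrality)."""
    deg = len(adj.get(s, ()))               # unseen states have degree 0
    n = len(adj)
    centrality = deg / (n - 1) if n > 1 else 0.0
    inv_deg = 1.0 / (1.0 + deg)
    return beta * (alpha * inv_deg + (1.0 - alpha) * (1.0 - centrality))

# Hypothetical CartPole-style bounds: position and angle approximately follow
# Gymnasium's documented observation limits; the velocity bounds are
# arbitrary clipping choices because those components are unbounded.
LOWS = (-4.8, -3.0, -0.418, -3.5)
HIGHS = (4.8, 3.0, 0.418, 3.5)
BINS = (6, 6, 12, 6)
```

Poorly connected and unseen states receive the largest bonus, so an agent maximizing `r_ext + intrinsic_reward(...)` is pushed toward the graph's periphery. With these bins the grid already has 6 · 6 · 12 · 6 = 2,592 cells, which illustrates the scalability limit the abstract notes for higher-dimensional continuous spaces.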

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12618532/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12618532/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC12618532/full.md

---
Source: https://tomesphere.com/paper/PMC12618532