# Information-Theoretic Intrinsic Motivation for Reinforcement Learning in Combinatorial Routing

**Authors:** Ruozhang Xi, Yao Ni, Wangyu Wu

PMC · DOI: 10.3390/e28020140 · 2026-01-27

## TL;DR

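This paper introduces an intrinsic motivation method for reinforcement learning, grounded in the Information Bottleneck principle, that improves exploration in sparse-reward combinatorial routing problems (the Travelling Salesman Problem and the Split Delivery Vehicle Routing Problem).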

## Contribution

A novel information-theoretic framework for intrinsic motivation that uses the Information Bottleneck principle to learn compact latent representations of combinatorial state spaces and defines intrinsic rewards within that latent space.

## Key findings

- The method improves exploration efficiency on two sparse-reward routing problems: the Travelling Salesman Problem and the Split Delivery Vehicle Routing Problem.
- It achieves better training stability and solution quality than standard RL baselines.
- Neural mutual information estimators enable scalable implementation without explicit density modeling.
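
To make the last point concrete, here is a minimal sketch of one well-known sample-based bound, the Donsker-Varadhan lower bound on mutual information, which needs only samples and a critic function rather than a density model. This summary does not say which estimator the paper uses, so the choice of bound is an assumption. The names `dv_lower_bound`, `gaussian_log_ratio`, and `sample_correlated` are hypothetical, and the closed-form critic stands in for a trained neural network.

```python
import math
import random


def dv_lower_bound(xs, zs, critic, rng):
    """Donsker-Varadhan estimate of a lower bound on I(X; Z) from paired samples.

    I(X; Z) >= E_joint[T(x, z)] - log E_marginals[exp(T(x, z))]
    """
    n = len(xs)
    # Joint term: average the critic over the observed (x, z) pairs.
    joint_term = sum(critic(x, z) for x, z in zip(xs, zs)) / n
    # Marginal term: shuffle z to break the pairing, approximating p(x)p(z).
    shuffled = list(zs)
    rng.shuffle(shuffled)
    marginal_mean = sum(math.exp(critic(x, z)) for x, z in zip(xs, shuffled)) / n
    return joint_term - math.log(marginal_mean)


def gaussian_log_ratio(rho):
    """Closed-form optimal critic log p(x, z) / (p(x) p(z)) for unit-variance
    Gaussians with correlation rho (assumption: stands in for a neural critic)."""
    def critic(x, z):
        quad = rho * rho * (x * x + z * z) - 2.0 * rho * x * z
        return -0.5 * math.log(1.0 - rho * rho) - quad / (2.0 * (1.0 - rho * rho))
    return critic


def sample_correlated(n, rho, rng):
    """Draw n pairs (x, z) of standard Gaussians with correlation rho."""
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(z)
    return xs, zs


rng = random.Random(0)
rho = 0.8
xs, zs = sample_correlated(20000, rho, rng)
estimate = dv_lower_bound(xs, zs, gaussian_log_ratio(rho), rng)
true_mi = -0.5 * math.log(1.0 - rho * rho)  # analytic MI for this Gaussian pair
print(f"DV estimate: {estimate:.3f}, analytic MI: {true_mi:.3f}")
```

In a learned setting the critic would be a neural network trained to maximize this quantity, with `xs` and `zs` replaced by whatever pair of variables the intrinsic reward is defined over. The toy Gaussian pair is used here only because its mutual information is known analytically.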

## Abstract

Intrinsic motivation provides a principled mechanism for driving exploration in reinforcement learning when external rewards are sparse or delayed. A central challenge, however, lies in defining meaningful novelty signals in high-dimensional and combinatorial state spaces, where observation-level density estimation and prediction-error heuristics often become unreliable. In this work, we propose an information-theoretic framework for intrinsically motivated reinforcement learning grounded in the Information Bottleneck principle. Our approach learns compact latent state representations by explicitly balancing the compression of observations and the preservation of predictive information about future state transitions. Within this bottlenecked latent space, intrinsic rewards are defined through information-theoretic quantities that characterize the novelty of state–action transitions in terms of mutual information, rather than raw observation dissimilarity. To enable scalable estimation in continuous and high-dimensional settings, we employ neural mutual information estimators that avoid explicit density modeling and contrastive objectives based on the construction of positive–negative pairs. We evaluate the proposed method on two representative combinatorial routing problems, the Travelling Salesman Problem and the Split Delivery Vehicle Routing Problem, formulated as Markov decision processes with sparse terminal rewards. These problems serve as controlled testbeds for studying exploration and representation learning under long-horizon decision making. Experimental results demonstrate that the proposed information bottleneck-driven intrinsic motivation improves exploration efficiency, training stability, and solution quality compared to standard reinforcement learning baselines.
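
The balance between compression and prediction described above is conventionally written as the Information Bottleneck Lagrangian. The form below is the standard one, not necessarily the exact objective used in the paper. The notation is an assumption: $s_t$ is the observation, $z_t$ its latent encoding, $s_{t+1}$ the future state to be predicted, and $\beta > 0$ a trade-off weight.

$$
\min_{p(z_t \mid s_t)} \; I(s_t; z_t) \;-\; \beta \, I(z_t; s_{t+1})
$$

The first term penalizes information the latent retains about the raw observation (compression), while the second rewards information the latent carries about the next state (prediction). A larger $\beta$ favors predictive power over compactness.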

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939797/full.md

---
Source: https://tomesphere.com/paper/PMC12939797