Information-Theoretic Intrinsic Motivation for Reinforcement Learning in Combinatorial Routing

Ruozhang Xi; Yao Ni; Wangyu Wu

PMC · DOI: 10.3390/e28020140 · January 27, 2026

Open Access

TL;DR

This paper introduces an intrinsic motivation method for reinforcement learning, grounded in the Information Bottleneck principle, that improves exploration in combinatorial routing problems where external rewards are sparse or delayed.

Contribution

A novel information-theoretic framework for intrinsic motivation using the Information Bottleneck principle in combinatorial state spaces.

Findings

01 The method improves exploration efficiency in high-dimensional routing problems.

02 It achieves better training stability and solution quality compared to standard RL baselines.

03 Neural mutual information estimators enable scalable implementation without explicit density modeling.
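
To make the third finding concrete, here is a minimal sketch of one widely used mutual information lower bound, InfoNCE. This summary does not say which estimator the paper uses, so the choice of InfoNCE is an assumption made only for illustration. The function name `infonce_lower_bound` and the score matrix are hypothetical. In practice the scores would come from a trained neural critic f(x_i, y_j); here they are passed in directly.

```python
import math


def infonce_lower_bound(scores):
    """InfoNCE lower bound on I(X; Y) from a K x K critic score matrix.

    scores[i][j] is a critic value f(x_i, y_j). Diagonal entries score
    the jointly sampled pairs; off-diagonal entries act as negatives.
    (Hypothetical helper, not taken from the paper.)
    """
    k = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Numerically stable log-sum-exp over the row.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        # Log-probability that the critic picks the true pair.
        total += row[i] - log_norm
    # Average log-probability plus log K gives the bound.
    return total / k + math.log(k)


# An uninformative critic (all scores equal) and a sharply diagonal one.
uninformative = [[0.0] * 4 for _ in range(4)]
confident = [[50.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
low = infonce_lower_bound(uninformative)
high = infonce_lower_bound(confident)
```

The bound needs only critic scores on sampled pairs, so no explicit density model of the observations is required. It can never exceed log K, which means the batch size K caps how much mutual information this particular estimator can report.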

Abstract

Intrinsic motivation provides a principled mechanism for driving exploration in reinforcement learning when external rewards are sparse or delayed. A central challenge, however, lies in defining meaningful novelty signals in high-dimensional and combinatorial state spaces, where observation-level density estimation and prediction-error heuristics often become unreliable. In this work, we propose an information-theoretic framework for intrinsically motivated reinforcement learning grounded in the Information Bottleneck principle. Our approach learns compact latent state representations by explicitly balancing the compression of observations and the preservation of predictive information about future state transitions. Within this bottlenecked latent space, intrinsic rewards are defined through information-theoretic quantities that characterize the novelty of state–action transitions in…
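
For reference, the trade-off between compression and predictive information described in the abstract matches the general shape of the standard Information Bottleneck Lagrangian. The form below is the textbook objective, not necessarily the exact one used in the paper. The symbols S (observation), Z (latent representation), S' (future state) and the trade-off weight \beta are notation assumed here for illustration.

```latex
\min_{p_\theta(z \mid s)} \; I(S; Z) \;-\; \beta \, I(Z; S'), \qquad \beta > 0
```

The first term penalizes information the latent code keeps about the raw observation (compression). The second rewards information it keeps about the upcoming transition (prediction).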

Taxonomy

Topics: Reinforcement Learning in Robotics · Vehicle Routing Optimization Methods · Traffic control and management