Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration

Danish Rizvi; David Boyle

arXiv:2603.03595 · cs.LG · March 5, 2026

Open Access

TL;DR

This paper introduces a hybrid belief-reinforcement learning framework for multi-agent spatial exploration, combining model-based spatial belief estimation with deep RL for efficient, cooperative coverage and faster convergence.

Contribution

The paper proposes a hybrid approach that integrates spatial belief modeling with RL and uses transfer learning to enable efficient, cooperative exploration.

Findings

01. Achieves 10.8% higher cumulative reward than the baselines.

02. Converges 38% faster than existing methods.

03. Dual-channel transfer improves exploration efficiency.

Abstract

Coordinating multiple autonomous agents to explore and serve spatially heterogeneous demand requires jointly learning unknown spatial patterns and planning trajectories that maximize task performance. Pure model-based approaches provide structured uncertainty estimates but lack adaptive policy learning, while deep reinforcement learning often suffers from poor sample efficiency when spatial priors are absent. This paper presents a hybrid belief-reinforcement learning (HBRL) framework to address this gap. In the first phase, agents construct spatial beliefs using a Log-Gaussian Cox Process (LGCP) and execute information-driven trajectories guided by a Pathwise Mutual Information (PathMI) planner with multi-step lookahead. In the second phase, trajectory control is transferred to a Soft Actor-Critic (SAC) agent, warm-started through dual-channel knowledge transfer: belief state…
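
For readers unfamiliar with the belief model named in the abstract, the following is a minimal sketch of the generative side of a discretized Log-Gaussian Cox Process: a latent Gaussian field is exponentiated into a non-negative intensity, and event counts in each grid cell are drawn from a Poisson distribution with that intensity. It is not the authors' implementation and covers neither belief inference nor the PathMI planner or SAC stage. The function name `sample_lgcp_grid`, the squared-exponential kernel, the grid size, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def sample_lgcp_grid(n=10, length_scale=2.0, variance=1.0, mean=0.0, seed=0):
    """Draw one discretized LGCP realization on an n x n grid of unit-area cells.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Cell-centre coordinates, shape (n*n, 2).
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    pts = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    # Squared-exponential covariance between all pairs of cells (assumed kernel).
    sq_dist = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    cov = variance * np.exp(-0.5 * sq_dist / length_scale**2)
    cov += 1e-6 * np.eye(n * n)  # small jitter so the Cholesky factorization succeeds
    # Latent Gaussian field f = mean + L z, with L the Cholesky factor of cov.
    chol = np.linalg.cholesky(cov)
    f = mean + chol @ rng.standard_normal(n * n)
    # Log-Gaussian intensity: exponentiating keeps the rate strictly positive.
    intensity = np.exp(f)
    # Cox process step: counts per unit-area cell are Poisson given the intensity.
    counts = rng.poisson(intensity)
    return intensity.reshape(n, n), counts.reshape(n, n)

intensity, counts = sample_lgcp_grid()
```

In this sketch the `counts` array plays the role of observed demand events, and `intensity` is the hidden spatial pattern that a belief model would try to recover from them.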

Taxonomy

Topics: UAV Applications and Optimization · Age of Information Optimization · Reinforcement Learning in Robotics