# SpikeAEC: a neuromodulation-based spiking controller for explore-exploit balancing in mobile robots

**Authors:** Canyang Liu, Yichen Liu, Yongqi Zhou, Buqin Su

PMC · DOI: 10.3389/fnbot.2026.1757795 · 2026-03-05

## TL;DR

This paper introduces SpikeAEC, a fully spiking, neuromodulated Actor-Explorer-Critic controller that balances exploration and exploitation online in closed-loop mobile robot control.

## Contribution

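SpikeAEC combines three parallel spiking subnetworks (Actor, Explorer, Critic) with a TAN-based Arbitrator and a unified three-factor learning framework. In simulation, it outperforms leading brain-inspired methods on convergence speed, trajectory length, and cumulative reward.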

## Key findings

- In simulation, SpikeAEC converges 24% faster than leading brain-inspired methods.
- It reduces trajectory length by 18% and increases cumulative reward by over 5% against the top-performing baseline.
- The design remains consistent with established neurophysiological principles.

## Abstract

Balancing exploration and exploitation remains a fundamental challenge in reliable mobile robot control, as conventional policies often converge on suboptimal behaviors. Inspired by the brain's division of labor for adaptive control, we propose SpikeAEC, a fully spiking, neuromodulated Actor-Explorer-Critic architecture designed to address this dilemma online within a closed-loop system. SpikeAEC comprises three specialized subnetworks operating in parallel: the Actor, inspired by the basal ganglia, proposes exploitative actions; the Explorer, modeled after the ACC-GPe-STN pathway, generates adaptive exploratory actions gated by a vigilance signal modulated by the accumulated global temporal-difference (TD) error; and the Critic, based on the ventral striatum, computes the TD error. The final action is selected by a separate, TAN-based Arbitrator, which probabilistically chooses between the Actor's and Explorer's action proposals according to recent performance and the TD error. These subnetworks are coupled through a unified three-factor learning framework that uses the TD signal and phasic neuromodulators (acetylcholine and dopamine) from the Arbitrator to drive pathway-specific synaptic plasticity. This online plasticity enhances the quality of action proposals and accelerates policy refinement. In simulation, SpikeAEC outperforms leading brain-inspired methods by converging 24% faster, reducing trajectory length by 18%, and increasing cumulative reward by over 5% against the top-performing baseline, all while maintaining consistency with established neurophysiological principles.
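The control loop described in the abstract can be illustrated with a minimal, non-spiking sketch. This is not the authors' implementation: it replaces the spiking subnetworks with scalar rate-level quantities. The one-step TD error is the standard textbook form. The leaky accumulation of the TD error into a vigilance signal, the logistic mapping from vigilance to an exploration probability, and the multiplicative form of the three-factor update are all assumptions made for illustration. The sketch also gates arbitration on vigilance alone, whereas the abstract says the Arbitrator uses recent performance and the TD error. All function names and parameter values (`gamma`, `decay`, `gain`, `bias`, `lr`) are hypothetical.

```python
import math
import random


def td_error(reward, value, next_value, gamma=0.95):
    """Standard one-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def update_vigilance(vigilance, delta, decay=0.9):
    """Leaky accumulation of |TD error| (assumed form, not from the paper)."""
    return decay * vigilance + (1.0 - decay) * abs(delta)


def explore_probability(vigilance, gain=4.0, bias=0.5):
    """Logistic map from vigilance to P(pick Explorer) (assumed form)."""
    return 1.0 / (1.0 + math.exp(-gain * (vigilance - bias)))


def arbitrate(actor_action, explorer_action, p_explore, rng):
    """Probabilistically choose between the two action proposals."""
    if rng.random() < p_explore:
        return explorer_action, "explorer"
    return actor_action, "actor"


def three_factor_update(weight, pre_trace, post_trace, modulator, lr=0.01):
    """Generic three-factor rule: dw = lr * modulator * pre * post.

    `modulator` stands in for the TD / neuromodulatory signal; the
    multiplicative form is an assumption for illustration.
    """
    return weight + lr * modulator * pre_trace * post_trace


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the demo is repeatable
    vigilance, weight = 0.0, 0.2
    value, next_value = 0.5, 0.4   # hypothetical Critic estimates
    for step in range(5):
        reward = -1.0 if step < 3 else 1.0   # toy reward sequence
        delta = td_error(reward, value, next_value)
        vigilance = update_vigilance(vigilance, delta)
        p = explore_probability(vigilance)
        action, source = arbitrate("forward", "turn", p, rng)
        weight = three_factor_update(weight, 1.0, 0.5, delta)
        print(step, source, action, round(p, 3), round(weight, 4))
```

In this toy version, sustained large TD errors raise vigilance and shift the arbitration toward the Explorer's proposal, while the same TD signal scales the weight change. The paper's pathway-specific plasticity, driven by acetylcholine and dopamine from the Arbitrator, is richer than this single scalar modulator.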

## Linked entities

- **Chemicals:** acetylcholine (PubChem CID 187), dopamine (PubChem CID 681)

## Full-text entities

- **Chemicals:** acetylcholine (MESH:D000109), dopamine (MESH:D004298)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12999960/full.md

---
Source: https://tomesphere.com/paper/PMC12999960