# Estimating the motor exploration in reinforcement learning

**Authors:** Anja T. Zai, Corinna Lorenz, Shakana Srikantharajah, Nicolas Giret, Richard H.R. Hahnloser

PMC · DOI: 10.1016/j.isci.2025.114398 · 2025-12-09

## TL;DR

The paper introduces a method for estimating motor exploration in reinforcement learning, inspired by the brain's modular organization and tested in songbirds undergoing vocal pitch conditioning.

## Contribution

The paper introduces a latent RL agent that learns optimally from its own exploration. Exploration estimated from behavior matches the brain-generated motor variability that drives learning.

## Key findings

- Latent RL agrees with non-optimal learning trajectories in songbirds and humans.
- Exploration can be estimated from behavior on a single-trial basis.
- Estimated explorations match the brain-generated motor variability that drives learning.

## Abstract

What exploration strategies do animals use to learn motor skills? Reinforcement learning (RL) theory is a powerful framework to study motor learning, but provides no guidance for estimating motor exploration – the behavioral component aimed at discovering better strategies. We address this gap by taking inspiration from the brain’s modular organization and postulating a latent learner that explores by injecting an additive source of ideal randomness into behavior. Assuming the learner is ignorant of other motor components, evolutionary fitness argues that these should display mainly non-ideal variability.

We verify this behavioral decomposition in songbirds undergoing vocal pitch conditioning. The estimated vocal explorations account for the motor contribution of a cortico-basal ganglia pathway, while other components capture birds’ suboptimal learning trajectories. Latent RL therefore provides a normative improvement over classical RL, making exploration explicit and suggesting that evolutionary pressure favors the randomness of exploration over strict behavioral optimality.
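The decomposition described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' model or estimation procedure. It assumes a one-dimensional pitch, Gaussian exploration, a second Gaussian noise source standing in for the "other motor components", a hypothetical threshold reward rule, and a REINFORCE-style update in which the learner credits only its own exploration. All names and parameter values (`simulate_latent_learner`, `sigma_explore`, `sigma_other`, `threshold`, `lr`) are illustrative assumptions.

```python
import numpy as np


def simulate_latent_learner(n_trials=2000, lr=0.05, sigma_explore=1.0,
                            sigma_other=1.0, threshold=0.0, seed=0):
    """Toy latent learner; every parameter here is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    mu = 0.0          # the learner's current pitch target (arbitrary units)
    baseline = 0.5    # running estimate of the average reward
    mu_history = np.empty(n_trials)

    for t in range(n_trials):
        # Additive exploration injected by the latent learner.
        explore = rng.normal(0.0, sigma_explore)
        # Variability from other motor components, unknown to the learner.
        other = rng.normal(0.0, sigma_other)
        pitch = mu + explore + other

        # Assumed reward rule: reward whenever pitch exceeds the threshold.
        reward = 1.0 if pitch > threshold else 0.0

        # The learner reinforces only its own exploration term.
        mu += lr * (reward - baseline) * explore
        # Update the reward baseline after it has been used for this trial.
        baseline += 0.05 * (reward - baseline)
        mu_history[t] = mu

    return mu_history


history = simulate_latent_learner()
```

In this sketch the pitch target drifts toward the rewarded side of the threshold even though the learner never observes `other`. Raising `sigma_other` relative to `sigma_explore` weakens the correlation between reward and exploration, so the drift slows, which gives one intuition for how unmodeled motor variability could make overall behavior look suboptimal.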

- We introduce a latent RL agent that learns optimally from its exploration
- Latent RL agrees with non-optimal learning trajectories in songbirds and humans
- Explorations can be estimated from behavior on a single-trial basis
- Estimated explorations match brain-generated motor variability driving learning

Neuroscience; Sensory neuroscience; Cognitive neuroscience

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12829137/full.md

---
Source: https://tomesphere.com/paper/PMC12829137