# The sign of exploration during reward-based motor learning is not independent from trial to trial

**Authors:** Katinka van der Kooij, Jeroen B. J. Smeets, Nina M. van Mastrigt, Bernadette C. M. van Wijk

PMC · DOI: 10.1007/s00221-025-07074-z · Experimental Brain Research · 2025-04-15

## TL;DR

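In a force production task with binary reward feedback, consecutive trial-to-trial force changes shared the same sign more often than random exploration predicts, so human motor exploration is not purely random from trial to trial.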

## Contribution

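This study provides evidence that human motor exploration during reward-based learning is not purely random from trial to trial, challenging computational models that formalize exploration as a random draw from a normal distribution centred on zero.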

## Key findings

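- In both the Adaptive and the Fixed group, the proportion of same-sign trial-to-trial force changes was larger than expected from random exploration.
- In the Adaptive group, both learning and the proportion of same-sign changes were consistent with model simulations for low values of random exploration; in the Fixed group, both were inconsistent with simulations based on random exploration.
- The results suggest that some form of non-random exploration contributes to reward-based motor learning.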

## Abstract

Humans can learn various motor tasks based on binary reward feedback on whether a movement attempt was successful or not. Such ‘reward-based motor learning’ relies on exploiting successful motor commands and exploring different motor commands following failure. Most computational models of reward-based motor learning have formalized exploration as a random process, in which on each trial a random draw is taken from a normal distribution centred on zero. Whether human motor exploration is indeed random from trial to trial has not yet been tested. Here we tested in a force production task whether human motor exploration is random. To this end, we compared the proportion of trial-to-trial force changes in the behavioural data that have the same sign to the proportion expected under random exploration. One group of participants practiced with an adaptive reward criterion, which keeps rewarded performance close to current performance, and the other group practiced with a fixed reward criterion, in which current performance can be far from rewarded performance. In both groups, we found a proportion of same-sign changes larger than predicted. In the Adaptive group, both the learning and the proportion of same-sign changes were consistent with model simulations for low values of random exploration, whereas in the Fixed group both the learning and the proportion of same-sign changes were inconsistent with model simulations based on random exploration. This suggests that some form of non-random motor exploration contributes to reward-based motor learning.

The online version contains supplementary material available at 10.1007/s00221-025-07074-z.
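
The same-sign measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code or their learning model. It assumes that "same sign" means two consecutive trial-to-trial force changes point in the same direction, and it uses a deliberately simplified stand-in for random exploration: independent normal draws around a fixed aim point, with no reward-dependent updating. The function name `proportion_same_sign` and all parameter values are hypothetical.

```python
import numpy as np


def proportion_same_sign(forces):
    """Fraction of consecutive trial-to-trial changes that share a sign.

    Assumption: a "same-sign" pair is two successive changes that are both
    positive or both negative. Pairs containing a zero change are skipped.
    """
    changes = np.diff(np.asarray(forces, dtype=float))
    signs = np.sign(changes)
    prev, curr = signs[:-1], signs[1:]
    valid = (prev != 0) & (curr != 0)
    if not valid.any():
        return float("nan")
    return float(np.mean(prev[valid] == curr[valid]))


rng = np.random.default_rng(0)
n_trials = 100_000  # hypothetical; far more trials than a real session

# Simplified random exploration (assumption): every trial is an independent
# normal draw around a fixed aim point of 10 (arbitrary units).
iid_forces = 10.0 + rng.normal(0.0, 1.0, size=n_trials)

# Contrast case (assumption): a random walk, where each change is itself
# an independent zero-centred draw added to the previous force.
walk_forces = np.cumsum(rng.normal(0.0, 1.0, size=n_trials))

p_iid = proportion_same_sign(iid_forces)
p_walk = proportion_same_sign(walk_forces)

print(f"independent draws around a fixed aim: {p_iid:.3f}")
print(f"random walk:                          {p_walk:.3f}")
```

Under these simplifying assumptions, independent draws around a fixed aim give a proportion close to 1/3, because three successive independent draws are monotonically ordered in only 2 of their 6 possible orderings, while a random walk gives a proportion close to 1/2. The paper's model simulations include reward-dependent learning, so their predicted proportions need not equal either value; the sketch only shows how the measure behaves for two simple reference processes.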

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12000264/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12000264/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12000264/full.md

---
Source: https://tomesphere.com/paper/PMC12000264