Reply to Cecchi and Palminteri: On the need to model temporal variation in learning rates
Prakhar Godara

Abstract
Funding: Deutsche Forschungsgemeinschaft (DFG), funder ID 501100001659.
We appreciate Cecchi and Palminteri’s [CP, (1)] interest in our work (2). We begin by clarifying that our paper (2) does not claim that human behavior is best described by models with decreasing learning rates. Indeed, we explicitly document ways in which human behavior deviates from Bayesian predictions (SI Appendix, section 4). Our central point is methodological: temporal variation in learning rates can mimic confirmation bias. Therefore, introducing valence-based asymmetries without first accounting for temporally varying learning rates risks overestimating the magnitude of the bias.
CP challenge our comparison of Bayesian and biased reinforcement-learning (RL) models (Fig. 4B), but two of their methodological choices explain the discrepancy:
- CP fit a model designed for stationary environments (with a temporally decreasing learning rate) to both stationary and nonstationary environments.
- CP fit one set of parameters to all episodes of a subject.
Regarding the first choice, applying a stationary Bayesian update rule to reversal blocks is inappropriate: the model is misspecified by design, so poor fits are expected. Our analysis intentionally restricts itself to stationary environments to isolate the confound between temporal learning-rate variation and confirmation bias. If nonstationary environments are included, Bayesian models that assume volatility (and therefore also imply time-varying learning rates) provide better fits than asymmetric RL models (3).
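To make concrete why a stationary Bayesian learner implies a decreasing learning rate, the sketch below assumes a Beta-Bernoulli learner tracking the reward probability of a single arm. This is an illustrative assumption and not necessarily the model used in (2); the function name and the prior parameters `a` and `b` are hypothetical. Under this assumption, the posterior mean follows a delta rule whose learning rate after n outcomes is 1/(a + b + n + 1).

```python
import numpy as np

def beta_bernoulli_learning_rates(outcomes, a=1.0, b=1.0):
    """Posterior mean of a Bernoulli reward probability under a Beta(a, b)
    prior, written as a delta rule. Returns the sequence of posterior means
    (including the prior mean) and the implied learning rates."""
    mean = a / (a + b)                      # prior mean
    means, rates = [mean], []
    for n, r in enumerate(outcomes):
        # Learning rate applied to the (n + 1)-th outcome; shrinks as 1/n.
        alpha = 1.0 / (a + b + n + 1)
        mean = mean + alpha * (r - mean)    # delta-rule form of the Bayesian update
        means.append(mean)
        rates.append(alpha)
    return np.array(means), np.array(rates)

# Hypothetical outcome sequence for one stationary arm.
means, rates = beta_bernoulli_learning_rates([1, 0, 1, 1, 0])
```

With a uniform prior, the final mean equals (1 + successes) / (2 + trials), which matches the usual Beta posterior mean. Because the rate shrinks with the number of observations, such a rule cannot track a reversal quickly, which is the sense in which it is misspecified for reversal blocks.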
Regarding the second choice, there is no consensus on whether reinforcement-learning parameters should be fit jointly across all episodes of a subject or separately for each episode. Several studies show that RL parameter estimates often exhibit low test–retest reliability (4, 5), and human behavior can vary substantially from block to block due to factors such as attention or fatigue. In our data, parameter reliability across symmetric, asymmetric, and reversal environments is low (Table 1), and cross-validation reveals poor generalization of parameters across environment types (Fig. 1). Given this variability, we chose to fit parameters separately for each episode, while recognizing that alternative modeling choices are possible.
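As an illustration of how parameter reliability across environment types can be quantified, the sketch below computes an intraclass correlation coefficient from a subjects-by-environments matrix of fitted parameter values. The ICC variant behind Table 1 is not stated here, so the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), is an assumption, and the function name `icc_2_1` and the example matrix are hypothetical.

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    `x` has one row per subject and one column per environment type."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)              # per-subject means
    col_means = x.mean(axis=0)              # per-environment means
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical learning-rate estimates: 3 subjects, 2 environment types.
example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
reliability = icc_2_1(example)
```

In this toy matrix the subjects keep their rank order but every estimate shifts by a constant between environments, so absolute agreement is penalized and the coefficient falls below 1. A consistency-type ICC would treat the same data as perfectly reliable, so the choice of variant matters when interpreting low values.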
Finally, CP introduce hybrid models with geometrically decaying, valence-dependent learning rates as evidence that temporal dynamics alone cannot account for behavior. However, these models impose a particular parametric form on learning-rate dynamics. Humans are known to infer environmental volatility, which implies flexible, often nonmonotonic changes in learning rates near change points. Given the wide space of possible temporal dynamics, rejecting one specific form does not negate the broader point: temporal variation must be modeled before attributing behavior to valence-based bias.
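The nonmonotonic learning-rate dynamics mentioned above can be illustrated with a minimal change-point learner. The sketch assumes a single Bernoulli arm whose reward probability is redrawn uniformly with a fixed hazard rate on each trial, approximated on a grid. The function name, `hazard`, and `n_grid` are hypothetical choices for illustration, and this is not the model of (2) or (3). The effective learning rate is defined here as the shift in the predicted mean divided by the prediction error.

```python
import numpy as np

def changepoint_learning_rates(outcomes, hazard=0.01, n_grid=400):
    """Grid-based Bayesian filter for a Bernoulli probability that may be
    redrawn uniformly with probability `hazard` before each trial.
    Returns the effective delta-rule learning rate on each trial."""
    p = (np.arange(n_grid) + 0.5) / n_grid   # grid midpoints on (0, 1)
    uniform = np.full(n_grid, 1.0 / n_grid)
    belief = uniform.copy()
    rates = []
    for r in outcomes:
        # Allow for a possible change point before observing the outcome.
        belief = (1.0 - hazard) * belief + hazard * uniform
        old_mean = belief @ p
        # Bayes update with the Bernoulli likelihood of outcome r.
        likelihood = p if r == 1 else 1.0 - p
        belief = belief * likelihood
        belief /= belief.sum()
        new_mean = belief @ p
        # Effective learning rate: change in mean per unit prediction error.
        rates.append((new_mean - old_mean) / (r - old_mean))
    return np.array(rates)

# Hypothetical block: 30 rewarded trials followed by a reversal.
rates = changepoint_learning_rates([1] * 30 + [0] * 30)
```

Under these assumptions the effective rate declines while outcomes are consistent and rises again on the trial after the first surprising outcome, so it is neither constant nor geometrically decaying. A hybrid model with a fixed decay schedule cannot reproduce this rebound, which is why rejecting one parametric form leaves other temporal dynamics untested.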
All analyses reported here, including parameter reliability and cross-validation tests, are available at: https://github.com/prakhargodara/Bandit-parameters-ICC-and-cross-validation-scores.
References
- 1. R. Cecchi, S. Palminteri, Genuine learning biases persist after accounting for temporally decreasing learning rates: Insight from fitting six datasets. OSF [Preprint] (2025). doi:10.31234/osf.io/xyvk9_v1 (Accessed 10 December 2025). PMID 41632846.
- 2. P. Godara, Apparent learning biases emerge from optimal inference: Insights from master equation analysis. Proc. Natl. Acad. Sci. U.S.A. 122, e2502761122 (2025). doi:10.1073/pnas.2502761122. PMID 41066112. PMCID PMC12541305.
- 3. C. Y. Zhou, D. Guo, A. J. Yu, “Devaluation of unchosen options: A Bayesian account of the provenance and maintenance of overly optimistic expectations” in CogSci 2020, Annual Conference of the Cognitive Science Society (Cognitive Science Society, US, 2020), vol. 42, p. 1682. PMID 34355220. PMCID PMC8336429.
- 4. M. Waltmann, F. Schlagenhauf, L. Deserno, Sufficient reliability of the behavioral and computational readouts of a probabilistic reversal learning task. Behav. Res. Methods 54, 2993–3014 (2022). doi:10.3758/s13428-021-01739-7. PMID 35167111. PMCID PMC9729159.
- 5. J. V. Schaaf, L. Weidinger, L. Molleman, W. van den Bos, Test-retest reliability of reinforcement learning parameters. Behav. Res. Methods 56, 4582–4599 (2024). doi:10.3758/s13428-023-02203-4. PMID 37684495. PMCID PMC11289054.
