Notes on the Reward Representation of Posterior Updates

Pedro A. Ortega

arXiv:2602.02912·cs.LG·February 4, 2026

Open Access

TL;DR

This paper asks when decision-making updates in control and reinforcement learning can be read as literal Bayesian posteriors, and shows that such updates determine relative incentives while leaving absolute rewards ambiguous.

Contribution

It provides a theoretical analysis of when KL-regularized soft updates can be exactly Bayesian posteriors within a fixed model, clarifying their informational and behavioral implications.

Findings

01. Posterior updates determine relative, context-dependent incentives.

02. Absolute rewards remain ambiguous up to baseline adjustments.

03. Reusing continuation values links reward descriptions across different update sequences.

Abstract

Many ideas in modern control and reinforcement learning treat decision-making as inference: start from a baseline distribution and update it when a signal arrives. We ask when this can be made literal rather than metaphorical. We study the special case where a KL-regularized soft update is exactly a Bayesian posterior inside a single fixed probabilistic model, so the update variable is a genuine channel through which information is transmitted. In this regime, behavioral change is driven only by evidence carried by that channel: the update must be explainable as an evidence reweighing of the baseline. This yields a sharp identification result: posterior updates determine the relative, context-dependent incentive signal that shifts behavior, but they do not uniquely determine absolute rewards, which remain ambiguous up to context-specific baselines. Requiring one reusable continuation…
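
The baseline ambiguity described in the abstract can be illustrated with a small numerical sketch. It assumes the standard Gibbs form of a KL-regularized soft update, posterior(a) ∝ prior(a) · exp(β · reward(a)), within a single context. The function names, the action labels, the inverse temperature `beta`, and the numbers are all hypothetical and are not taken from the paper.

```python
import math

def soft_update(prior, reward, beta=1.0):
    """Assumed Gibbs-form soft update: posterior(a) ∝ prior(a) * exp(beta * reward(a))."""
    weights = {a: prior[a] * math.exp(beta * reward[a]) for a in prior}
    z = sum(weights.values())  # normalizing constant
    return {a: w / z for a, w in weights.items()}

def relative_incentive(prior, posterior, beta=1.0):
    """Log-ratio of posterior to prior, scaled by 1/beta.

    Under the assumed update this equals the reward minus a constant
    (the log normalizer divided by beta), so only reward differences
    between actions are recoverable.
    """
    return {a: math.log(posterior[a] / prior[a]) / beta for a in prior}

# Hypothetical single-context example.
prior = {"left": 0.5, "right": 0.3, "stay": 0.2}
reward = {"left": 1.0, "right": 0.0, "stay": -1.0}

# The same reward with a constant baseline added.
shifted = {a: r + 7.0 for a, r in reward.items()}

post_a = soft_update(prior, reward)
post_b = soft_update(prior, shifted)

# Both reward functions produce the same posterior, so the posterior
# cannot tell them apart.
same_posterior = all(abs(post_a[a] - post_b[a]) < 1e-9 for a in prior)

# What the posterior does reveal: differences in reward between actions.
incentive = relative_incentive(prior, post_a)
gap_left_right = incentive["left"] - incentive["right"]
```

In this sketch `same_posterior` is true and `gap_left_right` matches `reward["left"] - reward["right"]`, which is the single-context version of the claim that updates fix relative incentives but not absolute rewards. The paper's result concerns context-specific baselines, which this one-context example does not cover.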

Taxonomy

Topics: Reinforcement Learning in Robotics · Motor Control and Adaptation · Advanced Bandit Algorithms Research