Reward Reports for Reinforcement Learning
Thomas Krendl Gilbert, Nathan Lambert, Sarah Dean, Tom Zick, Aaron Snoswell

TL;DR
This paper introduces Reward Reports, a dynamic documentation framework for reinforcement learning systems inspired by existing ML documentation practices. Reward Reports track ongoing updates, feedback effects, and societal impacts after deployment.
Contribution
It proposes a novel living-document framework for recording and tracking the evolving design, assumptions, and societal effects of reinforcement learning systems after deployment.
Findings
Reward Reports can track system updates and feedback effects over the course of a system's deployment.
An example Reward Report for Meta's BlenderBot 3 illustrates the framework's practical use.
The paper includes examples from domains such as gaming, recommendation, and traffic control.
Abstract
Building systems that are good for society in the face of complex societal effects requires a dynamic approach. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance unexamined. Meanwhile, recent work in reinforcement learning has shown that the effects of feedback and optimization objectives on system behavior can be wide-ranging and unpredictable. In this paper we sketch a framework for documenting deployed and iteratively updated learning systems, which we call Reward Reports. Taking inspiration from various contributions to the technical literature on reinforcement learning, we outline Reward Reports as living documents that track updates to design…
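The "living document" idea can be pictured as a fixed description of a system plus an append-only, dated changelog of updates and their observed feedback effects. The sketch below is a minimal Python illustration under that assumption only; the class and field names (`RewardReport`, `ChangelogEntry`, `objective`, `log_update`) and the traffic-signal usage are hypothetical and are not the report template proposed in the paper.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangelogEntry:
    when: date            # date the update was made
    change: str           # what changed (hypothetical: reward term, data source, policy)
    observed_effect: str  # feedback effect noted after deployment


@dataclass
class RewardReport:
    system_name: str
    objective: str  # hypothetical field: what the reward is meant to capture
    assumptions: list = field(default_factory=list)
    changelog: list = field(default_factory=list)

    def log_update(self, when: date, change: str, observed_effect: str) -> ChangelogEntry:
        """Append a dated entry; earlier entries are never edited or removed."""
        entry = ChangelogEntry(when, change, observed_effect)
        self.changelog.append(entry)
        return entry

    def history(self) -> list:
        """Return all entries in chronological order, regardless of logging order."""
        return sorted(self.changelog, key=lambda e: e.when)


# Hypothetical usage: a traffic-signal controller documented over time.
report = RewardReport("traffic-signal-controller", objective="minimize average vehicle delay")
report.log_update(date(2022, 5, 1), "added pedestrian wait time to reward",
                  "longer vehicle queues at peak hours")
report.log_update(date(2022, 3, 1), "initial deployment", "none observed yet")
print([e.change for e in report.history()])
# prints ['initial deployment', 'added pedestrian wait time to reward']
```

Keeping the changelog append-only is the design choice that makes the record "living": each update adds to the history rather than overwriting the original description, so later readers can see how design and observed effects changed together.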
