Reward Reports for Reinforcement Learning

Thomas Krendl Gilbert; Nathan Lambert; Sarah Dean; Tom Zick; Aaron Snoswell

arXiv:2204.10817 · cs.LG · March 21, 2023 · 5 cites

TL;DR

This paper introduces Reward Reports, a dynamic documentation framework for reinforcement learning systems. Inspired by ML documentation practices, it tracks ongoing updates, feedback effects, and societal impacts after deployment.

Contribution

It proposes a novel living document framework for documenting and tracking the evolving design, assumptions, and societal effects of reinforcement learning systems after deployment.

Findings

01. Reward Reports are living documents that track system updates and feedback effects after deployment.

02. An application to Meta's BlenderBot 3 demonstrates the framework's practical utility.

03. The paper includes examples from domains such as gaming, recommendation, and traffic control.

Abstract

Building systems that are good for society in the face of complex societal effects requires a dynamic approach. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance unexamined. Meanwhile, recent work in reinforcement learning has shown that the effects of feedback and optimization objectives on system behavior can be wide-ranging and unpredictable. In this paper we sketch a framework for documenting deployed and iteratively updated learning systems, which we call Reward Reports. Taking inspiration from various contributions to the technical literature on reinforcement learning, we outline Reward Reports as living documents that track updates to design…

Code & Models

Repositories

rewardreports/reward-reports (Official)

Taxonomy

Topics: Data Stream Mining Techniques

Methods: Average Pooling · Prioritized Experience Replay · Monte-Carlo Tree Search · Batch Normalization · Residual Connection · Convolution · Residual Block · MuZero