Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning

Akshat Agarwal; Abhinau Kumar V; Kyle Dunovan; Erik Peterson; Timothy Verstynen; Katia Sycara

arXiv:1809.09147·cs.LG·September 26, 2018

TL;DR

This paper introduces an evidence accumulation module for reinforcement learning agents, enabling them to delay decisions until sufficiently confident, which improves safety and performance in uncertain, stochastic environments.

Contribution

It proposes a novel accumulator-based decision mechanism inspired by biological decision-making, allowing RL agents to act only when confident, reducing errors caused by premature actions.
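The idea of acting only once enough evidence has built up can be illustrated with a small sketch. This is not the paper's module, whose details are not given on this page. It is a minimal illustration under stated assumptions: the class `EvidenceAccumulator`, the helper `run_episode`, the `threshold` and `decay` parameters, and the one-hot evidence encoding are all hypothetical choices made for this example.

```python
import random
from typing import List, Optional, Tuple


class EvidenceAccumulator:
    """Hypothetical accumulator: one running total per candidate action."""

    def __init__(self, num_actions: int, threshold: float, decay: float = 1.0):
        self.threshold = threshold  # evidence needed before committing (assumed)
        self.decay = decay          # 1.0 = perfect integration, <1.0 = leaky (assumed)
        self.totals = [0.0] * num_actions

    def reset(self) -> None:
        self.totals = [0.0] * len(self.totals)

    def step(self, evidence: List[float]) -> Optional[int]:
        """Add one step of evidence; return an action index or None (wait)."""
        self.totals = [self.decay * t + e for t, e in zip(self.totals, evidence)]
        best = max(range(len(self.totals)), key=self.totals.__getitem__)
        if self.totals[best] >= self.threshold:
            self.reset()  # start fresh for the next decision
            return best
        return None  # not confident yet, so delay the decision


def run_episode(
    true_action: int,
    num_actions: int,
    p_correct: float,
    threshold: float,
    rng: random.Random,
    max_steps: int = 1000,
) -> Tuple[Optional[int], int]:
    """Feed noisy one-hot observations until the accumulator commits."""
    acc = EvidenceAccumulator(num_actions, threshold)
    for t in range(1, max_steps + 1):
        # Each observation points at the true action with probability p_correct.
        if rng.random() < p_correct:
            obs = true_action
        else:
            obs = rng.choice([a for a in range(num_actions) if a != true_action])
        evidence = [1.0 if a == obs else 0.0 for a in range(num_actions)]
        choice = acc.step(evidence)
        if choice is not None:
            return choice, t
    return None, max_steps


if __name__ == "__main__":
    choice, steps = run_episode(
        true_action=2, num_actions=3, p_correct=0.6, threshold=5.0,
        rng=random.Random(0),
    )
    print(f"committed to action {choice} after {steps} steps")
```

In this sketch, raising `threshold` trades speed for caution: the agent waits longer before committing, and a single misleading observation is less likely to trigger an action.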

Findings

01 Achieves near-optimal performance on a guessing game

02 Outperforms traditional deep recurrent networks

03 Enhances safety by delaying decisions until confidence is high

Abstract

In the real world, agents often have to operate in situations with incomplete information, limited sensing capabilities, and inherently stochastic environments, making individual observations incomplete and unreliable. Moreover, in many situations it is preferable to delay a decision rather than run the risk of making a bad decision. In such situations it is necessary to aggregate information before taking an action; however, most state-of-the-art reinforcement learning (RL) algorithms are biased towards taking actions at every time step, even if the agent is not particularly confident in its chosen action. This lack of caution can lead the agent to make critical mistakes, regardless of prior experience and acclimation to the environment. Motivated by theories of dynamic resolution of uncertainty during decision making in biological brains, we propose a simple accumulator…

Code & Models

Repositories

susumuota/gym-modeestimation (PyTorch)

Taxonomy

Topics: Neural dynamics and brain function · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)