Reinforcement Learning Under Algorithmic Triage

Eleni Straitouri; Adish Singla; Vahid Balazadeh Meresht; Manuel Gomez-Rodriguez

arXiv:2109.11328 · cs.LG · September 24, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a reinforcement learning framework for algorithmic triage, combining offline and on-policy training to optimize machine-human collaboration, demonstrated through synthetic driving simulations.

Contribution

It develops a novel two-stage actor-critic method for reinforcement learning under triage, integrating offline human data with on-policy adjustments.

Findings

01. The two-stage method improves collaboration between machine and human policies.

02. Models trained with this approach outperform several baselines in synthetic driving tasks.

03. The approach effectively adapts to the impact of switching between human and machine decisions.

Abstract

Methods to learn under algorithmic triage have predominantly focused on supervised learning settings where each decision, or prediction, is independent of the others. Under algorithmic triage, a supervised learning model predicts a fraction of the instances and humans predict the remaining ones. In this work, we take a first step towards developing reinforcement learning models that are optimized to operate under algorithmic triage. To this end, we look at the problem through the framework of options and develop a two-stage actor-critic method to learn reinforcement learning models under triage. The first stage performs offline, off-policy training using human data gathered in an environment where the human has operated on their own. The second stage performs on-policy training to account for the impact that switching may have on the human policy, which may be difficult to anticipate…
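
To make the sequential setting concrete, the following is a minimal sketch of what acting under triage could look like: at each time step a switching policy decides whether the machine or the human controls the next action. It is not the paper's two-stage actor-critic method and involves no learning. Every name in it (`rollout_under_triage`, `toy_step`, the three toy policies) is hypothetical, and the one-dimensional lane-keeping environment and the explicit `switch_cost` penalty are assumptions made purely for illustration.

```python
from typing import Callable, List, Tuple

State = int
Action = int
MACHINE, HUMAN = 0, 1  # hypothetical controller labels


def rollout_under_triage(
    start: State,
    horizon: int,
    switch_policy: Callable[[State, int], int],
    machine_policy: Callable[[State], Action],
    human_policy: Callable[[State], Action],
    step: Callable[[State, Action], Tuple[State, float]],
    switch_cost: float = 0.0,
) -> Tuple[float, List[int]]:
    """Run one episode in which control alternates between machine and human.

    Returns the total reward and the controller chosen at each step.
    The switch_cost penalty is an illustrative assumption, not from the source.
    """
    state, controller = start, HUMAN  # assume the human starts in control
    total, trace = 0.0, []
    for _ in range(horizon):
        chosen = switch_policy(state, controller)
        if chosen != controller:
            total -= switch_cost  # pay a penalty whenever control changes hands
        controller = chosen
        policy = machine_policy if controller == MACHINE else human_policy
        next_state, reward = step(state, policy(state))
        total += reward
        trace.append(controller)
        state = next_state
    return total, trace


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def toy_step(state: State, action: Action) -> Tuple[State, float]:
    # Toy lane keeping: the state is the offset from the lane center (0).
    next_state = state + action
    return next_state, -float(abs(next_state))


def toy_machine(state: State) -> Action:
    return -sign(state)  # small corrections toward the center


def toy_human(state: State) -> Action:
    return -2 * sign(state)  # larger corrections toward the center


def toy_switch(state: State, controller: int) -> int:
    # Defer to the human when far from the center, otherwise use the machine.
    return HUMAN if abs(state) >= 2 else MACHINE


total, trace = rollout_under_triage(
    start=3,
    horizon=4,
    switch_policy=toy_switch,
    machine_policy=toy_machine,
    human_policy=toy_human,
    step=toy_step,
    switch_cost=0.5,
)
print(total, trace)
```

Because the controller chosen at one step changes the state that every later step sees, decisions here are not independent of one another, which is the contrast with the supervised setting drawn in the abstract.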

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms