Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

Andi Peng; Aviv Netanyahu; Mark Ho; Tianmin Shu; Andreea Bobu; Julie Shah; Pulkit Agrawal

arXiv:2307.06333·cs.LG·July 20, 2023·2 cites

TL;DR

This paper introduces a human-in-the-loop framework that uses user feedback and counterfactual demonstrations to identify task-irrelevant concepts, enabling personalized policy adaptation and improved robustness in control tasks.

Contribution

It presents a novel interactive approach combining feedback and counterfactuals for personalized task-irrelevant concept identification and policy adaptation.

Findings

01 Reduces demonstrations needed for fine-tuning

02 Improves understanding of agent failure modes

03 Aligns policies with individual user preferences

Abstract

Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our…
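The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes states are flat dictionaries of named concepts, that the user has already marked some concepts as task-irrelevant, and that a list of alternative values is available for each. The names `augment_demo`, `concept_values`, and the example concepts (`object_color`, `target`) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

State = Dict[str, str]            # concept name -> concept value (assumed representation)
Demo = List[Tuple[State, str]]    # trajectory of (state, action) pairs


def augment_demo(
    demo: Demo,
    irrelevant: Sequence[str],
    concept_values: Dict[str, List[str]],
    n_copies: int,
    rng: random.Random,
) -> List[Demo]:
    """Return n_copies variants of demo with task-irrelevant concepts resampled.

    Actions and task-relevant concepts are copied unchanged, so a policy
    trained on the result sees the same behavior under varied irrelevant
    concepts.
    """
    augmented = []
    for _ in range(n_copies):
        # Sample one value per irrelevant concept and hold it fixed
        # across the whole trajectory.
        sampled = {c: rng.choice(concept_values[c]) for c in irrelevant}
        augmented.append([({**state, **sampled}, action) for state, action in demo])
    return augmented


# Hypothetical demonstration: the user says object color does not matter.
demo = [
    ({"object_color": "red", "target": "cup"}, "move_left"),
    ({"object_color": "red", "target": "cup"}, "grasp"),
]
concept_values = {"object_color": ["red", "green", "blue"]}
augmented = augment_demo(demo, ["object_color"], concept_values, 4, random.Random(0))
```

Each augmented trajectory keeps the original actions and the `target` concept, and differs only in the concept the user flagged as irrelevant. In practice the states would be observations such as images, and resampling a concept would require an environment or generative model that can re-render it.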

Videos

Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation · SlidesLive

Taxonomy

Topics: Human-Automation Interaction and Safety · Explainable Artificial Intelligence (XAI) · Context-Aware Activity Recognition Systems
