Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Kin-Ho Lam; Delyar Tabatabai; Jed Irvine; Donald Bertucci; Anita Ruangrotsakun; Minsuk Kahng; Alan Fern

arXiv:2206.02039 · cs.AI · June 9, 2022

TL;DR

This paper extends the CheckList testing methodology to planning-based reinforcement learning, enabling more comprehensive evaluation of an agent’s inference capabilities beyond standard value-based metrics.

Contribution

It introduces a CheckList approach for testing inference in planning-based RL agents, providing tools to identify reasoning flaws during tree search.

Findings

01. The approach is effective in revealing previously unknown inference flaws.

02. A user study with AI researchers demonstrates its practical utility.

03. The study provides insights into how experts approach testing.

Abstract

Reinforcement learning (RL) agents are commonly evaluated via their expected value over a distribution of test scenarios. Unfortunately, this evaluation approach provides limited evidence for post-deployment generalization beyond the test distribution. In this paper, we address this limitation by extending the recent CheckList testing methodology from natural language processing to planning-based RL. Specifically, we consider testing RL agents that make decisions via online tree search using a learned transition model and value function. The key idea is to improve the assessment of future performance via a CheckList approach for exploring and assessing the agent's inferences during tree search. The approach provides the user with an interface and general query-rule mechanism for identifying potential inference flaws and validating expected inference invariances. We present a user study…
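The query-rule idea described in the abstract can be illustrated with a small sketch. This is not the paper's interface or implementation: the `Node` and `QueryRule` types, the `check` helper, and the `enemy_units` state field are all hypothetical names chosen for illustration. The sketch assumes a search tree whose nodes hold a predicted state and a learned value estimate, a query that selects nodes of interest, and a rule stating what the tester expects to hold for those nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """Hypothetical search-tree node: a predicted state plus its value estimate."""
    state: dict
    value: float
    children: list[Node] = field(default_factory=list)


@dataclass
class QueryRule:
    """Hypothetical query-rule pair: `query` selects nodes, `rule` must hold on them."""
    name: str
    query: Callable[[Node], bool]
    rule: Callable[[Node], bool]


def walk(node: Node) -> Iterator[Node]:
    """Yield every node in the tree, depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)


def check(root: Node, qr: QueryRule) -> list[Node]:
    """Return the nodes selected by the query that violate the rule."""
    return [n for n in walk(root) if qr.query(n) and not qr.rule(n)]


# Toy tree with made-up states and values (illustrative only).
root = Node({"enemy_units": 3}, 0.1, [
    Node({"enemy_units": 0}, 0.9),
    Node({"enemy_units": 0}, -0.4),  # selected by the query, breaks the rule
    Node({"enemy_units": 2}, 0.0),
])

# Assumed expectation: states with no enemy units should have positive value.
no_enemies_is_good = QueryRule(
    name="no enemies implies positive value",
    query=lambda n: n.state["enemy_units"] == 0,
    rule=lambda n: n.value > 0,
)

violations = check(root, no_enemies_is_good)
for node in violations:
    print(f"{no_enemies_is_good.name}: violated at {node.state} (value={node.value})")
```

Keeping the query and the rule as separate predicates means the same rule can be reused with different queries, and a failing node points to a specific predicted state and value estimate rather than to an aggregate score.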

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics · Topic Modeling