How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

Joseph A. Vincent; Haruki Nishimura; Masha Itkina; Paarth Shah; Mac Schwager; Thomas Kollar

arXiv:2405.05439·cs.RO·October 24, 2024·1 cites

TL;DR

This paper introduces a statistical framework to evaluate the performance and generalization of behavior cloning policies with minimal experiments, providing reliable bounds even under distribution shifts.

Contribution

It proposes a method to compute tight, confidence-based performance bounds for robot policies using minimal rollouts, applicable in simulation and real-world settings.
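
To make the idea of a confidence-based bound from a handful of rollouts concrete, the following is a minimal sketch of a one-sided Clopper-Pearson lower bound on a binary success rate. It is a standard textbook construction used here for illustration, not necessarily the paper's exact procedure; the function names `binom_tail_ge` and `success_rate_lower_bound` and the default `alpha=0.05` are assumptions introduced for this example.

```python
import math


def binom_tail_ge(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def success_rate_lower_bound(successes, rollouts, alpha=0.05):
    """One-sided Clopper-Pearson lower bound on the true success rate.

    Hypothetical helper for illustration: with probability at least
    1 - alpha over the rollouts, the true success rate is >= the result.
    """
    if rollouts <= 0 or not 0 <= successes <= rollouts:
        raise ValueError("need 0 <= successes <= rollouts and rollouts > 0")
    if successes == 0:
        return 0.0  # no successes observed: only the trivial bound holds
    # P(X >= successes | p) increases with p, so bisect for the p at which
    # the observed count would be just barely plausible at level alpha.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_tail_ge(successes, rollouts, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Example: 18 successes in 20 rollouts, 95% confidence.
    print(success_rate_lower_bound(18, 20))
```

The bound is always below the empirical success rate and tightens as the number of rollouts grows, which is the trade-off between evaluation cost and certainty that the summary refers to.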

Findings

01 Validated bounds in simulated manipulation tasks

02 Assessed policy generalization to new environments

03 Compared policies in out-of-distribution scenarios

Abstract

With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution…
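
As a rough illustration of bounding an entire performance distribution through its cumulative distribution function, the sketch below uses the one-sided Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. DKW is swapped in here as a well-known stand-in and may differ from the bound the authors construct; `dkw_epsilon`, `cdf_upper_bound`, and the sample rewards are hypothetical. For a reward where higher is better, an upper bound on the CDF corresponds to a worst case under the standard stochastic ordering.

```python
import math


def dkw_epsilon(n, alpha=0.05):
    """Half-width of a one-sided DKW band for n i.i.d. samples.

    The one-sided inequality with this constant requires alpha <= 0.5.
    """
    if n <= 0 or not 0 < alpha <= 0.5:
        raise ValueError("need n > 0 and 0 < alpha <= 0.5")
    return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def cdf_upper_bound(samples, x, alpha=0.05):
    """Upper bound on F(x) = P(reward <= x).

    With probability at least 1 - alpha, the bound holds for all x at once.
    """
    n = len(samples)
    empirical = sum(s <= x for s in samples) / n  # empirical CDF at x
    return min(1.0, empirical + dkw_epsilon(n, alpha))


if __name__ == "__main__":
    # Hypothetical per-rollout rewards, for illustration only.
    rewards = [0.2, 0.5, 0.9, 1.0, 0.7, 0.8, 0.95, 0.6, 0.85, 0.9] * 5
    print(cdf_upper_bound(rewards, 0.5))
```

Because the band width shrinks only as the inverse square root of the number of rollouts, a distribution-level guarantee of this kind is loose when very few rollouts are available.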

Code & Models

Repositories

tri-ml/stochastic_verification
PyTorch, official

Taxonomy

Topics: Behavioral Health and Interventions