MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

Aaron M. Roth, Jing Liang, Ram Sriram, Elham Tabassi, and Dinesh Manocha

arXiv:2209.09079 · cs.RO · September 20, 2022

TL;DR

MSVIPER distills reinforcement learning policies into compact decision trees. The resulting navigation policies are interpretable, can be modified without retraining, and show less freezing and oscillation in indoor and outdoor scenes.

Contribution

The paper introduces MSVIPER, a new policy distillation approach that produces compact decision trees from RL policies, with techniques for policy improvement without retraining.

Findings

01. Up to 95% reduction in freezing and oscillation behaviors.

02. Decision trees accurately mimic expert RL policies.

03. Enhanced outdoor navigation on complex terrains.

Abstract

We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for policy distillation to decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique involving learning a state-action mapping and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER results in efficient decision trees and can accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms of reduced freezing and oscillation behaviors (by…
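
The expert-then-imitation pipeline described in the abstract can be illustrated with a generic distillation loop. The sketch below is not the MSVIPER algorithm itself: it is a minimal DAgger-style dataset-aggregation loop on a toy problem, written under several assumptions. The one-dimensional environment, the hand-coded `expert_policy` (a stand-in for a trained RL expert), the greedy misclassification-based tree learner `fit_tree`, and all parameter names and defaults are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def expert_policy(state):
    # Hypothetical stand-in for a trained RL expert: steer toward x = 0.
    (x,) = state
    return 0 if x > 0 else 1  # 0 = move left, 1 = move right


def step(state, action):
    # Toy 1-D dynamics (an assumption): move 0.1 left or right, clipped to [-1, 1].
    (x,) = state
    x += -0.1 if action == 0 else 0.1
    return (max(-1.0, min(1.0, x)),)


def majority(data):
    # Most frequent action label among (state, action) pairs.
    return Counter(a for _, a in data).most_common(1)[0][0]


def errors(data):
    # Number of pairs a single majority-vote leaf would misclassify.
    m = majority(data)
    return sum(a != m for _, a in data)


def fit_tree(data, depth):
    # Greedy axis-aligned splits; returns a nested dict, or an action at a leaf.
    if depth == 0 or len({a for _, a in data}) == 1:
        return majority(data)
    best = None
    for f in range(len(data[0][0])):
        values = sorted({s[f] for s, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [d for d in data if d[0][f] <= t]
            right = [d for d in data if d[0][f] > t]
            if not left or not right:
                continue
            cost = errors(left) + errors(right)
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return majority(data)
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": fit_tree(left, depth - 1),
            "right": fit_tree(right, depth - 1)}


def predict(tree, state):
    # Walk from the root to a leaf and return the action stored there.
    while isinstance(tree, dict):
        branch = "left" if state[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def distill(iterations=4, episodes=10, horizon=10, depth=2, seed=0):
    rng = random.Random(seed)
    dataset, tree = [], None
    for _ in range(iterations):
        for _ in range(episodes):
            state = (rng.uniform(-1.0, 1.0),)
            for _ in range(horizon):
                # The expert labels every visited state.
                dataset.append((state, expert_policy(state)))
                # First round: the expert drives. Later rounds: the student drives,
                # so the dataset covers states the student actually reaches.
                action = expert_policy(state) if tree is None else predict(tree, state)
                state = step(state, action)
        tree = fit_tree(dataset, depth)  # refit on the aggregated dataset
    return tree
```

Calling `distill()` returns a small nested dictionary that can be read directly as if/else rules, which is the property that makes a tree policy easy to inspect and to edit by hand. A real pipeline would replace `expert_policy` with a trained RL policy and `step` with the navigation simulator.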

Taxonomy

Topics: Reinforcement Learning in Robotics