Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

Jinning Li; Xinyi Liu; Banghua Zhu; Jiantao Jiao; Masayoshi Tomizuka; Chen Tang; Wei Zhan

arXiv:2309.09408·cs.RO·October 16, 2023

TL;DR

This paper introduces GOLD, a framework that distills offline expert policies into lightweight models to improve safe reinforcement learning, especially in safety-critical real-world tasks like autonomous driving.

Contribution

GOLD is a novel offline-to-online safe RL framework that effectively distills offline decision transformer policies into lightweight models for real-time safety-critical applications.
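This excerpt does not spell out how the offline expert "guides online exploration". One common mechanism comes from Jump-Start RL: a guide policy controls the first h steps of each episode and the online learner controls the rest, with h shrinking over training. The sketch below assumes that scheme as an illustration only. It does not claim to be GOLD's exact procedure. Every name in it (`guided_rollout`, `curriculum`, `toy_step`) is hypothetical, and the 1-D environment is a toy stand-in for a driving task.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for a toy 1-D problem (not from the paper).
Policy = Callable[[float], float]                                    # state -> action
StepFn = Callable[[float, float], Tuple[float, float, float, bool]]  # (s, a) -> (s', r, c, done)
Transition = Tuple[float, float, float, float, str]                  # (s, a, r, c, actor)


def guided_rollout(guide: Policy, learner: Policy, step: StepFn,
                   s0: float, h: int, max_steps: int) -> List[Transition]:
    """One episode: `guide` acts for the first h steps, `learner` afterwards."""
    traj: List[Transition] = []
    s = s0
    for t in range(max_steps):
        actor = "guide" if t < h else "learner"
        a = guide(s) if actor == "guide" else learner(s)
        s_next, r, c, done = step(s, a)
        traj.append((s, a, r, c, actor))  # cost c is kept so a safe-RL update can use it
        s = s_next
        if done:
            break
    return traj


def curriculum(h0: int, n_stages: int) -> List[int]:
    """Assumed linear schedule shrinking the guide horizon from h0 down to 0."""
    return [h0 * (n_stages - k) // n_stages for k in range(n_stages + 1)]


def toy_step(s: float, a: float) -> Tuple[float, float, float, bool]:
    """Toy dynamics: move toward 5.0; entering s > 6.0 incurs a unit cost."""
    s_next = s + a
    reward = -abs(s_next - 5.0)
    cost = 1.0 if s_next > 6.0 else 0.0
    done = abs(s_next - 5.0) < 0.5
    return s_next, reward, cost, done


if __name__ == "__main__":
    # Guide takes large steps (1.0); the lightweight learner takes small ones (0.5).
    for h in curriculum(h0=4, n_stages=2):  # horizons 4, 2, 0
        traj = guided_rollout(lambda s: 1.0, lambda s: 0.5, toy_step, 0.0, h, 20)
        n_guide = sum(1 for *_, actor in traj if actor == "guide")
        print(h, len(traj), n_guide)
```

As h shrinks, each episode hands more control to the learner. The guide still starts every episode in a reasonable region of the state space, so the learner does not begin exploration from scratch.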

Findings

01. GOLD outperforms both offline and online safe RL methods on benchmarks.

02. GOLD is successfully applied to real-world autonomous driving scenarios.

03. The distilled policies meet real-time inference requirements.

Abstract

Safe Reinforcement Learning (RL) aims to find a policy that achieves high rewards while satisfying cost constraints. When learning from scratch, safe RL agents tend to be overly conservative, which impedes exploration and limits overall performance. In many realistic tasks, e.g., autonomous driving, large-scale expert demonstration data are available. We argue that extracting an expert policy from offline data to guide online exploration is a promising way to mitigate this conservativeness. Large-capacity models, e.g., decision transformers (DT), have proven competent at offline policy learning. However, data collected in real-world scenarios rarely contain dangerous cases (e.g., collisions), which makes it prohibitively hard for such policies to learn safety concepts. Moreover, these bulky policy networks cannot meet the computation speed requirements at inference time on…
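The first sentence of the abstract describes the objective that is usually formalized as a constrained MDP. The sketch below uses standard notation. The symbols r (reward), c (cost), γ (discount), d (cost budget) and λ (Lagrange multiplier) are assumed here and are not taken from the paper.

```latex
\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, r(s_t,a_t)\Big]
\quad\text{s.t.}\quad
J_c(\pi) = \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, c(s_t,a_t)\Big] \le d

\mathcal{L}(\pi,\lambda) = J_r(\pi) - \lambda\,\big(J_c(\pi) - d\big), \qquad \lambda \ge 0
```

In the common Lagrangian relaxation, a large λ weights cost heavily against reward. This offers one way to read the over-conservativeness the abstract attributes to agents learning from scratch.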

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Occupational Health and Safety Research · Reinforcement Learning in Robotics

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings