Off-policy Imitation Learning from Visual Inputs
Zhihao Cheng, Li Shen, Dacheng Tao

TL;DR
This paper introduces OPIfVI, an off-policy imitation learning method from visual inputs that improves data efficiency and feature extraction, achieving expert-level performance in visual imitation tasks.
Contribution
The paper presents a novel off-policy IL framework from visual inputs that combines data augmentation, spectral normalization, and specialized encoder training to enhance performance and data efficiency.
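As an illustration of the data-augmentation component, here is a minimal NumPy sketch of random-shift augmentation for image batches. The summary does not say which augmentation OPIfVI uses, so the choice of random shift, the function name `random_shift`, and the `pad` parameter are assumptions made only for this example.

```python
import numpy as np


def random_shift(images, pad=4, rng=None):
    """Randomly shift each image by up to `pad` pixels in each direction.

    Hypothetical helper: the paper summary does not specify the augmentation,
    so random shift is used here purely as an illustrative assumption.
    `images` has shape (batch, height, width, channels).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Replicate edge pixels so the shifted crop never reads outside the image.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(images)
    for i in range(n):
        # Independent offset per image; the crop has the original size.
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w]
    return out


if __name__ == "__main__":
    batch = np.random.default_rng(0).integers(0, 256, size=(8, 84, 84, 3), dtype=np.uint8)
    augmented = random_shift(batch, pad=4)
    print(augmented.shape, augmented.dtype)
```

In an off-policy setup, an augmentation like this would typically be applied to each mini-batch sampled from the replay buffer, so the same stored frame yields a slightly different input each time it is reused.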
Findings
OPIfVI outperforms existing baselines in visual imitation tasks.
It achieves expert-level performance on DeepMind Control Suite.
The method improves data efficiency and feature extraction from visual inputs.
Abstract
Recently, imitation learning (IL) from expert states has seen various successful applications. However, another IL setting, IL from visual inputs (ILfVI), which holds greater promise for real-world application because it can draw on online visual resources, suffers from low data efficiency and poor performance resulting from on-policy learning and high-dimensional visual inputs. We propose OPIfVI (Off-Policy Imitation from Visual Inputs), which combines off-policy learning, data augmentation, and encoder techniques to tackle these challenges. More specifically, to improve data efficiency, OPIfVI conducts IL in an off-policy manner, so that sampled data can be used multiple times. In addition, we enhance the stability of OPIfVI with spectral normalization to mitigate the side effects of off-policy training. The core factor,…
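The spectral normalization mentioned in the abstract can be sketched as follows: a weight matrix is divided by an estimate of its largest singular value, obtained by power iteration. This is a minimal NumPy sketch of the general technique, not the authors' implementation; the function name `spectral_normalize` and its parameters are assumptions, and the excerpt does not say which network in OPIfVI is normalized.

```python
import numpy as np


def spectral_normalize(weight, n_iters=50, eps=1e-12, rng=None):
    """Return (weight / sigma, sigma), where sigma estimates the largest
    singular value of `weight` via power iteration.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Random starting vector in the output space of the matrix.
    u = rng.standard_normal(weight.shape[0])
    u /= np.linalg.norm(u) + eps
    for _ in range(n_iters):
        # Alternate between right and left singular vector estimates.
        v = weight.T @ u
        v /= np.linalg.norm(v) + eps
        u = weight @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ weight @ v
    return weight / (sigma + eps), sigma


if __name__ == "__main__":
    w = np.random.default_rng(1).standard_normal((64, 32))
    w_sn, sigma = spectral_normalize(w)
    # Compare the estimate against the exact spectral norm from an SVD.
    print(sigma, np.linalg.norm(w, 2), np.linalg.norm(w_sn, 2))
```

In training code, the usual practice is to run a single power-iteration step per update and carry the vector `u` over between updates, rather than iterating to convergence each time as this sketch does.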
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Spectral Normalization
