Imitating from auxiliary imperfect demonstrations via Adversarial Density Weighted Regression

Ziqi Zhang; Zifeng Zhuang; Jingzehua Xu; Yiyuan Yang; Yubo Huang; Donglin Wang; Shuai Zhang

arXiv:2405.20351·cs.LG·January 14, 2025

TL;DR

This paper introduces Adversarial Density Regression (ADR), a novel one-step supervised imitation learning framework that effectively corrects policies trained on imperfect demonstrations to match expert distributions without relying on the Bellman operator.

Contribution

The paper presents ADR, a new IL method that combines density-weighted behavioral cloning with theoretical guarantees. It addresses two limitations of previous algorithms: out-of-distribution (OOD) issues and reliance on multi-step updates.

Findings

01 ADR outperforms existing IL algorithms on Gym-Mujoco tasks.

02 ADR achieves an 89.5% improvement over IQL with ground-truth rewards on Adroit and Kitchen tasks.

03 Theoretical analysis shows that ADR effectively aligns the policy distribution with the expert distribution.

Abstract

We propose a novel one-step supervised imitation learning (IL) framework called Adversarial Density Regression (ADR). This IL framework aims to correct a policy learned on data of unknown quality so that it matches the expert distribution, using demonstrations and without relying on the Bellman operator. Specifically, ADR addresses several limitations of previous IL algorithms. First, most IL algorithms are based on the Bellman operator and therefore inevitably suffer from cumulative offsets from sub-optimal rewards during multi-step updates. Additionally, off-policy training frameworks suffer from Out-of-Distribution (OOD) state-actions. Second, while conservative terms help with the OOD issue, balancing the conservative term is difficult. To address these limitations, we fully integrate a one-step density-weighted Behavioral Cloning (BC) objective for IL with auxiliary imperfect demonstrations.…
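The abstract is truncated before it states the objective, so the following is only a minimal sketch of what a one-step density-weighted BC loss can look like in general, not the paper's formulation. The function name `density_weighted_bc_loss`, the clipped exponential weighting, the `clip` parameter, and the assumption that per-sample log-densities under an expert model and an auxiliary-data model are already available are all hypothetical choices made for illustration.

```python
import math


def density_weighted_bc_loss(pred_actions, data_actions,
                             log_p_expert, log_p_aux, clip=5.0):
    """Weighted squared-error BC loss over one batch (illustrative only).

    Each sample is weighted by exp(clipped log-density ratio), so samples that
    look more expert-like than auxiliary-like dominate the regression target.
    The weighting scheme is an assumption, not taken from the paper.
    """
    weights = []
    for lp_e, lp_a in zip(log_p_expert, log_p_aux):
        # Clip the log ratio so a single sample cannot dominate numerically.
        log_ratio = max(-clip, min(clip, lp_e - lp_a))
        weights.append(math.exp(log_ratio))

    total = sum(weights)
    loss = 0.0
    for w, pred, act in zip(weights, pred_actions, data_actions):
        # Plain squared error between the policy output and the logged action.
        sq_err = sum((p - a) ** 2 for p, a in zip(pred, act))
        loss += (w / total) * sq_err  # self-normalised weights sum to 1
    return loss


if __name__ == "__main__":
    preds = [[0.0, 0.0], [0.0, 0.0]]
    actions = [[1.0, 0.0], [0.0, 2.0]]
    # Equal densities: every sample counts the same.
    print(density_weighted_bc_loss(preds, actions, [0.0, 0.0], [0.0, 0.0]))
    # Second sample is unlikely under the expert model: it is down-weighted.
    print(density_weighted_bc_loss(preds, actions, [0.0, -100.0], [0.0, 0.0]))
```

Because the loss is a single supervised regression step, no Bellman backup or bootstrapped value target appears anywhere in it, which is the property the abstract emphasises.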

Code & Models

Repositories

stevezhangza/adverserial_density_regression
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Fault Detection and Control Systems

Methods: ALIGN · Implicit Q-Learning