Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Hanlin Yang; Jian Yao; Weiming Liu; Qing Wang; Hanmin Qin; Hansheng Kong; Kirk Tang; Jiechao Xiong; Chao Yu; Kai Li; Junliang Xing; Hongwu Chen; Juchao Zhuo; Qiang Fu; Yang Wei; Haobo Fu

arXiv:2410.15910·cs.LG·October 23, 2024

TL;DR

This paper introduces an imitation learning approach that weights state-action pairs by pointwise mutual information, emphasizing style-relevant pairs in order to better recover diverse policies from expert trajectories.

Contribution

It proposes a new weighted behavioral cloning method using pointwise mutual information to improve diversity and accuracy in policy recovery.

Findings

01 Enhanced policy recovery accuracy demonstrated in experiments.

02 Effective identification of style-relevant state-action pairs.

03 Theoretical justification supports the proposed weighting mechanism.

Abstract

Recovering a spectrum of diverse policies from a set of expert trajectories is an important research topic in imitation learning. After determining a latent style for a trajectory, previous methods for recovering diverse policies usually employ a vanilla behavioral cloning objective conditioned on the latent style, treating each state-action pair in the trajectory with equal importance. Based on the observation that, in many scenarios, behavioral styles are highly relevant to only a subset of state-action pairs, this paper presents a new principled method for diverse policy recovery. In particular, after inferring or assigning a latent style for a trajectory, we enhance vanilla behavioral cloning by incorporating a weighting mechanism based on pointwise mutual information. This additional weighting reflects the significance of each state-action pair's contribution to…
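
To make the weighting idea concrete, the following is a minimal sketch of a behavioral cloning loss in which each state-action pair is scaled by a weight derived from its pointwise mutual information with the trajectory's style. It is not the authors' implementation. The function names, the assumption that estimates of p(z | s, a) and p(z) are already available, and the choice of exp(PMI) as the weight are all illustrative assumptions; the paper's exact estimator and weight form may differ.

```python
import math


def pmi(p_style_given_sa: float, p_style: float) -> float:
    """Pointwise mutual information log(p(z | s, a) / p(z)), in nats."""
    return math.log(p_style_given_sa / p_style)


def weighted_bc_loss(log_probs, posteriors, prior):
    """Weighted negative log-likelihood over one trajectory (hypothetical helper).

    log_probs[i]  : log pi(a_i | s_i, z) under a style-conditioned policy
    posteriors[i] : an estimate of p(z | s_i, a_i) for the trajectory's style z
    prior         : an estimate of p(z)
    """
    # Assumption: weight = exp(PMI) = p(z | s, a) / p(z).
    # Pairs that say nothing about the style get weight 1;
    # pairs that are indicative of the style get weight > 1.
    weights = [math.exp(pmi(q, prior)) for q in posteriors]
    total = sum(w * -lp for w, lp in zip(weights, log_probs))
    return total / len(log_probs), weights


# Toy trajectory: only the middle pair is informative about the style.
loss, weights = weighted_bc_loss(
    log_probs=[math.log(0.5)] * 3,
    posteriors=[0.5, 0.9, 0.5],
    prior=0.5,
)
```

In this toy example the first and last pairs have zero PMI with the style and keep unit weight, while the middle pair is up-weighted, so its cloning error contributes more to the loss. With all weights fixed at 1 the function reduces to the vanilla behavioral cloning objective described above.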

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Focus · Sparse Evolutionary Training