On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
Yirui Zhou, Yunfei Jin, Xiaowei Liu, Xiaofeng Zhang, Yangchun Zhang

TL;DR
This paper introduces MODULE, an algorithm for learning from observations (LfO) that combines distributional RL with soft actor-critic to imitate expert behavior without access to expert actions, improving sample efficiency and training stability.
Contribution
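The paper provides a theoretical analysis of how the reward function and the policy generalize in learning from observations (LfO). It also proposes MODULE, which replaces the policy optimization step of generative adversarial imitation from observation (GAIfO) with distributional soft actor-critic (DSAC) for more stable and sample-efficient imitation.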
Findings
MODULE outperforms existing LfO methods in MuJoCo environments.
The paper offers theoretical insights into reward and policy generalization in LfO.
MODULE improves training stability and sample efficiency.
Abstract
Learning from observations (LfO) replicates expert behavior without needing access to the expert's actions, making it more practical than learning from demonstrations (LfD) in many real-world scenarios. However, directly applying the on-policy training scheme in LfO worsens the sample inefficiency problem, while employing the traditional off-policy training scheme in LfO magnifies the instability issue. This paper seeks to develop an efficient and stable solution for the LfO problem. Specifically, we begin by exploring the generalization capabilities of both the reward function and policy in LfO, which provides a theoretical foundation for computation. Building on this, we modify the policy optimization method in generative adversarial imitation from observation (GAIfO) with distributional soft actor-critic (DSAC), and propose the Mimicking Observations through Distributional Update…
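The observation-only setting described in the abstract can be illustrated with a small sketch. In adversarial imitation from observation, a discriminator typically scores state transitions (s, s') rather than state-action pairs, and the policy is rewarded for transitions that look like the expert's. The code below is not the paper's implementation: the linear discriminator, the reward form -log(1 - D), and all function names are assumptions chosen only for illustration.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def discriminator_prob(weights, bias, state, next_state):
    """Probability that the transition (s, s') came from the expert.

    Hypothetical linear discriminator over the concatenated states.
    No action is used, which is what distinguishes LfO from LfD.
    """
    features = list(state) + list(next_state)
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(logit)


def imitation_reward(weights, bias, state, next_state, eps=1e-8):
    """Surrogate reward that grows as the transition looks more expert-like.

    The form -log(1 - D) is one common choice, assumed here for illustration.
    """
    d = discriminator_prob(weights, bias, state, next_state)
    return -math.log(1.0 - d + eps)


# Toy usage: an untrained discriminator (all-zero weights) outputs D = 0.5.
state = [0.1, -0.2]
next_state = [0.0, 0.3]
reward = imitation_reward([0.0] * 4, 0.0, state, next_state)
print(reward)
```

In a full pipeline, a reward of this kind would be passed to the policy optimizer. According to the abstract, MODULE uses DSAC for that step.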
Taxonomy
Topics: Target Tracking and Data Fusion in Sensor Networks · Medical Image Segmentation Techniques · Advanced Data Compression Techniques
