On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration

Yirui Zhou; Yunfei Jin; Xiaowei Liu; Xiaofeng Zhang; Yangchun Zhang

arXiv:2501.12785 · stat.ML · October 22, 2025

TL;DR

This paper introduces MODULE, an algorithm for learning from observations (LfO) that combines distributional reinforcement learning (RL) with soft actor-critic (SAC), achieving high sample efficiency and stability in imitation tasks without access to expert actions.

Contribution

The paper provides a theoretical analysis of generalization in LfO and proposes MODULE, a method that integrates distributional RL with SAC for more stable and sample-efficient imitation learning from observations.
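
Combining distributional RL with SAC means the critic learns a distribution over soft returns rather than a single soft Q-value. The sketch below is a minimal illustration, assuming a quantile (QR-DQN-style) representation of the return distribution with fixed quantile midpoints and a quantile Huber loss. The function names and the defaults gamma = 0.99, alpha = 0.2 and kappa = 1.0 are hypothetical choices for illustration; this is not MODULE's or DSAC's published implementation, and DSAC variants parameterize the distribution differently.

```python
import math
from typing import List, Sequence


def soft_quantile_targets(
    reward: float,
    done: bool,
    next_quantiles: Sequence[float],
    next_log_prob: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> List[float]:
    """Distributional soft Bellman target: one target per quantile of Z(s', a').

    next_quantiles are the critic's quantile estimates at (s', a'), with a'
    sampled from the current policy; next_log_prob is log pi(a' | s').
    The entropy bonus -alpha * log pi shifts every quantile by the same amount.
    """
    not_done = 0.0 if done else 1.0  # no bootstrapping past a terminal state
    return [reward + gamma * not_done * (q - alpha * next_log_prob) for q in next_quantiles]


def quantile_huber_loss(
    pred_quantiles: Sequence[float],
    target_quantiles: Sequence[float],
    kappa: float = 1.0,
) -> float:
    """Quantile-regression Huber loss (QR-DQN style), averaged over all
    (predicted quantile, target sample) pairs."""
    n = len(pred_quantiles)
    taus = [(i + 0.5) / n for i in range(n)]  # fixed quantile midpoints
    total = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        for y in target_quantiles:
            u = y - theta  # TD error for this pair
            if abs(u) <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs(u) - 0.5 * kappa)
            # Asymmetric weight: under-estimates (u > 0) weighted by tau,
            # over-estimates (u < 0) weighted by 1 - tau.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    return total / (n * len(target_quantiles))
```

For example, `soft_quantile_targets(1.0, False, [0.0, 1.0], 0.0, gamma=0.5)` returns `[1.0, 1.5]`. Keeping one target per next-state quantile, instead of collapsing them to a mean, is what lets the critic model the spread of returns, which is the property distributional methods rely on for more stable value estimates.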

Findings

1. MODULE outperforms existing LfO methods in MuJoCo environments.
2. Theoretical insights into reward and policy generalization in LfO.
3. Enhanced training stability and sample efficiency achieved by MODULE.

Abstract

Learning from observations (LfO) replicates expert behavior without needing access to the expert's actions, making it more practical than learning from demonstrations (LfD) in many real-world scenarios. However, directly applying the on-policy training scheme in LfO worsens the sample inefficiency problem, while employing the traditional off-policy training scheme in LfO magnifies the instability issue. This paper seeks to develop an efficient and stable solution for the LfO problem. Specifically, we begin by exploring the generalization capabilities of both the reward function and policy in LfO, which provides a theoretical foundation for computation. Building on this, we modify the policy optimization method in generative adversarial imitation from observation (GAIfO) with distributional soft actor-critic (DSAC), and propose the Mimicking Observations through Distributional Update…
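
GAIfO replaces the action-conditioned discriminator D(s, a) of adversarial imitation with a discriminator over state transitions D(s, s'), which is what makes imitation possible without expert actions. The sketch below is a minimal illustration, assuming a logistic-regression discriminator on the concatenated states and the surrogate reward r = -log(1 - D(s, s')). Both choices, and every helper name, are hypothetical and for illustration only; the reward form MODULE uses is not stated here, and the DSAC policy-optimization step that would consume this reward is not shown.

```python
import math
from typing import List, Sequence, Tuple

# A state-only transition (s, s'); no expert action is needed.
Transition = Tuple[Sequence[float], Sequence[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(s: Sequence[float], s_next: Sequence[float]) -> List[float]:
    # The discriminator only sees states: concatenate s and s'.
    return list(s) + list(s_next)


def discriminator(w: Sequence[float], b: float, s, s_next) -> float:
    """Probability that the transition (s, s') came from the expert."""
    z = b + sum(wi * xi for wi, xi in zip(w, features(s, s_next)))
    return sigmoid(z)


def discriminator_step(w, b, expert: List[Transition], agent: List[Transition], lr=0.1):
    """One gradient-ascent step on the batch-averaged log-likelihood
    log D(expert) + log(1 - D(agent)) of a logistic discriminator."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    labeled = [(t, 1.0) for t in expert] + [(t, 0.0) for t in agent]
    for (s, s_next), label in labeled:
        # For a logistic model, d(log-likelihood)/d(logit) = label - D.
        err = label - discriminator(w, b, s, s_next)
        for i, xi in enumerate(features(s, s_next)):
            grad_w[i] += err * xi
        grad_b += err
    m = len(labeled)
    new_w = [wi + lr * g / m for wi, g in zip(w, grad_w)]
    return new_w, b + lr * grad_b / m


def imitation_reward(w, b, s, s_next, eps=1e-8) -> float:
    """Surrogate reward r = -log(1 - D(s, s')); larger when (s, s') looks expert-like."""
    d = discriminator(w, b, s, s_next)
    return -math.log(max(1.0 - d, eps))
```

For example, with expert transitions `([0.0], [1.0])` and agent transitions `([0.0], [0.0])`, repeated calls to `discriminator_step` raise the reward of the expert-like transition above that of the agent's. In an off-policy setup such as the one the abstract describes, this reward would be recomputed for replayed transitions and handed to the critic update.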
