GIPO: Gaussian Importance Sampling Policy Optimization
Chengxuan Lu, Zhenquan Zhang, Shukuan Wang, Qunzhi Lin, Baigui Sun, Yang Liu

TL;DR
GIPO introduces a Gaussian importance sampling objective for reinforcement learning policy optimization. It improves data efficiency, stability, and robustness, especially when data are limited or outdated, and outperforms existing clipping-based approaches.
Contribution
The paper proposes a Gaussian importance sampling objective that improves data efficiency and stability in RL policy optimization, supported by theoretical guarantees and empirical results.
Findings
Achieves state-of-the-art results among clipping-based methods.
Exhibits superior bias-variance trade-off and training stability.
Improves sample efficiency across various replay buffer sizes.
Abstract
Post-training with reinforcement learning (RL) has recently shown strong promise for advancing multimodal agents beyond supervised imitation. However, RL remains limited by poor data efficiency, particularly in settings where interaction data are scarce and quickly become outdated. To address this challenge, GIPO (Gaussian Importance sampling Policy Optimization) is proposed as a policy optimization objective based on truncated importance sampling, replacing hard clipping with a log-ratio-based Gaussian trust weight to softly damp extreme importance ratios while maintaining non-zero gradients. Theoretical analysis shows that GIPO introduces an implicit, tunable constraint on the update magnitude, while concentration bounds guarantee robustness and stability under finite-sample estimation. Experimental results show that GIPO achieves state-of-the-art performance among clipping-based…
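The contrast the abstract draws between hard clipping and a log-ratio-based Gaussian trust weight can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form below, exp(-(log r)^2 / (2 sigma^2)), is an assumption, as are the function names, the parameter `sigma`, and the default values; this is not the authors' implementation.

```python
import math


def clipped_weight(ratio, eps=0.2):
    """PPO-style hard clipping: clamp the importance ratio to [1 - eps, 1 + eps]."""
    return min(max(ratio, 1.0 - eps), 1.0 + eps)


def gaussian_trust_weight(ratio, sigma=0.5):
    """Hypothetical Gaussian trust weight on the log-ratio (assumed form).

    Equals 1 when ratio == 1 and decays smoothly as |log ratio| grows.
    For any finite positive ratio it stays strictly positive in exact
    arithmetic, unlike a hard clip, which is flat outside its range.
    """
    log_r = math.log(ratio)
    return math.exp(-(log_r ** 2) / (2.0 * sigma ** 2))


def damped_ratio(ratio, sigma=0.5):
    """Importance ratio multiplied by the assumed Gaussian trust weight."""
    return ratio * gaussian_trust_weight(ratio, sigma)


if __name__ == "__main__":
    # Compare the two treatments on a few importance ratios.
    for r in (0.5, 0.9, 1.0, 1.1, 2.0, 10.0):
        print(r, clipped_weight(r), gaussian_trust_weight(r), damped_ratio(r))
```

In this sketch a smaller `sigma` damps off-policy samples more aggressively, which is one way to read the "implicit, tunable constraint on the update magnitude" mentioned in the abstract; how the paper actually parameterizes and applies the weight may differ.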
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
