Doubly Mild Generalization for Offline Reinforcement Learning
Yixiu Mao, Qi Wang, Yun Qu, Yuhang Jiang, Xiangyang Ji

TL;DR
This paper introduces Doubly Mild Generalization (DMG), an offline RL approach that exploits mild generalization beyond the dataset to improve performance while limiting value overestimation and the propagation of generalization errors.
Contribution
The paper proposes DMG, a method that combines mild action generalization with mild generalization propagation, supported by theoretical guarantees and state-of-the-art empirical results.
Findings
DMG outperforms existing methods on Gym-MuJoCo and AntMaze tasks.
In the oracle generalization scenario, DMG is guaranteed to perform better than the in-sample optimal policy.
DMG seamlessly transitions from offline to online learning with strong fine-tuning results.
Abstract
Offline Reinforcement Learning (RL) suffers from extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions. Significant efforts have been devoted to mitigating such generalization, and recent in-sample learning approaches have further succeeded in entirely eschewing it. Nevertheless, we show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions. To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation. The former refers to selecting actions in a close neighborhood of the dataset to maximize the Q values. Even so, the potential erroneous…
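To make the idea of mild action generalization concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a finite set of candidate actions, a Euclidean neighborhood of radius `epsilon` around the dataset actions for a given state, and a hypothetical `q_fn`; all of these names and choices are illustrative assumptions. The maximization over Q values is restricted to candidates that stay close to actions seen in the dataset.

```python
import math

def within_neighborhood(action, dataset_actions, epsilon):
    """True if `action` is within Euclidean distance `epsilon` of some dataset action."""
    return any(math.dist(action, a) <= epsilon for a in dataset_actions)

def mildly_generalized_argmax(q_fn, candidates, dataset_actions, epsilon):
    """Pick the highest-Q candidate among those close to the dataset.

    Assumption: if no candidate is close enough, fall back to the dataset
    actions themselves (i.e., a purely in-sample maximum).
    """
    allowed = [a for a in candidates
               if within_neighborhood(a, dataset_actions, epsilon)]
    if not allowed:
        allowed = list(dataset_actions)
    return max(allowed, key=q_fn)

# Toy setup: 1-D actions; the (hypothetical) Q function peaks at 1.0,
# far away from anything observed in the dataset.
dataset_actions = [(0.0,), (0.2,)]
candidates = [(-0.5,), (0.1,), (0.3,), (1.0,)]
q_fn = lambda a: -(a[0] - 1.0) ** 2

best = mildly_generalized_argmax(q_fn, candidates, dataset_actions, epsilon=0.15)
unconstrained = max(candidates, key=q_fn)
print(best, unconstrained)
```

In this toy case the unconstrained maximizer is the far-away action `(1.0,)`, whose Q value would rest entirely on extrapolation, while the neighborhood-restricted maximizer is `(0.3,)`, slightly outside the dataset but close to an observed action.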
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
