Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang

TL;DR
This paper argues that explicitly regularizing the Lipschitz continuity of the value function can take the place of a probabilistic dynamics model ensemble in model-based reinforcement learning. A single dynamics model then suffices, which makes the resulting algorithm more efficient because only one model has to be trained.
Contribution
The paper introduces two practical mechanisms for regularizing the Lipschitz condition of the value function. It shows that a single dynamics model combined with this regularization can outperform probabilistic ensemble methods.
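The summary does not spell out the two mechanisms. As an illustration only, the sketch below shows spectral normalization, a widely used way to cap a network's Lipschitz constant. Power iteration estimates each weight matrix's largest singular value, and the matrix is rescaled so that value is at most a target. It is written in pure Python to stay self-contained. The function names (`spectral_norm`, `spectrally_normalize`, `lipschitz_upper_bound`) are hypothetical, and this should not be read as the paper's implementation.

```python
import math
import random


def matvec(w, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def matvec_t(w, u):
    """Compute W^T @ u."""
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(w, iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(iters):
        u = normalize(matvec(w, v))
        v = normalize(matvec_t(w, u))
    # v approximates the top right singular vector, so ||W v|| approximates sigma_max.
    wv = matvec(w, v)
    return math.sqrt(sum(x * x for x in wv))


def spectrally_normalize(w, target=1.0, iters=100):
    """Rescale W so that its spectral norm is at most `target` (no change if already below)."""
    sigma = spectral_norm(w, iters)
    scale = min(1.0, target / sigma)
    return [[x * scale for x in row] for row in w]


def lipschitz_upper_bound(weights):
    """Upper bound on an MLP's L2 Lipschitz constant, assuming 1-Lipschitz activations (e.g. ReLU)."""
    bound = 1.0
    for w in weights:
        bound *= spectral_norm(w)
    return bound


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    W_sn = spectrally_normalize(W)
    print(spectral_norm(W), spectral_norm(W_sn))  # second value should be close to 1.0
```

Because the per-layer spectral norms multiply into a bound on the whole network, normalizing every layer of a value network with 1-Lipschitz activations keeps its Lipschitz constant at or below the product of the targets.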
Findings
A stronger Lipschitz condition on the value function shrinks the gap between the Bellman operators induced by the true and the learned dynamics.
A single dynamics model with Lipschitz regularization outperforms probabilistic model ensembles in the reported experiments.
A theoretical analysis supports the effectiveness of Lipschitz regularization in model-based RL.
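The first finding rests on a standard inequality: if a value function V is L-Lipschitz, the difference between its expectations under two next-state distributions is at most L times their Wasserstein-1 distance. The toy check below illustrates this in one dimension. The sample values, the choice V(s) = L·sin(s), and the helper names are made up for illustration and do not come from the paper's experiments.

```python
import math


def wasserstein1_1d(xs, ys):
    """W1 distance between two equal-size empirical distributions on the real line.

    In 1-D the optimal coupling pairs the i-th smallest sample of one set
    with the i-th smallest sample of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def expectation_gap(value_fn, xs, ys):
    """|E_P[V] - E_Q[V]| estimated from samples of P (xs) and Q (ys)."""
    mean_p = sum(value_fn(x) for x in xs) / len(xs)
    mean_q = sum(value_fn(y) for y in ys) / len(ys)
    return abs(mean_p - mean_q)


# Hypothetical next-state samples: "true" dynamics vs. a learned model.
true_next = [0.0, 0.5, 1.0, 1.5]
model_next = [0.1, 0.7, 0.9, 1.8]
w1 = wasserstein1_1d(true_next, model_next)

for lip in (0.5, 1.0, 5.0):
    # L * sin(s) has Lipschitz constant exactly L.
    value_fn = lambda s, L=lip: L * math.sin(s)
    gap = expectation_gap(value_fn, true_next, model_next)
    print(f"L={lip}: gap={gap:.4f}  bound L*W1={lip * w1:.4f}")
```

With the model error (W1) held fixed, both the observed gap and its bound grow with the Lipschitz constant, which is the sense in which a smoother value function is more tolerant of an imperfect dynamics model.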
Abstract
Probabilistic dynamics model ensemble is widely used in existing model-based reinforcement learning methods as it outperforms a single dynamics model in both asymptotic performance and sample efficiency. In this paper, we provide both practical and theoretical insights on the empirical success of the probabilistic dynamics model ensemble through the lens of Lipschitz continuity. We find that, for a value function, the stronger the Lipschitz condition is, the smaller the gap between the true dynamics- and learned dynamics-induced Bellman operators is, thus enabling the converged value function to be closer to the optimal value function. Hence, we hypothesize that the key functionality of the probabilistic dynamics model ensemble is to regularize the Lipschitz condition of the value function using generated samples. To test this hypothesis, we devise two practical robust training…
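The abstract's central claim can be made concrete with a standard argument. It is sketched here under assumptions that this summary does not state: a metric on the state space, bounded rewards, a discount γ < 1, and the max-form Bellman optimality operator. The symbols P, P̂, L_V and ε_W are introduced for illustration, and the paper's own theorem may use different conditions or constants.

```latex
% Bellman optimality operators under true dynamics P and learned dynamics \hat{P}
(\mathcal{T}V)(s) = \max_a \Big[ r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot\mid s,a)} V(s') \Big], \qquad
(\hat{\mathcal{T}}V)(s) = \max_a \Big[ r(s,a) + \gamma\, \mathbb{E}_{s' \sim \hat{P}(\cdot\mid s,a)} V(s') \Big]

% |max f - max g| <= max |f - g|, then Kantorovich-Rubinstein duality for an L_V-Lipschitz V
\big|(\mathcal{T}V)(s) - (\hat{\mathcal{T}}V)(s)\big|
  \le \gamma \max_a \Big| \mathbb{E}_{P(\cdot\mid s,a)} V - \mathbb{E}_{\hat{P}(\cdot\mid s,a)} V \Big|
  \le \gamma\, L_V \max_a W_1\big(P(\cdot\mid s,a), \hat{P}(\cdot\mid s,a)\big)

% Fixed points V^* = \mathcal{T}V^*, \hat{V} = \hat{\mathcal{T}}\hat{V}; \mathcal{T} is a \gamma-contraction in sup norm
\|V^* - \hat{V}\|_\infty
  \le \|\mathcal{T}V^* - \mathcal{T}\hat{V}\|_\infty + \|\mathcal{T}\hat{V} - \hat{\mathcal{T}}\hat{V}\|_\infty
  \le \gamma \|V^* - \hat{V}\|_\infty + \gamma\, L_{\hat{V}}\, \epsilon_W
\;\Longrightarrow\;
\|V^* - \hat{V}\|_\infty \le \frac{\gamma\, L_{\hat{V}}\, \epsilon_W}{1-\gamma},
\qquad \epsilon_W := \sup_{s,a} W_1\big(P(\cdot\mid s,a), \hat{P}(\cdot\mid s,a)\big)
```

For a fixed model error ε_W, a converged value function with a smaller Lipschitz constant L_V̂ is therefore guaranteed to lie closer to the optimal value function V*, which matches the intuition the abstract describes.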
