Dual Alignment Maximin Optimization for Offline Model-based RL
Chi Zhou, Wang Luo, Haoran Li, Congying Han, Tiande Guo, Zicheng Zhang

TL;DR
This paper introduces DAMO, a novel offline RL framework that aligns policies with models and data, improving consistency and performance in distribution mismatch scenarios.
Contribution
The paper proposes a new actor-critic paradigm, Dual Alignment Maximin Optimization, focusing on policy discrepancies and ensuring model-policy consistency in offline RL.
Findings
DAMO achieves competitive results across benchmark tasks.
It effectively aligns policies with models and data.
The framework improves robustness against distribution mismatch.
Abstract
Offline reinforcement learning agents face significant deployment challenges due to the synthetic-to-real distribution mismatch. While most prior research has focused on improving the fidelity of synthetic sampling and incorporating off-policy mechanisms, the directly integrated paradigm often fails to ensure consistent policy behavior in biased models and underlying environmental dynamics, which inherently arise from discrepancies between behavior and learning policies. In this paper, we first shift the focus from model reliability to policy discrepancies while optimizing for expected returns, and then self-consistently incorporate synthetic data, deriving a novel actor-critic paradigm, Dual Alignment Maximin Optimization (DAMO). It is a unified framework to ensure both model-environment policy consistency and synthetic and offline data compatibility. The inner minimization performs…
Taxonomy
Topics: Real-time simulation and control systems · Engineering Applied Research · Vehicle Dynamics and Control Systems
