Dual Alignment Maximin Optimization for Offline Model-based RL

Chi Zhou; Wang Luo; Haoran Li; Congying Han; Tiande Guo; Zicheng Zhang

arXiv:2502.00850 · cs.LG · October 1, 2025

Open Access

TL;DR

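This paper introduces DAMO, an offline model-based RL framework that keeps a policy's behavior consistent between the learned model and the real environment and keeps synthetic data compatible with offline data, improving consistency and performance under distribution mismatch.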

Contribution

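The paper proposes Dual Alignment Maximin Optimization (DAMO), an actor-critic paradigm that shifts the focus from model reliability to the discrepancy between behavior and learning policies, and that enforces consistent policy behavior between the learned model and the environment in offline RL.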

Findings

01. DAMO achieves competitive results across benchmark tasks.

02. It effectively aligns policies with models and data.

03. The framework improves robustness against distribution mismatch.

Abstract

Offline reinforcement learning agents face significant deployment challenges due to the synthetic-to-real distribution mismatch. While most prior research has focused on improving the fidelity of synthetic sampling and incorporating off-policy mechanisms, the directly integrated paradigm often fails to ensure consistent policy behavior in biased models and underlying environmental dynamics, which inherently arise from discrepancies between behavior and learning policies. In this paper, we first shift the focus from model reliability to policy discrepancies while optimizing for expected returns, and then self-consistently incorporate synthetic data, deriving a novel actor-critic paradigm, Dual Alignment Maximin Optimization (DAMO). It is a unified framework to ensure both model-environment policy consistency and synthetic and offline data compatibility. The inner minimization performs…

Taxonomy

Topics: Real-time simulation and control systems · Engineering Applied Research · Vehicle Dynamics and Control Systems