BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning

Junsung Park

arXiv:2511.22210·cs.LG·December 1, 2025

Open Access

TL;DR

BiCQL-ML introduces a bi-level offline inverse reinforcement learning framework that jointly optimizes reward and conservative Q-functions without explicit policy learning, improving reward recovery and policy performance.

Contribution

The paper presents a policy-free offline IRL method with convergence guarantees and reports better empirical results than existing baselines.

Findings

01 Achieves better reward recovery than existing IRL methods.
02 Improves downstream policy performance on standard benchmarks.
03 Converges to a reward function where the expert policy is soft-optimal.

Abstract

Offline inverse reinforcement learning (IRL) aims to recover a reward function that explains expert behavior using only fixed demonstration data, without any additional online interaction. We propose BiCQL-ML, a policy-free offline IRL algorithm that jointly optimizes a reward function and a conservative Q-function in a bi-level framework, thereby avoiding explicit policy learning. The method alternates between (i) learning a conservative Q-function via Conservative Q-Learning (CQL) under the current reward, and (ii) updating the reward parameters to maximize the expected Q-values of expert actions while suppressing over-generalization to out-of-distribution actions. This procedure can be viewed as maximum likelihood estimation under a soft value matching principle. We provide theoretical guarantees that BiCQL-ML converges to a reward function under which the expert policy is…
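
The alternation described in the abstract can be illustrated with a small tabular sketch. This is not the paper's implementation. It assumes a tabular reward and Q-table, a soft (log-sum-exp) Bellman target, a CQL-style penalty of the form logsumexp_a Q(s,a) - Q(s,a_data), and a reward update that ascends the soft log-likelihood Q(s,a) - logsumexp_a Q(s,a) while treating dQ/dr as the identity. All function names, hyperparameters, and the toy dataset are hypothetical.

```python
import numpy as np


def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def conservative_q_step(Q, r, data, gamma=0.9, alpha=0.5, lr=0.1):
    """One semi-gradient step on an assumed tabular CQL-style loss."""
    grad = np.zeros_like(Q)
    for s, a, s_next in data:
        # Soft Bellman target under the current reward (held fixed).
        target = r[s, a] + gamma * logsumexp(Q[s_next])
        grad[s, a] += Q[s, a] - target      # TD error term
        grad[s] += alpha * softmax(Q[s])    # push down logsumexp over all actions
        grad[s, a] -= alpha                 # push up the action seen in the data
    return Q - lr * grad / len(data)


def expert_log_likelihood(Q, data):
    # Mean log-probability of expert actions under a softmax policy over Q.
    return float(np.mean([Q[s, a] - logsumexp(Q[s]) for s, a, _ in data]))


def reward_step(r, Q, data, lr=0.1):
    """Ascend the expert log-likelihood, ASSUMING dQ/dr is the identity."""
    grad = np.zeros_like(r)
    for s, a, _ in data:
        grad[s] -= softmax(Q[s])  # suppress actions the soft policy over-weights
        grad[s, a] += 1.0         # raise reward on the expert action
    return r + lr * grad / len(data)


def bilevel_sketch(data, n_states, n_actions, outer=50, inner=20):
    # Alternate: (i) conservative Q updates, (ii) one reward update.
    Q = np.zeros((n_states, n_actions))
    r = np.zeros((n_states, n_actions))
    for _ in range(outer):
        for _ in range(inner):
            Q = conservative_q_step(Q, r, data)
        r = reward_step(r, Q, data)
    return r, Q


# Hypothetical expert transitions (state, action, next_state).
toy_data = [(0, 1, 1), (1, 0, 0)]
r_hat, Q_hat = bilevel_sketch(toy_data, n_states=2, n_actions=2)
print(expert_log_likelihood(Q_hat, toy_data))
```

The identity approximation for dQ/dr avoids differentiating through the inner loop and is chosen here only to keep the sketch short; a faithful bi-level method would need to account for how Q depends on the reward.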

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Emotion and Mood Recognition