Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance

Bram Silue; Santiago Amaya-Corredor; Patrick Mannion; Lander Willem; Pieter Libin

arXiv:2511.21356 · cs.LG · April 23, 2026

TL;DR

Hybrid-AIRL improves inverse reinforcement learning by integrating supervised expert guidance, leading to better reward inference and more stable, sample-efficient learning in complex, uncertain environments like poker.

Contribution

The paper introduces Hybrid-AIRL (H-AIRL), an extension of AIRL that adds a supervised loss derived from expert data and a stochastic regularization mechanism to improve both reward inference and policy learning.

Findings

01. H-AIRL outperforms AIRL in sample efficiency and stability.

02. Incorporating supervised signals improves reward inference.

03. H-AIRL demonstrates effectiveness in complex environments like HULHE poker.

Abstract

Adversarial Inverse Reinforcement Learning (AIRL) has shown promise in addressing the sparse reward problem in reinforcement learning (RL) by inferring dense reward functions from expert demonstrations. However, its performance in highly complex, imperfect-information settings remains largely unexplored. To explore this gap, we evaluate AIRL in the context of Heads-Up Limit Hold'em (HULHE) poker, a domain characterized by sparse, delayed rewards and significant uncertainty. In this setting, we find that AIRL struggles to infer a sufficiently informative reward function. To overcome this limitation, we contribute Hybrid-AIRL (H-AIRL), an extension that enhances reward inference and policy learning by incorporating a supervised loss derived from expert data and a stochastic regularization mechanism. We evaluate H-AIRL on a carefully selected set of Gymnasium benchmarks and the HULHE poker…
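
To make the idea of pairing AIRL with a supervised loss concrete, the following is a minimal sketch, not the authors' implementation. The discriminator and reward forms follow the standard AIRL formulation, D = exp(f) / (exp(f) + pi(a|s)) with reward log D - log(1 - D) = f - log pi(a|s). The function names, the additive combination, and the `sup_weight` coefficient are assumptions made for illustration. The paper's stochastic regularization mechanism is not sketched here because the abstract does not describe it.

```python
import math


def airl_discriminator(f_value, log_pi):
    """Standard AIRL discriminator D = exp(f) / (exp(f) + pi(a|s)).

    Computed as a sigmoid of (f - log pi) for numerical convenience.
    """
    return 1.0 / (1.0 + math.exp(-(f_value - log_pi)))


def airl_reward(f_value, log_pi):
    """Reward recovered from the discriminator: log D - log(1 - D) = f - log pi."""
    return f_value - log_pi


def supervised_loss(expert_action_probs):
    """Mean negative log-likelihood the policy assigns to the expert's actions."""
    return -sum(math.log(p) for p in expert_action_probs) / len(expert_action_probs)


def hybrid_loss(policy_loss, expert_action_probs, sup_weight=0.5):
    """Hypothetical hybrid objective: RL policy loss plus a weighted supervised term.

    `sup_weight` is an illustrative coefficient, not a value from the paper.
    """
    return policy_loss + sup_weight * supervised_loss(expert_action_probs)


if __name__ == "__main__":
    # A policy that gives probability 0.5 to the taken action, with f = 0.
    log_pi = math.log(0.5)
    print("D      =", airl_discriminator(0.0, log_pi))
    print("reward =", airl_reward(0.0, log_pi))
    print("hybrid =", hybrid_loss(1.0, [0.5, 0.5]))
```

In this sketch the supervised term acts like a behavior-cloning penalty: it shrinks as the policy assigns higher probability to the expert's actions, independently of the quality of the learned reward.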
