Learning Optimal and Sample-Efficient Decision Policies with Guarantees

Daqian Shao

arXiv:2602.17978·cs.LG·February 23, 2026

TL;DR

This paper introduces a sample-efficient method, with convergence guarantees, for learning decision policies from offline data that contains hidden confounders. It targets high-stakes domains such as healthcare and finance.

Contribution

It develops a novel algorithm based on instrumental variables (IVs) and conditional moment restrictions (CMR) to address hidden confounders, improving sample efficiency and providing convergence guarantees.
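To illustrate why an instrumental variable helps with hidden confounders, here is a minimal sketch using textbook two-stage least squares on a linear toy model. This is not the paper's algorithm: the data-generating process, the coefficient values, and the function names `simulate`, `ols_slope`, and `two_stage_least_squares` are all assumptions made up for this example.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Toy confounded data (hypothetical setup, not from the paper)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                     # hidden confounder, never observed
    z = rng.normal(size=n)                     # instrument: affects y only through x
    x = z + u + 0.1 * rng.normal(size=n)       # action, driven by z and u
    y = 2.0 * x - 3.0 * u + 0.1 * rng.normal(size=n)  # outcome, true effect is 2.0
    return z, x, y


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


def two_stage_least_squares(z, x, y):
    """Stage 1: predict x from z. Stage 2: regress y on that prediction."""
    x_hat = x.mean() + ols_slope(z, x) * (z - z.mean())
    return ols_slope(x_hat, y)


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive regression of y on x:", ols_slope(x, y))
    print("two-stage least squares:   ", two_stage_least_squares(z, x, y))
```

In this toy setup the naive regression is biased because the confounder moves both the action and the outcome, while the two-stage estimate uses only the variation in the action that comes from the instrument. The linear model is a simplification chosen for brevity.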

Findings

01. Outperforms state-of-the-art algorithms in sample efficiency

02. Successfully learns effective policies from offline datasets with confounders

03. Demonstrates applicability to real-world decision-making benchmarks

Abstract

The paradigm of decision-making has been revolutionised by reinforcement learning (RL) and deep learning. Although this has led to significant progress in domains such as robotics, healthcare, and finance, the use of RL in practice is challenging, particularly when learning decision policies in high-stakes applications that may require guarantees. Traditional RL algorithms rely on a large number of online interactions with the environment, which is problematic in scenarios where online interactions are costly, dangerous, or infeasible. However, learning from offline datasets is hindered by the presence of hidden confounders. Such confounders can cause spurious correlations in the dataset and can mislead the agent into taking suboptimal or adversarial actions. Firstly, we address the problem of learning from offline datasets in the presence of hidden confounders. We work with instrumental…

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research