Learning Sequential Decisions from Multiple Sources via Group-Robust Markov Decision Processes

Mingyuan Xu; Zongqi Xia; Tianxi Cai; Doudou Zhou; Nian Si

arXiv:2602.01825 · stat.ME · February 3, 2026

Open Access

TL;DR

This paper develops a robust offline reinforcement learning framework for heterogeneous multi-site data. It introduces group-structured distributionally robust MDPs and an algorithm that exploits shared features and site clustering.

Contribution

It proposes a novel group-structured distributionally robust MDP model with tractable Bellman recursions and an offline algorithm incorporating site-specific ridge regression, worst-case aggregation, and clustering extensions.

Findings

01. Provides a suboptimality bound under partial coverage assumptions.

02. Introduces feature-wise uncertainty sets preserving tractability.

03. Demonstrates improved sample efficiency with site clustering.

Abstract

We often collect data from multiple sites (e.g., hospitals) that share common structure but also exhibit heterogeneity. This paper aims to learn robust sequential decision-making policies from such offline, multi-site datasets. To model cross-site uncertainty, we study distributionally robust MDPs with a group-linear structure: all sites share a common feature map, and both the transition kernels and expected reward functions are linear in these shared features. We introduce feature-wise (d-rectangular) uncertainty sets, which preserve tractable robust Bellman recursions while maintaining key cross-site structure. Building on this, we then develop an offline algorithm based on pessimistic value iteration that includes: (i) per-site ridge regression for Bellman targets, (ii) feature-wise worst-case (row-wise minimization) aggregation, and (iii) a data-dependent pessimism penalty computed…
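
The first two algorithmic components named in the abstract, per-site ridge regression and feature-wise worst-case aggregation, can be illustrated for a single backup step with the rough sketch below. It is not the authors' implementation. The function names (`site_ridge`, `pessimistic_q`), the parameters `lam` and `beta`, and the assumption of nonnegative features (under which a coordinate-wise minimum over sites gives the worst case) are all hypothetical. The abstract is truncated before it describes the pessimism penalty, so a standard elliptical penalty, maximized over sites, is substituted here purely as a placeholder.

```python
import numpy as np

def site_ridge(Phi, y, lam=1.0):
    """Ridge regression for one site.

    Phi: (n, d) feature matrix, y: (n,) Bellman targets.
    Returns the weight vector and the regularized Gram matrix.
    """
    d = Phi.shape[1]
    gram = Phi.T @ Phi + lam * np.eye(d)
    w = np.linalg.solve(gram, Phi.T @ y)
    return w, gram

def pessimistic_q(phi, site_data, lam=1.0, beta=0.1):
    """One pessimistic Q estimate at feature vector phi (assumed nonnegative).

    site_data: list of (Phi_k, y_k) pairs, one per site.
    """
    fits = [site_ridge(Phi, y, lam) for Phi, y in site_data]
    W = np.stack([w for w, _ in fits])   # shape (K, d): one row per site
    w_worst = W.min(axis=0)              # feature-wise minimum across sites
    # Placeholder penalty (an assumption, not taken from the paper):
    # the largest elliptical uncertainty width over all sites.
    width = max(np.sqrt(phi @ np.linalg.solve(g, phi)) for _, g in fits)
    return float(phi @ w_worst - beta * width), w_worst
```

Calling `pessimistic_q(phi, [(Phi_1, y_1), (Phi_2, y_2)])` returns the penalized value together with the aggregated weight vector. Taking the minimum per coordinate rather than per site means the aggregated weights need not match any single site, which is the sense in which the aggregation is feature-wise.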

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques