Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization

Min Wang; Xin Li; Mingzhong Wang; Hasnaa Bennis

arXiv:2601.07164·cs.LG·January 13, 2026

TL;DR

This paper introduces FLORA, a novel offline meta-reinforcement learning method that models feature distributions and uses adaptive correction to mitigate feature overgeneralization, improving task adaptation and policy performance.

Contribution

The paper proposes FLORA, which explicitly models feature distributions with invertible transformations and adaptively corrects feature overgeneralization in offline meta-RL.
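To illustrate what modeling a feature distribution with an invertible transformation involves, the sketch below uses the change-of-variables rule: the log-density of a feature vector is the base log-density of its transformed value plus the log-determinant of the transform's Jacobian. This is a minimal illustration under stated assumptions, not FLORA's architecture: the elementwise affine map, the standard normal base distribution, and every function and parameter name (`affine_flow_forward`, `shift`, `log_scale`, and so on) are hypothetical choices made for the example.

```python
import math

def affine_flow_forward(x, shift, log_scale):
    """Map features x to latent z with an invertible elementwise affine map.

    Assumed form (illustrative only): z_i = (x_i - shift_i) * exp(-log_scale_i).
    Returns z and log|det dz/dx|, which is -sum(log_scale) for this map.
    """
    z = [(xi - b) * math.exp(-s) for xi, b, s in zip(x, shift, log_scale)]
    log_det = -sum(log_scale)
    return z, log_det

def affine_flow_inverse(z, shift, log_scale):
    """Exact inverse of affine_flow_forward: x_i = z_i * exp(log_scale_i) + shift_i."""
    return [zi * math.exp(s) + b for zi, b, s in zip(z, shift, log_scale)]

def standard_normal_log_prob(z):
    """Log-density of z under an independent standard normal base distribution."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)

def feature_log_density(x, shift, log_scale):
    """Change of variables: log p(x) = log p_base(z) + log|det dz/dx|."""
    z, log_det = affine_flow_forward(x, shift, log_scale)
    return standard_normal_log_prob(z) + log_det

if __name__ == "__main__":
    x = [0.5, -1.25]
    shift, log_scale = [0.1, 0.2], [0.3, -0.4]
    z, _ = affine_flow_forward(x, shift, log_scale)
    print("round trip:", affine_flow_inverse(z, shift, log_scale))
    print("log p(x):", feature_log_density(x, shift, log_scale))
```

Practical flow models stack many such invertible layers with learned parameters; the single affine layer here only shows why invertibility gives an exact, tractable density.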

Findings

01. FLORA reduces extrapolation errors caused by feature overgeneralization.

02. FLORA achieves faster adaptation and better policy performance in various environments.

03. The method effectively models complex task distributions for improved meta-learning.

Abstract

Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient knowledge acquisition by RL agents. However, OMRL still suffers from extrapolation errors due to out-of-distribution (OOD) actions, compounded by broad task distributions and Markov Decision Process (MDP) ambiguity in meta-RL setups. Existing research indicates that the generalization of the $Q$ network affects the extrapolation error in offline RL. This paper investigates this relationship by decomposing the $Q$ value into feature and weight components, observing that while decomposition enhances adaptability and convergence in the case of high-quality data, it often leads to policy degeneration or collapse in complex tasks. We observe that decomposed $Q$ values introduce a large estimation bias when…
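The decomposition of the $Q$ value into feature and weight components mentioned above can be pictured with a small sketch. The abstract as shown here does not give the parameterization, so the example assumes one common form, an inner product $Q(s, a) = \phi(s, a)^\top w$, where the features are shared and the weights vary by task. The feature map `features`, the scalar state and action, and the weight vectors are all hypothetical stand-ins for what would be learned networks in practice.

```python
import math

def features(state, action):
    """Hypothetical feature map phi(s, a); a learned network in practice."""
    return [math.tanh(state + action), math.tanh(state - action), 1.0]

def q_value(state, action, weights):
    """Assumed decomposition: Q(s, a) = phi(s, a) . w (inner product)."""
    phi = features(state, action)
    return sum(p * w for p, w in zip(phi, weights))

if __name__ == "__main__":
    # Two hypothetical tasks share the feature map but use different weights.
    task_a_weights = [1.0, 2.0, 3.0]
    task_b_weights = [-1.0, 0.5, 0.0]
    state, action = 0.2, -0.1
    print("Q under task A:", q_value(state, action, task_a_weights))
    print("Q under task B:", q_value(state, action, task_b_weights))
```

Under this assumed form, adapting to a new task only requires inferring new weights, which is consistent with the adaptability benefit the abstract reports; it also means any error in the shared features for out-of-distribution inputs carries over to every task's $Q$ estimate.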

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)