Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Yongjae Shin, Jongseong Chae, Jongeui Park, Youngchul Sung

TL;DR
This paper introduces FINO, a novel offline-to-online reinforcement learning method that uses flow matching with injected noise and entropy-guided sampling to improve exploration and sample efficiency during fine-tuning.
Contribution
FINO is the first approach to integrate flow matching with noise injection and entropy-guided sampling for effective offline-to-online RL.
Findings
FINO outperforms existing methods under a limited budget of online interactions.
Injected noise enhances exploration beyond offline data.
Entropy-guided sampling balances exploration and exploitation.
Abstract
Generative models have recently demonstrated remarkable success across diverse domains, motivating their adoption as expressive policies in reinforcement learning (RL). While they have shown strong performance in offline RL, particularly where the target distribution is well defined, their extension to online fine-tuning has largely been treated as a direct continuation of offline pre-training, leaving key challenges unaddressed. In this paper, we propose Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a novel method that leverages flow matching-based policies to enhance sample efficiency for offline-to-online RL. FINO facilitates effective exploration by injecting noise into policy training, thereby encouraging a broader range of actions beyond those observed in the offline dataset. In addition to exploration-enhanced flow policy training, we combine an…
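As a rough illustration of what "injecting noise into policy training" could look like for a flow matching policy, the sketch below computes a standard conditional flow matching loss and perturbs the interpolated training point with Gaussian noise. Everything here is an assumption for illustration: the paper's actual objective (referred to in the reviews as Eq. 7) is not reproduced on this page, the names `noisy_flow_matching_loss`, `sample_actions`, `velocity_fn`, and `eta` are hypothetical, and state conditioning is omitted to keep the example short.

```python
import numpy as np


def noisy_flow_matching_loss(velocity_fn, actions, rng, eta=0.1):
    """Conditional flow matching loss with noise added to the interpolant.

    ASSUMPTION: where and how the noise enters is a guess made for
    illustration; it is not taken from the paper.
    """
    x1 = np.asarray(actions, dtype=float)          # dataset actions (batch, dim)
    x0 = rng.standard_normal(x1.shape)             # base samples from N(0, I)
    t = rng.uniform(size=(x1.shape[0], 1))         # one time per sample in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1                  # linear interpolation path
    x_t = x_t + eta * rng.standard_normal(x1.shape)  # hypothetical injected noise
    target = x1 - x0                               # standard velocity target
    err = velocity_fn(x_t, t) - target
    return float(np.mean(np.sum(err ** 2, axis=1)))


def sample_actions(velocity_fn, rng, n, dim, steps=10):
    """Draw actions by Euler-integrating the velocity field from t=0 to t=1."""
    x = rng.standard_normal((n, dim))
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((n, 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    offline_actions = rng.uniform(-1.0, 1.0, size=(128, 2))
    toy_velocity = lambda x, t: -x                 # stand-in for a learned network
    print(noisy_flow_matching_loss(toy_velocity, offline_actions, rng, eta=0.1))
    print(sample_actions(toy_velocity, rng, n=3, dim=2).shape)
```

Setting `eta=0` recovers the plain flow matching loss, which makes it easy to compare the two variants on the same data. A reviewer below asks how this kind of objective differs from adding noise to the velocity target $x_1 - x_0$ instead; in this sketch that alternative would perturb `target` rather than `x_t`.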
Peer Reviews
Decision: ICLR 2026 Poster
1. FINO explicitly integrates exploration into the learning process and balances exploration and exploitation. The empirical results demonstrate superior performance across a wide range of tasks. 2. The paper is overall well-written and clear.
1. Does the noise injection step damage the performance in the offline learning stage? 2. Does FINO reduce policy randomness during evaluation? 3. The benefit of injecting noise during training (Eq. 7) rather than directly adding Gaussian noise to the output action is unclear. 4. The role of entropy guidance in improving performance is also not well explained. Does it enhance policy learning, or does it serve as a technique to improve test-time behavior?
The paper exhibits a clear logical flow and high readability. The proposed algorithm is simple yet effective, supported by solid theoretical foundations and validated by striking experimental results.
Injecting noise helps enhance exploration capability, which is beneficial for the offline-to-online transition. However, why does this not compromise offline performance? After all, offline RL is inherently conservative and discourages exploration.
To my knowledge, this is the first paper to introduce the idea of noise injection into flow matching for offline-to-online RL, addressing a key limitation of prior work in offline-to-online RL. The empirical results demonstrate strong performance over the existing baselines (Tables 1 and 4), and the authors attempt to justify the benefit of their method through ablations, though I provide comments about the ablations below.
The weaknesses I identify can be grouped broadly into three categories: exposition/justification of FINO’s objective, baselines and ablations, and smaller questions. I believe the noise-injected flow matching objective needs more explanation and exposition. For example, why is this objective better than simply adding noise to the velocity target, $x_1 - x_0$? Additionally, is $\eta$ fixed throughout training? Do any of the baselines considered employ a similar policy extractio
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
