TL;DR
This paper introduces intention-conditioned flow occupancy models (InFOM), a probabilistic approach to pre-training RL agents that predicts which future states an agent will visit, conditioned on a latent variable capturing user intention, and improves performance on benchmark tasks.
Contribution
The paper presents a novel intention-conditioned flow model for RL pre-training, enhancing expressivity and adaptation through a latent intention variable.
Findings
Achieves 1.8x median improvement in returns.
Increases success rates by 36%.
Demonstrates effectiveness on 36 state-based and 4 image-based benchmarks.
Abstract
Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once, and then can be used by anyone in the community (including those without data or compute resources to train a model from scratch) to adapt and fine-tune to specific tasks. Applying this same framework to reinforcement learning (RL) is appealing because it offers compelling avenues for addressing core challenges in RL, including sample efficiency and robustness. However, there remains a fundamental challenge to pre-train large models in the context of RL: actions have long-term dependencies, so training a foundation model that reasons across time is important. Recent advances in generative AI have provided new tools for modeling highly complex distributions. In this paper, we build a probabilistic model to predict which states an agent will visit in…
