FLARE: Robot Learning with Implicit World Modeling

Ruijie Zheng; Jing Wang; Scott Reed; Johan Bjorck; Yu Fang; Fengyuan Hu; Joel Jang; Kaushil Kundalia; Zongyu Lin; Loic Magne; Avnish Narayan; You Liang Tan; Guanzhi Wang; Qi Wang; Jiannan Xiang; Yinzhen Xu; Seonghyeon Ye; Jan Kautz; Furong Huang; Yuke Zhu; Linxi Fan

arXiv:2505.15659 · cs.RO · May 22, 2025

TL;DR

FLARE introduces a lightweight framework that integrates predictive latent world modeling into robot policy learning, enabling anticipation of future observations and improving multitask manipulation performance.

Contribution

The paper presents FLARE, a novel method that aligns diffusion transformer features with latent future observations, enhancing robot policy learning with minimal architectural changes.

Findings

01. Achieves state-of-the-art results on manipulation benchmarks.

02. Outperforms prior baselines by up to 26%.

03. Enables generalization with minimal human demonstrations.

Abstract

We introduce Future LAtent REpresentation Alignment (FLARE), a novel framework that integrates predictive latent world modeling into robot policy learning. By aligning features from a diffusion transformer with latent embeddings of future observations, FLARE enables a diffusion transformer policy to anticipate latent representations of future observations, allowing it to reason about long-term consequences while generating actions. Remarkably lightweight, FLARE requires only minimal architectural modifications -- adding a few tokens to standard vision-language-action (VLA) models -- yet delivers substantial performance gains. Across two challenging multitask simulation imitation learning benchmarks spanning single-arm and humanoid tabletop manipulation, FLARE achieves state-of-the-art performance, outperforming…
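
The alignment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the features read out at the added tokens are compared with the future-observation embeddings by cosine similarity, and that the resulting term is added to the policy's action loss with a scalar weight. The function names, the cosine objective, and the default `align_weight` are hypothetical choices made for illustration only.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero

def alignment_loss(future_token_features, future_obs_embeddings):
    """Negative mean cosine similarity between policy features and target latents.

    future_token_features: one feature vector per added token (hypothetical name).
    future_obs_embeddings: one latent vector per token, encoding a future
        observation (assumed to come from a separate encoder).
    """
    sims = [
        cosine_similarity(pred, target)
        for pred, target in zip(future_token_features, future_obs_embeddings)
    ]
    return -sum(sims) / len(sims)

def total_loss(action_loss, align_loss, align_weight=0.2):
    """Action loss plus weighted alignment term; align_weight is an assumed value."""
    return action_loss + align_weight * align_loss

# Usage: two tokens with 2-dimensional features, perfectly aligned with targets.
features = [[1.0, 0.0], [0.0, 1.0]]
targets = [[2.0, 0.0], [0.0, 3.0]]
loss = total_loss(action_loss=1.0, align_loss=alignment_loss(features, targets))
```

Because cosine similarity ignores vector magnitude, the sketch rewards features that point in the same direction as the target latents regardless of scale. Other distance measures would fit the abstract's description equally well.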

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion