NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models
Wen Huang, Haoran Sun, Yongjian Guo, Yunxuan Ma, Haoran Li, Jing Long, Zhouying Mo, Zhong Guan, Yucheng Guo, Shuai Di, Junwu Xiong

TL;DR
NoiseGate introduces a learnable, per-latent timestep gating policy for world action models, improving action reliability by modulating latent noise levels during training and inference.
Contribution
It proposes a novel information-gating approach with a lightweight policy network, enhancing joint video-action modeling in world action models without hand-crafted priors.
Findings
NoiseGate achieves consistent performance gains on RoboTwin manipulation tasks.
Per-latent scheduling improves the reliability of predicted latent frames.
The method effectively trains the schedule policy through task-reward optimization.
Abstract
World Action Models (WAMs) are an emerging family of policies that tie robot action generation to future-observation modeling. In this work, we focus on the joint video-action modeling paradigm, where actions and imagined future observations are co-generated along a shared denoising or flow trajectory, so that perception, prediction, and control are coupled within one generative process. Existing WAMs typically realize this paradigm with a Mixture-of-Transformers (MoT), where video and action tokens interact through shared self-attention. This architecture can in principle assign a separate timestep to each predicted latent frame, yet current systems collapse this degree of freedom onto a single shared scalar. Under the noise-as-masking view of Diffusion Forcing, this shared schedule imposes the unjustified prior that every predicted latent is equally reliable for action…
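To make the contrast between a shared scalar timestep and per-latent timesteps concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function name `noise_latents`, the latent shape `(F, C, H, W)`, the linear (flow-style) interpolation toward noise, and the convention that `t = 0` is clean and `t = 1` is pure noise are all assumptions made for illustration.

```python
import numpy as np

def noise_latents(latents, timesteps, noise):
    """Interpolate each latent frame toward noise (hypothetical helper).

    latents:   array of shape (F, C, H, W), one latent per predicted frame
    timesteps: scalar (shared schedule) or array of shape (F,) (per-latent),
               assumed to lie in [0, 1] with 0 = clean and 1 = pure noise
    noise:     array with the same shape as latents
    """
    num_frames = latents.shape[0]
    # A scalar is broadcast to every frame; a vector is used as given.
    t = np.broadcast_to(np.asarray(timesteps, dtype=latents.dtype), (num_frames,))
    # Reshape so each frame's timestep scales that whole frame.
    t = t.reshape(-1, 1, 1, 1)
    return (1.0 - t) * latents + t * noise

rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 8, 16, 16))  # 4 predicted latent frames
noise = rng.standard_normal(latents.shape)

# Shared scalar: every frame is corrupted (masked) to the same degree.
shared = noise_latents(latents, 0.5, noise)

# Per-latent: each frame gets its own noise level; here later frames are
# noisier, but this particular vector is only an illustrative choice.
per_latent = noise_latents(latents, np.array([0.0, 0.3, 0.7, 1.0]), noise)
```

Under the noise-as-masking reading, a frame at `t = 0` passes its full content to the action tokens and a frame at `t = 1` passes none. A learned schedule policy would produce the per-frame vector instead of the hand-picked one used here.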
