NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows
Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Nikita Lyubaykin, Andrei Polubarov, Alexander Derevyagin, Vladislav Kurenkov

TL;DR
NinA introduces a Normalizing Flow-based decoder for Vision-Language-Action models, enabling faster, one-shot action sampling while maintaining performance comparable to diffusion models, thus improving real-time control capabilities.
Contribution
The paper proposes NinA, a Normalizing Flow-based action decoder for VLA models. It replaces the diffusion action decoder so that actions can be sampled in one shot while keeping performance comparable to diffusion-based decoders.
Findings
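NinA matches diffusion-based decoders in performance on the LIBERO benchmark.
NinA reduces inference time relative to diffusion-based decoders by sampling actions in a single pass instead of multiple iterative denoising steps.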
NinA enables high-frequency control suitable for real-world applications.
Abstract
Recent advances in Vision-Language-Action (VLA) models have established a two-component architecture, where a pre-trained Vision-Language Model (VLM) encodes visual observations and task descriptions, and an action decoder maps these representations to continuous actions. Diffusion models have been widely adopted as action decoders due to their ability to model complex, multimodal action distributions. However, they require multiple iterative denoising steps at inference time or downstream techniques to speed up sampling, limiting their practicality in real-world settings where high-frequency control is crucial. In this work, we present NinA (Normalizing Flows in Action), a fast and expressive alternative to diffusion-based decoders for VLAs. NinA replaces the diffusion action decoder with a Normalizing Flow (NF) that enables one-shot sampling through an invertible transformation,…
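To make the idea of one-shot sampling through an invertible transformation concrete, here is a minimal sketch of a single conditional affine coupling layer in NumPy. It is an illustration only and not the NinA architecture: the class and function names, the dimensions, the random untrained weights, and the use of a random vector in place of VLM features are all assumptions made for this example.

```python
import numpy as np


class ConditionalAffineCoupling:
    """One affine coupling layer conditioned on a context vector.

    Hypothetical illustration: weights are random and untrained, and the
    context stands in for features produced by a VLM.
    """

    def __init__(self, action_dim, context_dim, rng):
        self.action_dim = action_dim
        self.half = action_dim // 2
        in_dim = self.half + context_dim
        out_dim = action_dim - self.half
        self.w_scale = 0.1 * rng.standard_normal((in_dim, out_dim))
        self.w_shift = 0.1 * rng.standard_normal((in_dim, out_dim))

    def _params(self, kept, context):
        # Scale and shift depend only on the untouched half and the context,
        # which is what keeps the layer invertible.
        h = np.concatenate([kept, context], axis=-1)
        log_scale = np.tanh(h @ self.w_scale)
        shift = h @ self.w_shift
        return log_scale, shift

    def forward(self, z, context):
        """Map noise z to an action in a single pass."""
        z1, z2 = z[..., :self.half], z[..., self.half:]
        log_scale, shift = self._params(z1, context)
        a2 = z2 * np.exp(log_scale) + shift
        return np.concatenate([z1, a2], axis=-1)

    def inverse(self, a, context):
        """Map an action back to noise and return log|det dz/da|."""
        a1, a2 = a[..., :self.half], a[..., self.half:]
        log_scale, shift = self._params(a1, context)
        z2 = (a2 - shift) * np.exp(-log_scale)
        z = np.concatenate([a1, z2], axis=-1)
        return z, -log_scale.sum(axis=-1)


def log_prob(layer, action, context):
    """Exact log-density of an action under a standard normal base."""
    z, log_det = layer.inverse(action, context)
    dim = z.shape[-1]
    base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    return base + log_det


rng = np.random.default_rng(0)
layer = ConditionalAffineCoupling(action_dim=4, context_dim=8, rng=rng)

context = rng.standard_normal((2, 8))   # stand-in for VLM features
noise = rng.standard_normal((2, 4))     # base-distribution sample
actions = layer.forward(noise, context)  # one-shot sampling, no iteration
recovered, _ = layer.inverse(actions, context)
log_likelihood = log_prob(layer, actions, context)
```

Sampling here is a single call to `forward`, in contrast to the multiple denoising steps a diffusion decoder runs, and the same layer gives an exact log-likelihood through `inverse`. A practical flow would stack several such layers with learned parameters and alternate which half of the action vector is transformed.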
