NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows

Denis Tarasov; Alexander Nikulin; Ilya Zisman; Albina Klepach; Nikita Lyubaykin; Andrei Polubarov; Alexander Derevyagin; Vladislav Kurenkov

arXiv:2508.16845 · cs.CV · October 15, 2025

TL;DR

NinA introduces a Normalizing Flow-based decoder for Vision-Language-Action models, enabling faster, one-shot action sampling while maintaining performance comparable to diffusion models, thus improving real-time control capabilities.

Contribution

The paper proposes NinA, a Normalizing Flow-based action decoder for VLA models. It replaces the diffusion decoder and samples actions in one shot while keeping performance comparable to diffusion-based decoders.

Findings

01. NinA matches diffusion models in performance on the LIBERO benchmark.

02. NinA significantly reduces inference time for VLA models compared with diffusion-based decoders.

03. NinA enables high-frequency control suitable for real-world applications.

Abstract

Recent advances in Vision-Language-Action (VLA) models have established a two-component architecture, where a pre-trained Vision-Language Model (VLM) encodes visual observations and task descriptions, and an action decoder maps these representations to continuous actions. Diffusion models have been widely adopted as action decoders due to their ability to model complex, multimodal action distributions. However, they require multiple iterative denoising steps at inference time or downstream techniques to speed up sampling, limiting their practicality in real-world settings where high-frequency control is crucial. In this work, we present NinA (Normalizing Flows in Action), a fast and expressive alternative to diffusion-based decoders for VLAs. NinA replaces the diffusion action decoder with a Normalizing Flow (NF) that enables one-shot sampling through an invertible transformation,…
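To make the idea of one-shot sampling through an invertible transformation concrete, here is a minimal sketch of a conditional normalizing flow built from affine coupling layers. It is an illustration under stated assumptions, not the NinA architecture: the layer type, the dimensions (ACTION_DIM, COND_DIM), the random untrained weights, and the names ConditionalAffineCoupling, log_prob, and sample_action are all hypothetical. The cond vector stands in for the VLM representation described above.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM = 4  # hypothetical action dimensionality
COND_DIM = 8    # hypothetical size of the VLM embedding
HALF = ACTION_DIM // 2


class ConditionalAffineCoupling:
    """One affine coupling layer conditioned on a context vector.

    Illustrative only: the weights are random, not trained.
    """

    def __init__(self, flip):
        self.flip = flip  # selects which half passes through unchanged
        in_dim = HALF + COND_DIM
        self.w_scale = rng.normal(0.0, 0.1, size=(in_dim, HALF))
        self.w_shift = rng.normal(0.0, 0.1, size=(in_dim, HALF))

    def _params(self, kept, cond):
        h = np.concatenate([kept, cond], axis=-1)
        log_scale = np.tanh(h @ self.w_scale)  # bounded for stability
        shift = h @ self.w_shift
        return log_scale, shift

    def _split(self, x):
        a, b = x[..., :HALF], x[..., HALF:]
        return (b, a) if self.flip else (a, b)

    def _merge(self, kept, changed):
        parts = (changed, kept) if self.flip else (kept, changed)
        return np.concatenate(parts, axis=-1)

    def forward(self, action, cond):
        """Map action -> latent and return log|det Jacobian|."""
        kept, changed = self._split(action)
        log_scale, shift = self._params(kept, cond)
        z = changed * np.exp(log_scale) + shift
        return self._merge(kept, z), log_scale.sum(axis=-1)

    def inverse(self, latent, cond):
        """Map latent -> action; closed form, no iterative refinement."""
        kept, z = self._split(latent)
        log_scale, shift = self._params(kept, cond)
        changed = (z - shift) * np.exp(-log_scale)
        return self._merge(kept, changed)


# Alternate which half is transformed so every dimension gets updated.
layers = [ConditionalAffineCoupling(flip=bool(i % 2)) for i in range(4)]


def log_prob(action, cond):
    """Exact log-likelihood via the change-of-variables formula."""
    z, total_log_det = action, 0.0
    for layer in layers:
        z, log_det = layer.forward(z, cond)
        total_log_det = total_log_det + log_det
    base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * ACTION_DIM * np.log(2 * np.pi)
    return base + total_log_det


def sample_action(cond):
    """One-shot sampling: draw Gaussian noise, run the inverse pass once."""
    z = rng.standard_normal(cond.shape[:-1] + (ACTION_DIM,))
    for layer in reversed(layers):
        z = layer.inverse(z, cond)
    return z


cond = rng.normal(size=(3, COND_DIM))  # stand-in for VLM embeddings
actions = sample_action(cond)
```

Sampling here costs a single pass through the stack of layers, whereas a diffusion decoder would call its network once per denoising step. The same layers also give an exact log-likelihood through log_prob, which is the quantity a flow of this kind would typically be trained to maximize.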
