Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

Haoyun Liu; Jianzhuang Zhao; Xinyuan Chang; Tianle Shi; Chuanzhang Meng; Jiayuan Tan; Feng Xiong; Tong Lin; Dongjie Huo; Mu Xu; SongLin Dong; Zhiheng Ma; Yihong Gong; Sheng Zhong

arXiv:2603.01766 · cs.RO · March 3, 2026

TL;DR

This paper introduces Neural Implicit Action Fields (NIAF), a novel approach that models actions as continuous functions, enabling more precise, differentiable, and physically plausible motion trajectories in vision-language-action systems.

Contribution

NIAF shifts action prediction from discrete waypoints to continuous functions, using an MLLM as a hierarchical spectral modulator over a learnable motion prior to produce high-resolution, differentiable trajectories.

Findings

1. Achieves state-of-the-art results on the CALVIN and LIBERO benchmarks.

2. Enables stable impedance control in real-world experiments.

3. Provides explicit supervision of velocity, acceleration, and jerk.

Abstract

Despite the rapid progress of Vision-Language-Action (VLA) models, the prevailing paradigm of predicting discrete waypoints remains fundamentally misaligned with the intrinsic continuity of physical motion. This discretization imposes rigid sampling rates, lacks high-order differentiability, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), a paradigm shift that reformulates action prediction from discrete waypoints to continuous action function regression. By utilizing an MLLM as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes infinite-resolution trajectories as continuous-time manifolds. This formulation enables analytical differentiability, allowing for explicit supervision of velocity, acceleration, and jerk to ensure mathematical consistency and physical plausibility.…
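
To make the idea of a continuous, analytically differentiable action function concrete, here is a minimal Python sketch. It is not the NIAF architecture: the sinusoidal form, the hand-picked `coeffs`, and the function names `action_derivative`, `sample`, and `derivative_loss` are all assumptions chosen for illustration. It shows only two properties the abstract describes: the same function can be queried at any sampling rate, and velocity, acceleration, and jerk are available in closed form and can therefore be penalized directly.

```python
import math

# Hypothetical toy "action field": a(t) = sum_k amp_k * sin(omega_k * t + phase_k).
# The (amp, omega, phase) triples are fixed by hand here; they stand in for
# whatever parameters a model would predict. This is NOT the NIAF architecture.

def action_derivative(coeffs, t, order=0):
    """Return the order-th time derivative of a(t) at time t, in closed form."""
    total = 0.0
    for amp, omega, phase in coeffs:
        # d/dt sin(x) = sin(x + pi/2), so each derivative scales the term
        # by omega and shifts its phase by a quarter period.
        total += amp * omega ** order * math.sin(omega * t + phase + order * math.pi / 2)
    return total

def sample(coeffs, rate_hz, duration_s, order=0):
    """Query the same function on a uniform grid at an arbitrary sampling rate."""
    n = int(round(rate_hz * duration_s)) + 1
    return [action_derivative(coeffs, i / rate_hz, order) for i in range(n)]

def derivative_loss(coeffs, target_fn, times, order):
    """Mean squared error between the order-th derivative and a target signal."""
    errs = [(action_derivative(coeffs, t, order) - target_fn(t)) ** 2 for t in times]
    return sum(errs) / len(errs)

coeffs = [(1.0, 2.0, 0.0), (0.5, 5.0, 0.3)]          # illustrative values only
coarse = sample(coeffs, rate_hz=10, duration_s=1.0)    # 10 Hz controller
fine = sample(coeffs, rate_hz=1000, duration_s=1.0)    # 1 kHz controller
jerk_at_start = action_derivative(coeffs, 0.0, order=3)
```

Because `coarse` and `fine` come from one underlying function, they agree wherever their time grids coincide, which a fixed-rate waypoint sequence cannot guarantee without interpolation. In a learned setting, `derivative_loss` would be evaluated with `order` set to 1, 2, or 3 against reference velocity, acceleration, or jerk; how NIAF weights or combines such terms is not stated in the abstract.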

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Human Pose and Action Recognition