Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models
Haoyun Liu, Jianzhuang Zhao, Xinyuan Chang, Tianle Shi, Chuanzhang Meng, Jiayuan Tan, Feng Xiong, Tong Lin, Dongjie Huo, Mu Xu, SongLin Dong, Zhiheng Ma, Yihong Gong, Sheng Zhong

TL;DR
This paper introduces Neural Implicit Action Fields (NIAF), a novel approach that models actions as continuous functions, enabling more precise, differentiable, and physically plausible motion trajectories in vision-language-action systems.
Contribution
NIAF shifts action prediction from discrete waypoints to continuous functions, utilizing a hierarchical spectral modulator and learnable motion prior for high-resolution, differentiable trajectories.
Findings
Achieves state-of-the-art on CALVIN and LIBERO benchmarks.
Enables stable impedance control in real-world experiments.
Provides explicit supervision of velocity, acceleration, and jerk.
Abstract
Despite the rapid progress of Vision-Language-Action (VLA) models, the prevailing paradigm of predicting discrete waypoints remains fundamentally misaligned with the intrinsic continuity of physical motion. This discretization imposes rigid sampling rates, lacks high-order differentiability, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), a paradigm shift that reformulates action prediction from discrete waypoints to continuous action function regression. By utilizing an MLLM as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes infinite-resolution trajectories as continuous-time manifolds. This formulation enables analytical differentiability, allowing for explicit supervision of velocity, acceleration, and jerk to ensure mathematical consistency and physical plausibility.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Robot Manipulation and Learning · Human Pose and Action Recognition
