Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study
Yubai Wei, Chen Wu, and Hashem Haghbayan

TL;DR
This paper investigates whether explicit physical feasibility supervision can improve vision-language-action (VLA) models for robot control, and finds that it improves physical reliability, task performance, and learning efficiency.
Contribution
The study introduces a geometry-grounded feasibility objective into VLA training and demonstrates its benefits through obstacle-aware manipulation experiments.
Findings
Feasibility supervision improves physical reliability of VLA policies.
Augmenting training with feasibility signals enhances task performance.
Explicit feasibility guidance accelerates learning in low-data scenarios.
Abstract
Vision-Language-Action (VLA) models map multimodal inputs directly to robot actions and are typically trained through large-scale imitation learning. While this paradigm has shown strong performance, prevailing VLA training procedures do not explicitly supervise hard physical constraints such as obstacle avoidance or kinematic feasibility. As a result, the geometric structure underlying physically feasible behavior must be inferred only implicitly from demonstrations. In this paper, we study whether introducing explicit feasibility supervision can provide effective structured guidance for VLA policies. We formulate a simple geometry-grounded feasibility objective and integrate it into the training stage of a diffusion-based VLA policy. To evaluate this idea systematically, we use obstacle-aware manipulation as a controlled probe of geometry-dependent physical feasibility. Empirical…
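The abstract does not specify the feasibility objective. The sketch below is therefore only one plausible instance of "a simple geometry-grounded feasibility objective" added to a diffusion policy's training loss. It is not the authors' formulation, and several choices in it are hypothetical:

- obstacles are modeled as spheres;
- the policy's predicted action chunk is assumed to be expressible as Cartesian end-effector waypoints;
- the names `obstacle_penetration_penalty`, `training_loss`, `lam`, and `margin` are illustrative.

The penalty is a squared hinge on how far each waypoint intrudes into an obstacle inflated by a safety margin. It is added to the standard noise-prediction MSE of a diffusion policy.

```python
import numpy as np


def obstacle_penetration_penalty(waypoints, centers, radii, margin=0.02):
    """Squared-hinge penalty for waypoints that enter inflated spherical obstacles.

    Hypothetical setup: waypoints is (T, 3) end-effector positions,
    centers is (K, 3) sphere centers, radii is (K,) sphere radii.
    """
    # Distance from every waypoint to every obstacle center, shape (T, K).
    dists = np.linalg.norm(waypoints[:, None, :] - centers[None, :, :], axis=-1)
    # Clearance is negative when a waypoint lies inside an inflated sphere.
    clearance = dists - (radii[None, :] + margin)
    # Only violations contribute; squaring keeps the penalty smooth at zero.
    violation = np.maximum(0.0, -clearance)
    return float(np.mean(violation ** 2))


def training_loss(pred_noise, true_noise, pred_waypoints, centers, radii,
                  lam=0.1, margin=0.02):
    """Diffusion denoising loss plus a weighted feasibility term (assumed form)."""
    # Epsilon-prediction MSE, as commonly used to train diffusion policies.
    diffusion_loss = float(np.mean((pred_noise - true_noise) ** 2))
    feasibility = obstacle_penetration_penalty(pred_waypoints, centers, radii, margin)
    return diffusion_loss + lam * feasibility
```

Consider a straight path through a sphere of radius 0.1 centered at the origin, with `margin=0.0`. Only the middle waypoint violates the constraint, by 0.1, so the penalty is 0.1² / 3 ≈ 0.00333. The same path shifted 0.5 along y scores zero.

The code uses NumPy only to stay self-contained. In actual training, the penalty would be written in an autodiff framework so that its gradient reaches the policy through the predicted actions.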
