$\pi$, But Make It Fly: Physics-Guided Transfer of VLA Models to Aerial Manipulation
Johnathan Tucker, Denis Liu, Aiden Swann, Allen Ren, Javier Yu, Jiankai Sun, Brandon Kim, Lachlain McGranahan, Quan Vuong, Mac Schwager

TL;DR
This paper introduces AirVLA, a system that adapts vision-language-action (VLA) models to aerial manipulation using physics-informed guidance and synthetic data, reaching 100% success on navigation tasks and raising pick-and-place success from 23% to 50%.
Contribution
The paper presents a novel approach for transferring manipulation foundation models to aerial platforms using physics-informed guidance and synthetic data augmentation, without retraining the core model.
Findings
Synthetic data enables 100% success in navigation tasks.
Payload-Aware Guidance more than doubles the pick-and-place success rate, from 23% to 50%.
Achieves 62% success on long-horizon compositional tasks.
Abstract
Vision-Language-Action (VLA) models have demonstrated remarkable generalization across diverse fixed-base manipulators. However, transferring these foundation models to aerial platforms remains an open challenge due to the fundamental mismatch between the quasi-static dynamics of fixed-base arms and the underactuated, highly dynamic nature of flight. In this work, we introduce AirVLA, a system that investigates the transferability of manipulation-pretrained VLAs to aerial pick-and-place tasks. We find that while visual representations transfer effectively, the specific control dynamics required for flight do not. To bridge this "dynamics gap" without retraining the foundation model, we introduce a Payload-Aware Guidance mechanism that injects payload constraints directly into the policy's flow-matching sampling process. To overcome data scarcity, we further utilize a…
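The idea of steering a flow-matching sampler with a constraint, as the abstract describes for Payload-Aware Guidance, can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the paper's implementation: `toy_velocity` is a hypothetical stand-in for the pretrained policy's velocity field, the "payload constraint" is assumed to be a one-sided quadratic penalty on vertical commands that fall below a hypothetical `floor`, and the sampler is plain Euler integration. The start point is fixed at zeros instead of random noise so the result is reproducible.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned flow-matching velocity field.
    It transports any sample toward `target` as t goes from 0 to 1."""
    return (target - x) / (1.0 - t)

def payload_cost_grad(x, floor):
    """Gradient of the assumed cost 0.5 * sum(max(0, floor - x)**2).
    It is nonzero only for commands below the floor."""
    return -np.maximum(0.0, floor - x)

def sample_actions(x0, target, n_steps=10, guidance_scale=0.0, floor=None):
    """Euler integration of the flow; optionally subtracts the cost
    gradient from the velocity at every step (gradient guidance)."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        if floor is not None and guidance_scale > 0.0:
            v = v - guidance_scale * payload_cost_grad(x, floor)
        x = x + dt * v
    return x

# A chunk of 4 vertical commands; the unmodified policy would output 1.0.
x0 = np.zeros(4)
target = np.ones(4)
unguided = sample_actions(x0, target)
guided = sample_actions(x0, target, guidance_scale=1.0, floor=1.5)
print("unguided:", unguided)
print("guided:  ", guided)
```

In this toy, the unguided sampler reproduces the policy's nominal command, while the guided sampler shifts each command upward toward the assumed floor without changing the velocity field itself. The size of the shift depends on `guidance_scale` and the number of steps, both of which are illustrative choices here.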
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Robotics and Sensor-Based Localization
