From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models
Zhuofan Li, Hongkun Yang, Zhenyang Chen, Yangxuan Chen, Yingyan (Celine) Lin, Chaojian Li

TL;DR
This paper argues that traditional efficiency metrics for vision-language-action (VLA) models do not reflect real-world embodied performance, and it proposes system-level embodied efficiency metrics for better evaluation.
Contribution
It introduces system-level embodied efficiency metrics and shows that they reveal differences between VLA models that conventional metrics miss.
Findings
Conventional metrics can misrepresent real-world efficiency.
Embodied efficiency metrics reveal hidden performance differences.
Trade-offs exist between computational savings and motion quality.
Abstract
Vision-Language-Action (VLA) models have recently enabled embodied agents to perform increasingly complex tasks by jointly reasoning over visual, linguistic, and motor modalities. However, we find that the prevailing notion of "efficiency" in current VLA research, characterized by parameters, FLOPs, or token decoding throughput, does not reflect actual performance on robotic platforms. In real-world execution, efficiency is determined by system-level embodied behaviors such as task completion time, trajectory smoothness, cumulative joint rotation, and motion energy. Through controlled studies across model compression, token sparsification, and action sequence compression, we make several observations that challenge common assumptions. (1) Methods that reduce computation under conventional metrics often increase end-to-end execution cost or degrade motion quality, despite maintaining…
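The abstract names four system-level quantities (task completion time, trajectory smoothness, cumulative joint rotation, and motion energy), but this summary does not give their formulas. As a minimal sketch, assuming a trajectory recorded as joint angles at a fixed control period, they might be computed as below. The function name `embodied_metrics`, the mean-squared-jerk smoothness measure, and the squared-velocity energy proxy are illustrative assumptions, not the paper's definitions.

```python
from typing import Dict, List


def embodied_metrics(traj: List[List[float]], dt: float) -> Dict[str, float]:
    """Hypothetical embodied-efficiency metrics for one executed trajectory.

    traj[t][j] is the angle (rad) of joint j at control step t.
    dt is the control period in seconds (assumed constant).
    """
    if len(traj) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    n_joints = len(traj[0])

    def diff(seq: List[List[float]]) -> List[List[float]]:
        # First-order finite difference between consecutive rows.
        return [[b[j] - a[j] for j in range(n_joints)]
                for a, b in zip(seq, seq[1:])]

    delta = diff(traj)                                   # per-step joint change (rad)
    vel = [[d / dt for d in row] for row in delta]       # rad/s
    acc = [[d / dt for d in row] for row in diff(vel)]   # rad/s^2
    jerk = [[d / dt for d in row] for row in diff(acc)]  # rad/s^3

    return {
        # Wall-clock duration of the executed motion.
        "completion_time_s": (len(traj) - 1) * dt,
        # Total absolute rotation summed over all joints and steps.
        "joint_rotation_rad": sum(abs(d) for row in delta for d in row),
        # Assumed smoothness measure: lower mean squared jerk is smoother.
        "mean_sq_jerk": sum(x * x for row in jerk for x in row)
        / (len(jerk) * n_joints),
        # Assumed energy proxy: time integral of squared joint velocity.
        "energy_proxy": sum(v * v for row in vel for v in row) * dt,
    }


# Toy single-joint trajectories (not data from the paper): both reach the
# same goal in the same number of steps, but one oscillates on the way.
smooth = [[0.0], [0.25], [0.5], [0.75], [1.0]]
jittery = [[0.0], [0.5], [0.25], [1.0], [1.0]]
m_smooth = embodied_metrics(smooth, dt=0.5)
m_jittery = embodied_metrics(jittery, dt=0.5)
```

In this toy case the two trajectories have the same completion time, yet the jittery one accumulates more joint rotation and far higher jerk. That is the kind of gap that parameter counts, FLOPs, or decoding throughput cannot show.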
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Reinforcement Learning in Robotics
