ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

Wencheng Ye; Tianshi Wang; Lei Zhu; Fengling Li; Guoli Yang; Hengtao Shen

arXiv:2511.18082 · cs.CV · April 14, 2026

TL;DR

ActDistill is an action-guided self-derived distillation framework that cuts computation in vision-language-action (VLA) models by over 50% and delivers up to a 1.67× speedup while maintaining or improving benchmark performance, making robotic manipulation more efficient.

Contribution

ActDistill proposes a graph-structured distillation method guided by action priors, yielding lightweight VLA models with minimal performance loss.

Findings

01. Reduces computation by over 50%

02. Achieves up to a 1.67× speedup

03. Maintains or improves performance on benchmarks
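
To put the first two findings side by side, the short sketch below converts a speedup factor into the fraction of original latency that remains, and computes the speedup an idealized model would get from a given compute reduction. The function names are hypothetical, and the assumption that latency scales linearly with compute is a simplification introduced here, not a claim from the paper; the two reported figures may also come from different configurations.

```python
def latency_fraction(speedup: float) -> float:
    """Fraction of the original latency that remains after a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def ideal_speedup(compute_reduction: float) -> float:
    """Speedup if latency scaled linearly with compute (an idealized assumption)."""
    if not 0.0 <= compute_reduction < 1.0:
        raise ValueError("compute_reduction must be in [0, 1)")
    return 1.0 / (1.0 - compute_reduction)


if __name__ == "__main__":
    # A 1.67x speedup leaves roughly 60% of the original latency.
    print(f"latency remaining at 1.67x: {latency_fraction(1.67):.3f}")
    # Halving compute would give 2x only under the linear-scaling assumption.
    print(f"ideal speedup at 50% compute reduction: {ideal_speedup(0.5):.2f}")
```

Under that simplification, a measured speedup below the ideal figure would suggest that part of the inference time does not shrink with the removed computation, though the listing gives no breakdown to confirm this.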

Abstract

Recent Vision-Language-Action (VLA) models have shown impressive flexibility and generalization, yet their deployment in robotic manipulation remains limited by heavy computational overhead and inference latency. In this work, we present ActDistill, a general action-guided self-derived distillation framework that transfers the action prediction capability of any existing VLA model to a lightweight counterpart. Unlike previous efficiency strategies that primarily emphasize vision-language correlations, ActDistill leverages action priors to guide knowledge transfer and model compression, achieving action-oriented efficiency for VLA models. Specifically, we employ a well-trained VLA model as the teacher and introduce a graph-structured encapsulation strategy to explicitly model the hierarchical evolution of action prediction. The student model, derived from the graph-encapsulated teacher,…
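
The teacher-student setup described in the abstract can be illustrated with a generic action-level distillation objective. The sketch below is not the paper's graph-structured method, whose details are not given in this listing: the function names, the use of mean squared error, and the blending weight `alpha` are all assumptions made purely for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty action vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("action vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def action_distillation_loss(
    student_action: Sequence[float],
    teacher_action: Sequence[float],
    target_action: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight between distillation and task terms
) -> float:
    """Blend a teacher-matching term with a ground-truth task term.

    This is a generic knowledge-distillation objective applied to action
    outputs, chosen for illustration; it is not taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    distill_term = mse(student_action, teacher_action)  # imitate the teacher
    task_term = mse(student_action, target_action)      # fit the demonstration
    return alpha * distill_term + (1.0 - alpha) * task_term


if __name__ == "__main__":
    # Toy 2-D actions: the student matches the target but not the teacher.
    loss = action_distillation_loss([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], alpha=0.5)
    print(loss)
```

Supervising the student on the teacher's action outputs, rather than only on intermediate vision-language features, is one simple way to make the transfer action-oriented in the sense the abstract describes.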
