FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models

Kewei Chen; Yayu Long; Shuai Li; Mingsheng Shang

arXiv:2511.16233 · cs.RO · November 21, 2025

TL;DR

This paper presents FT-NCFM, a data-centric framework that distills valuable training data for VLA models, significantly reducing training time while maintaining high performance.

Contribution

The paper introduces a data distillation framework in which a self-contained Fact-Tracing (FT) engine assesses the value of training samples and guides the generation of a compact, model-agnostic, information-rich training set for VLA models.
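
The abstract states only that the Fact-Tracing engine combines causal attribution with programmatic contrastive verification; neither step is specified on this page. As a rough illustration of the attribution half, the sketch below swaps in a TracIn-style first-order influence estimate (a known technique, not necessarily the authors'): each training sample is scored by summing, across saved checkpoints, the learning rate times the dot product of its loss gradient with the mean validation-loss gradient. The toy linear model, the squared loss, and the function names (`tracin_scores`, `select_top_fraction`) are hypothetical, and contrastive verification is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector, scalar target)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def grad_sq_loss(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of the squared loss (w.x - y)^2 with respect to w."""
    r = dot(w, x) - y
    return [2.0 * r * xi for xi in x]


def tracin_scores(checkpoints: Sequence[Sequence[float]], lr: float,
                  train: Sequence[Sample], val: Sequence[Sample]) -> List[float]:
    """TracIn-style influence of each training sample on the mean validation loss.

    A positive score means a gradient step on that sample would lower the
    validation loss; a negative score means it would raise it.
    """
    scores = [0.0] * len(train)
    for w in checkpoints:
        # Mean validation-loss gradient at this checkpoint.
        g_val = [0.0] * len(w)
        for x, y in val:
            g_val = [a + b / len(val) for a, b in zip(g_val, grad_sq_loss(w, x, y))]
        # Accumulate lr * <train-sample gradient, validation gradient>.
        for i, (x, y) in enumerate(train):
            scores[i] += lr * dot(grad_sq_loss(w, x, y), g_val)
    return scores


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices (in original order) of the ceil(fraction * n) highest-scoring samples."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

On a toy set where one sample contradicts the validation target, that sample receives a negative score and is the one dropped when keeping the top half. Because FT-NCFM is described as synthesizing data rather than only pruning it, the selection helper merely illustrates how such value scores could rank samples.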

Findings

1. Models trained on a distilled set amounting to 5% of the data achieve success rates of 85-90%.

2. Training time is reduced by over 80%.

3. The framework outperforms traditional data-compression and model-compression methods.

Abstract

The powerful generalization of Vision-Language-Action (VLA) models is bottlenecked by their heavy reliance on massive, redundant, and unevenly valued datasets, hindering their widespread application. Existing model-centric optimization paths, such as model compression (which often leads to performance degradation) or policy distillation (whose products are model-dependent and lack generality), fail to fundamentally address this data-level challenge. To this end, this paper introduces FT-NCFM, a fundamentally different, data-centric generative data distillation framework. Our framework employs a self-contained Fact-Tracing (FT) engine that combines causal attribution with programmatic contrastive verification to assess the intrinsic value of samples. Guided by these assessments, an adversarial NCFM process synthesizes a model-agnostic, information-dense, and reusable data asset.…
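
The synthesis step is described only as an adversarial NCFM process guided by the FT assessments. Assuming NCFM denotes characteristic-function matching posed as a min-max game, a toy version can be sketched: real samples, weighted by non-negative value scores, define a weighted empirical characteristic function; the max step keeps the frequencies where the real and synthetic distributions differ most, and the min step moves the synthetic points by gradient descent to shrink that gap. Matching in raw feature space, picking frequencies from a fixed candidate list instead of learning a frequency sampler, and using value scores as sample weights are all assumptions; names such as `weighted_ecf` and `synth_step` are hypothetical.

```python
import cmath
from typing import List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_ecf(points: Sequence[Vec], weights: Sequence[float], t: Vec) -> complex:
    """Weighted empirical characteristic function E_w[exp(i t.x)].

    Weights must be non-negative with a positive sum (hypothetical use of
    value scores as sample weights).
    """
    total = sum(weights)
    return sum(w * cmath.exp(1j * dot(t, x)) for x, w in zip(points, weights)) / total


def cf_discrepancy(real: Sequence[Vec], real_w: Sequence[float],
                   syn: Sequence[Vec], freqs: Sequence[Vec]) -> float:
    """Mean of |phi_real(t) - phi_syn(t)|^2 over the given frequencies."""
    ones = [1.0] * len(syn)  # synthetic points are weighted uniformly
    gaps = [abs(weighted_ecf(real, real_w, t) - weighted_ecf(syn, ones, t)) ** 2
            for t in freqs]
    return sum(gaps) / len(freqs)


def adversarial_freqs(real: Sequence[Vec], real_w: Sequence[float], syn: Sequence[Vec],
                      candidates: Sequence[Vec], k: int) -> List[Vec]:
    """Max step: keep the k candidate frequencies with the largest gap."""
    return sorted(candidates,
                  key=lambda t: cf_discrepancy(real, real_w, syn, [t]),
                  reverse=True)[:k]


def synth_step(real: Sequence[Vec], real_w: Sequence[float], syn: Sequence[Vec],
               freqs: Sequence[Vec], lr: float) -> List[List[float]]:
    """Min step: one gradient-descent update of the synthetic points."""
    m = len(syn)
    ones = [1.0] * m
    grads = [[0.0] * len(s) for s in syn]
    for t in freqs:
        # delta = phi_syn(t) - phi_real(t)
        delta = weighted_ecf(syn, ones, t) - weighted_ecf(real, real_w, t)
        for j, s in enumerate(syn):
            # d|delta|^2 / d s_j = 2 Re(conj(delta) * (i/m) * exp(i t.s_j)) * t
            scale = 2.0 * (delta.conjugate() * (1j / m) * cmath.exp(1j * dot(t, s))).real
            for d in range(len(s)):
                grads[j][d] += scale * t[d] / len(freqs)
    return [[sd - lr * gd for sd, gd in zip(s, g)] for s, g in zip(syn, grads)]
```

Alternating `adversarial_freqs` with a few `synth_step` calls gives the min-max loop. Raw value scores, such as TracIn-style estimates, can be negative, so they would need clipping or shifting before use as weights here.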

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling