FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

Bin Yu; Shijie Lian; Xiaopeng Lin; Zhaolong Shen; Yuliang Wei; Changti Wu; Hang Yuan; Haishan Liu; Bailing Wang; Cong Huang; Kai Chen

arXiv:2605.13757 · cs.RO · May 14, 2026

TL;DR

FrameSkip selectively samples high-importance frames from robot demonstration trajectories to improve vision-language-action policy training efficiency and success rates.

Contribution

Introduces a data-layer frame selection method that concentrates training on manipulation-critical frames without altering the model architecture or training objective.

Findings

1. Achieves a 76.15% success rate across benchmarks with only 20% of frames retained.

2. Outperforms full-frame training and simpler frame selection methods.

3. Improves the success-retention trade-off in VLA policy training.

Abstract

Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long low-change segments dominate the training stream, while manipulation-critical transitions such as alignment, contact, grasping, and release appear only sparsely. We introduce FrameSkip, a data-layer frame selection framework that scores trajectory frames using action variation, visual-action coherence, task-progress priors, and gripper-transition preservation, then remaps training samples toward high-importance frames under a target retention ratio. Because FrameSkip operates only in the dataloader, it leaves the VLA architecture, action head, training objective, and inference procedure…
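The selection-and-remapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_frames` and `remap_indices`, the scoring rule, and the nearest-frame remapping are all assumptions made for illustration. The sketch scores frames only by action variation, force-keeps gripper transitions, and omits the visual-action coherence and task-progress terms that the abstract also lists.

```python
import numpy as np


def select_frames(actions, gripper, retention=0.2):
    """Hypothetical scorer: keep the top-scoring frames under a retention ratio.

    actions: (T, D) array of per-frame actions.
    gripper: (T,) array of binary gripper states (0 = open, 1 = closed).
    Returns the sorted indices of the retained frames.
    """
    num_frames = len(actions)

    # Action variation: norm of the change from the previous frame (0 for frame 0).
    variation = np.zeros(num_frames)
    variation[1:] = np.linalg.norm(np.diff(actions, axis=0), axis=1)

    # Gripper transitions: frames where the gripper state flips.
    transition = np.zeros(num_frames, dtype=bool)
    transition[1:] = gripper[1:] != gripper[:-1]

    # Assumed rule: transitions get infinite score so they are retained first.
    # If there are more transitions than the budget, only the earliest fit.
    score = variation.copy()
    score[transition] = np.inf

    budget = max(1, int(round(retention * num_frames)))
    # Stable sort on the negated score, so ties resolve to earlier frames.
    order = np.argsort(-score, kind="stable")
    return np.sort(order[:budget])


def remap_indices(num_frames, keep):
    """Map every original frame index to its nearest retained frame.

    Ties resolve to the earlier retained frame (first minimum of argmin).
    """
    idx = np.arange(num_frames)
    nearest = np.argmin(np.abs(idx[:, None] - keep[None, :]), axis=1)
    return keep[nearest]


if __name__ == "__main__":
    # Toy trajectory: the action jumps at frame 5, the gripper closes at frame 7.
    actions = np.zeros((10, 2))
    actions[5:] = 1.0
    gripper = np.zeros(10, dtype=int)
    gripper[7:] = 1

    keep = select_frames(actions, gripper, retention=0.2)
    print(keep)                     # retained frame indices
    print(remap_indices(10, keep))  # dataloader index -> retained frame
```

In a dataloader, a remapping table like this would let the sampler keep drawing indices over the full trajectory length while actually reading only retained frames, which is one way to change the data stream without touching the model or the loss.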

Code & Models

Repositories

zgc-embodyai/FrameSkip
github
