VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation
Xinglei Yu, Zhenyang Liu, Shufeng Nan, Simo Wu, Yanwei Fu

TL;DR
VADF introduces a vision-driven adaptive framework for diffusion policies in robotic manipulation, enhancing training efficiency and inference success by prioritizing difficult samples and adaptively segmenting tasks based on visual input.
Contribution
The paper presents a novel dual-adaptive framework that pairs an Adaptive Loss Network (ALN) for training with HVTS for inference, improving convergence speed and early task success in robotic diffusion policies.
Findings
Reduces training convergence steps significantly.
Improves early inference success rate.
Decreases computational overhead during inference.
Abstract
Diffusion policies are becoming mainstream in robotic manipulation but suffer from hard-negative class imbalance caused by uniform sampling and a lack of sample-difficulty awareness, which leads to slow training convergence and frequent inference timeout failures. We propose VADF (Vision-Adaptive Diffusion Policy Framework), a vision-driven dual-adaptive framework that significantly reduces convergence steps and achieves early success in inference, with a model-agnostic design that enables seamless integration into any diffusion policy architecture. During training, we introduce the Adaptive Loss Network (ALN), a lightweight MLP-based loss predictor that quantifies per-step sample difficulty in real time. Guided by hard negative mining, it performs weighted sampling to prioritize high-loss regions, enabling adaptive weight updates and faster convergence. In inference, we design the Hierarchical Vision…
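The loss-guided weighted sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how predicted losses are turned into sampling probabilities, so the softmax weighting, the `temperature` parameter, and the function names below are assumptions made for illustration. The `predicted_losses` list stands in for the per-sample difficulty scores that a loss predictor such as ALN would produce.

```python
import math
import random


def sampling_weights(predicted_losses, temperature=1.0):
    """Turn per-sample predicted losses into sampling probabilities.

    Assumption: a softmax over the losses, so that higher predicted loss
    means higher probability. Lower temperature sharpens the preference
    for hard samples; a very high temperature approaches uniform sampling.
    """
    peak = max(predicted_losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in predicted_losses]
    total = sum(exps)
    return [e / total for e in exps]


def draw_batch(predicted_losses, batch_size, temperature=1.0, rng=None):
    """Sample dataset indices (with replacement) in proportion to the weights."""
    rng = rng or random.Random()
    weights = sampling_weights(predicted_losses, temperature)
    indices = range(len(predicted_losses))
    return rng.choices(indices, weights=weights, k=batch_size)


# Hypothetical difficulty scores for five training samples.
predicted_losses = [0.05, 0.10, 0.90, 0.07, 1.50]
weights = sampling_weights(predicted_losses, temperature=0.5)
batch = draw_batch(predicted_losses, batch_size=8, temperature=0.5,
                   rng=random.Random(0))
```

Under this weighting, the samples with the highest predicted loss are drawn most often, while easy samples still keep a nonzero chance of appearing in a batch. Uniform sampling is the special case where all predicted losses are equal.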
