TGM-VLA: Task-Guided Mixup for Sampling-Efficient and Robust Robotic Manipulation
Fanqi Pu, Lei Jiang, Wenming Yang

TL;DR
This paper introduces TGM-VLA, a framework for robotic manipulation that improves data efficiency and robustness through a redesigned keyframe sampling strategy, a color inversion projection branch, and task-guided mixup, leading to state-of-the-art results.
Contribution
It presents a holistic approach that combines optimized keyframe sampling, a color inversion projection branch, and task-guided mixup to improve robotic imitation learning.
Findings
Achieves a 90.5% success rate on RLBench
Reduces memory consumption by 80%
Speeds up training by 5x
Abstract
The performance of robotic imitation learning is fundamentally limited by data quality and training strategies. Prevalent sampling strategies on RLBench suffer from severe keyframe redundancy and imbalanced temporal distribution, leading to inefficient memory usage and unstable optimization. Moreover, reprojecting point clouds onto multi-view images with a black background, while more efficient than voxel-based methods, often causes dark objects to be indistinguishable and hard to manipulate. In this work, we propose a novel holistic framework that significantly improves both model performance and training efficiency. First, we redesign and optimize the keyframe sampling strategy, reducing memory consumption by 80% and accelerating training speed by 5x. Second, we augment the model with a color inversion projection branch, a simple yet effective module that resolves the ambiguity of…
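The dark-object problem described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `invert_foreground` and `contrast_to_background`, the per-pixel mask, and the choice to keep the background black after inversion are all assumptions made for illustration.

```python
# Hypothetical sketch of color inversion on a projected view.
# Assumption: RGB values lie in [0, 1] and the mask marks pixels
# where a point-cloud point was projected.
BLACK = (0.0, 0.0, 0.0)


def invert_foreground(image, mask):
    """Invert the color of projected pixels; leave empty pixels black."""
    return [
        [tuple(1.0 - c for c in px) if hit else BLACK
         for px, hit in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def contrast_to_background(pixel):
    """Mean absolute difference between a pixel and the black background."""
    return sum(abs(c) for c in pixel) / 3.0


# A 1x3 rendering: empty background, a dark object, a bright object.
image = [[(0.0, 0.0, 0.0), (0.05, 0.05, 0.05), (0.9, 0.9, 0.9)]]
mask = [[False, True, True]]

inverted = invert_foreground(image, mask)

# The dark object is nearly invisible in the original view
# but stands out clearly in the inverted view.
print(contrast_to_background(image[0][1]))
print(contrast_to_background(inverted[0][1]))
```

In this toy example the bright object loses contrast in the inverted view, which suggests why such a projection might be used alongside the original view rather than in place of it.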
Taxonomy
Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Reinforcement Learning in Robotics
