GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks

Davide Buoso; Andrea Protopapa; Stefano Di Carlo; Francesca Pistilli; Giuseppe Averta

arXiv:2605.15836 · cs.RO · May 18, 2026

TL;DR

GAP is a pre-training method that regularizes visual representations to produce stable, geometry-aware keypoints, significantly improving robotic manipulation learning from limited demonstrations.

Contribution

The paper proposes Geometric Anchor Pre-training (GAP), a lightweight, action-free pre-training stage that enhances geometric grounding in visual representations for manipulation tasks.

Findings

01. GAP outperforms fine-tuning and attention-based poolers in data-scarce scenarios.

02. GAP achieves 62% success on RoboMimic Can with 15 demonstrations.

03. The proxy pre-training is lightweight, decoupled, and reusable across tasks.

Abstract

Learning visuomotor policies from scarce expert demonstrations remains a core challenge in robotic manipulation. A primary hurdle lies in distilling high-dimensional RGB representations into control-relevant geometry without overfitting. While using frozen pre-trained Vision Foundation Models (VFMs) improves data efficiency, it also shifts most task adaptation onto a small spatial pooling module, which can latch onto task-irrelevant shortcuts and lose geometric grounding when fine-tuned on few data samples. More broadly, pre-trained visual representations used for policy learning have been observed to struggle under even minor scene perturbations, highlighting the need for robustness-oriented inductive biases. We propose Geometric Anchor Pre-training (GAP), a simple, action-free warm-up stage that regularizes the spatial adapter before downstream imitation learning. GAP pre-trains the…
