VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models