SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation