From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs