I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs