InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions