Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?