GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models