Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos