IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes