Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding