Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models