Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models