VideoXum: Cross-modal Visual and Textural Summarization of Videos