SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval