Contrastive Video-Language Learning with Fine-grained Frame Sampling