Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data