Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP