Learning Spatiotemporal Features via Video and Text Pair Discrimination