Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning