Video and Audio are Images: A Cross-Modal Mixer for Original Data on Video-Audio Retrieval