Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions