Coarse-to-fine Alignment Makes Better Speech-image Retrieval