Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos