Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces