Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models