Transfer Learning from Audio-Visual Grounding to Speech Recognition