Visually grounded cross-lingual keyword spotting in speech