Loading paper
Large-scale representation learning from visually grounded untranscribed speech | Tomesphere