Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos