Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT