Audio-Visual Speaker Verification via Joint Cross-Attention