Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers