Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition