Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video