A Pre-trained Audio-Visual Transformer for Emotion Recognition