Multimodal Self-Attention Network with Temporal Alignment for Audio-Visual Emotion Recognition