Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention