MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers