Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer