Parameter Efficient Multimodal Transformers for Video Representation Learning