Towards Visually Grounded Multimodal Summarization via Cross-Modal Transformer and Gated Attention