Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment