CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter