MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning