Loading paper
Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension | Tomesphere