VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation