Rethinking Ground Truth: A Case Study on Human Label Variation in MLLM Benchmarking