PairBench: Are Vision-Language Models Reliable at Comparing What They See?