PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models