Loading paper
Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks? | Tomesphere