Can Large Vision-Language Models Understand Multimodal Sarcasm?