MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao

TL;DR
This paper introduces MMRC, a comprehensive benchmark for evaluating multimodal large language models in real-world conversations, highlighting their limitations and proposing a note-taking strategy to improve performance.
Contribution
The paper presents MMRC, a large-scale real-world conversation benchmark, and proposes a note-taking method to enhance MLLMs' conversational abilities.
Findings
Across 20 evaluated MLLMs, accuracy drops during open-ended interactions in real-world scenarios.
Four common failure patterns are identified, including long-term memory degradation and error propagation.
The proposed note-taking strategy improves model performance.
Abstract
Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations on 20 MLLMs in MMRC indicate an accuracy drop during open-ended interactions. We identify four common failure patterns: long-term memory degradation,…
Taxonomy
Topics: Speech and dialogue systems
