MMReview: A Multidisciplinary and Multimodal Benchmark for LLM-Based Peer Review Automation
Xian Gao, Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Ting Liu, Yuzhuo Fu

TL;DR
MMReview introduces a comprehensive, multimodal benchmark across multiple disciplines to evaluate LLMs' effectiveness in automating peer review, addressing current gaps in assessment standards.
Contribution
It presents the first unified benchmark with multimodal content and expert reviews for assessing LLMs in peer review tasks across diverse academic fields.
Findings
Open-source models show varying performance on the benchmark.
Advanced models outperform baseline models in review quality.
The benchmark reveals strengths and weaknesses of current LLMs in peer review tasks.
Abstract
With the rapid growth of academic publications, peer review has become an essential yet time-consuming responsibility within the research community. Large Language Models (LLMs) have increasingly been adopted to assist in the generation of review comments; however, current LLM-based review tasks lack a unified evaluation benchmark to rigorously assess the models' ability to produce comprehensive, accurate, and human-aligned assessments, particularly in scenarios involving multimodal content such as figures and tables. To address this gap, we propose MMReview, a comprehensive benchmark that spans multiple disciplines and modalities. MMReview includes multimodal content and expert-written review comments for 240 papers across 17 research domains within four major academic disciplines: Artificial Intelligence, Natural Sciences, Engineering Sciences, and Social Sciences. We design…
Taxonomy
Topics: Semantic Web and Ontologies
