PosterSum: A Multimodal Benchmark for Scientific Poster Summarization

Rohit Saxena; Pasquale Minervini; Frank Keller

arXiv:2502.17540·cs.CV·February 26, 2025

TL;DR

PosterSum is a multimodal benchmark of 16,305 scientific conference posters paired with their paper abstracts. It shows that current vision-language models struggle to summarize posters and proposes a hierarchical method, Segment & Summarize, that improves summarization performance.

Contribution

The paper presents PosterSum, a large multimodal dataset for scientific poster summarization, and proposes Segment & Summarize, a hierarchical approach that outperforms current MLLMs on automated metrics.

Findings

01 State-of-the-art MLLMs struggle to interpret and summarize scientific posters

02 The hierarchical method, Segment & Summarize, improves ROUGE-L by 3.14% over current MLLMs

03 PosterSum serves as a new benchmark for future research
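
The ROUGE-L figure in the findings is based on the longest common subsequence (LCS) between a generated summary and the reference abstract. As a rough illustration of what the metric measures, here is a minimal sketch in Python. It assumes lowercased whitespace tokenization and a balanced F-measure (`beta=1.0`); the function names `lcs_length` and `rouge_l` are illustrative, and this is not the evaluation script used in the paper, so its scores will not match published numbers exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; only the previous row is kept.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """Sentence-level ROUGE-L F-measure (assumed tokenization: lowercase + split)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    abstract = "the cat sat on the mat"
    summary = "the cat is on the mat"
    print(round(rouge_l(abstract, summary), 4))
```

Because the LCS only requires tokens to appear in the same order, not contiguously, ROUGE-L rewards summaries that preserve the reference's word order even when extra words are interleaved.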

Abstract

Generating accurate and concise textual summaries from multimodal documents is challenging, especially when dealing with visually complex content like scientific posters. We introduce PosterSum, a novel benchmark to advance the development of vision-language models that can understand and summarize scientific posters into research paper abstracts. Our dataset contains 16,305 conference posters paired with their corresponding abstracts as summaries. Each poster is provided in image format and presents diverse visual understanding challenges, such as complex layouts, dense text regions, tables, and figures. We benchmark state-of-the-art Multimodal Large Language Models (MLLMs) on PosterSum and demonstrate that they struggle to accurately interpret and summarize scientific posters. We propose Segment & Summarize, a hierarchical method that outperforms current MLLMs on automated metrics,…

Code & Models

Repositories

saxenarohit/postersum (official)

Datasets

rohitsaxena/PosterSum
dataset · 26 dl

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques