SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers
Takuro Kawada, Shunsuke Kitada, Sota Nemoto, Hitoshi Iyatomi

TL;DR
This paper introduces SciGA-145k, a dataset of approximately 145,000 scientific papers and 1.14 million figures, to support graphical abstract (GA) selection, recommendation, and automated generation. It also proposes a new evaluation metric called CAR.
Contribution
The paper presents the large-scale SciGA-145k dataset, defines two recommendation tasks for graphical abstracts (Intra-GA Recommendation and Inter-GA Recommendation), and proposes the CAR metric for evaluating models on these tasks.
Findings
Benchmark results validate the proposed tasks and metric.
The dataset supports research in automated GA generation.
CAR provides a more nuanced analysis of model behavior.
Abstract
Graphical Abstracts (GAs) play a crucial role in visually conveying the key findings of scientific papers. Although recent research increasingly incorporates visual materials such as Figure 1 as de facto GAs, their potential to enhance scientific communication remains largely unexplored. Designing effective GAs requires advanced visualization skills, hindering their widespread adoption. To tackle these challenges, we introduce SciGA-145k, a large-scale dataset comprising approximately 145,000 scientific papers and 1.14 million figures, specifically designed to support GA selection and recommendation, and to facilitate research in automated GA generation. As a preliminary step toward GA design support, we define two tasks: 1) Intra-GA Recommendation, identifying figures within a given paper well-suited as GAs, and 2) Inter-GA Recommendation, retrieving GAs from other papers to inspire…
