SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

Zhishen Yang; Raj Dabre; Hideki Tanaka; Naoaki Okazaki

arXiv:2306.03491 · cs.CV · June 7, 2023 · 2 cites

TL;DR

This paper introduces SciCap+, an extended dataset and a knowledge-augmented approach for scientific figure captioning, demonstrating that additional context improves caption quality, with implications for automating scientific communication.

Contribution

The paper presents SciCap+, an extended dataset with mention-paragraphs and OCR tokens, and evaluates a multimodal transformer model showing improved captioning performance with added context.

Findings

01 Mention-paragraphs significantly improve captioning scores.

02 Human evaluation highlights challenges in generating informative captions.

03 Knowledge-augmented models outperform figure-only baselines.

Abstract

In scholarly documents, figures provide a straightforward way of communicating scientific findings to readers. Automating figure caption generation helps move models' understanding of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings. Unlike previous studies, we reframe scientific figure captioning as a knowledge-augmented image captioning task in which models need to utilize knowledge embedded across modalities for caption generation. To this end, we extend the large-scale SciCap dataset (Hsu et al., 2021) to SciCap+, which includes mention-paragraphs (paragraphs mentioning figures) and OCR tokens. We then conduct experiments with M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline for our study. Our results indicate that mention-paragraphs serve…
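
The abstract defines mention-paragraphs as paragraphs that mention a figure. As a rough illustration of that idea only, the sketch below selects such paragraphs with a regular expression and stores them next to OCR tokens. The `FigureRecord` class, the `find_mention_paragraphs` helper, and the matching pattern are all hypothetical and are not taken from the paper or its repository, whose actual extraction procedure is not described on this page.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FigureRecord:
    """Hypothetical container for the context attached to one figure."""
    figure_number: int
    ocr_tokens: list = field(default_factory=list)          # text read off the figure image
    mention_paragraphs: list = field(default_factory=list)  # body paragraphs citing the figure


def find_mention_paragraphs(paragraphs, figure_number):
    """Return the paragraphs that refer to the given figure number.

    Assumption: a mention looks like "Figure 3", "Fig. 3", or "fig 3".
    The (?!\\d) lookahead keeps "Figure 31" from matching figure 3.
    """
    pattern = re.compile(
        rf"\bfig(?:ure|\.)?\s*{figure_number}(?!\d)", re.IGNORECASE
    )
    return [p for p in paragraphs if pattern.search(p)]


paragraphs = [
    "As shown in Figure 3, accuracy rises with more context.",
    "Fig. 31 plots the training loss.",
    "This paragraph does not reference any figure.",
    "See fig 3 for the per-category breakdown.",
]

record = FigureRecord(
    figure_number=3,
    ocr_tokens=["accuracy", "epoch"],  # made-up tokens for illustration
    mention_paragraphs=find_mention_paragraphs(paragraphs, 3),
)
print(len(record.mention_paragraphs))
```

A real pipeline would also need to handle ranges such as "Figures 3 and 4" and subfigure labels such as "Fig. 3(a)", which this pattern treats only partially.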

Code & Models

Repositories

zhishenyang/scientific_figure_captioning_dataset (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling