GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions
Ting-Yao Hsu, Chieh-Yang Huang, Ryan Rossi, Sungchul Kim, C. Lee Giles, and Ting-Hao K. Huang

TL;DR
This study demonstrates that GPT-4 can serve as an effective, cost-efficient zero-shot evaluator for scientific figure captions, outperforming other models and even some human evaluators in correlating with expert judgments.
Contribution
The paper introduces SCICAP-EVAL, a new dataset for evaluating figure caption quality and shows GPT-4's superior performance as a zero-shot evaluator for scientific figures.
Findings
GPT-4 outperforms other models in caption evaluation.
GPT-4's evaluations correlate well with expert judgments.
The dataset enables benchmarking of automatic caption evaluation methods.
Abstract
There is growing interest in systems that generate captions for scientific figures. However, assessing these systems output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score (1-6) each caption based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
MethodsMulti-Head Attention · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer · Cosine Annealing · Byte Pair Encoding · Dropout · Weight Decay
