Enhancing Scientific Figure Captioning Through Cross-modal Learning

Mateo Alejandro Rojas; Rafael Carranza

arXiv:2406.17047 · cs.CV · June 26, 2024

Open Access

TL;DR

This paper introduces a novel cross-modal learning approach to automatically generate accurate and concise titles for scientific charts, improving data communication and retrieval in research contexts.

Contribution

It proposes a new method combining natural language processing and multimodal techniques for scientific chart captioning, addressing the challenge of diverse and complex research data.

Findings

01. Enhanced accuracy in chart title generation

02. Improved clarity and accessibility of research data

03. Effective integration of multimodal learning methods

Abstract

Scientific charts are essential tools for effectively communicating research findings, serving as a vital medium for conveying information and revealing data patterns. With the rapid advancement of science and technology, coupled with the advent of the big data era, the volume and diversity of scientific research data have surged, leading to an increase in the number and variety of charts. This trend presents new challenges for researchers, particularly in efficiently and accurately generating appropriate titles for these charts to better convey their information and results. Automatically generated chart titles can enhance information retrieval systems by providing precise data for detailed chart classification. As research in image captioning and text summarization matures, the automatic generation of scientific chart titles has gained significant attention. By leveraging natural…

Taxonomy

Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Video Analysis and Summarization