Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas, Rafael Carranza

TL;DR
This paper introduces a cross-modal learning approach that automatically generates accurate, concise titles for scientific charts, improving data communication and retrieval in research contexts.
Contribution
The paper proposes a method that combines natural language processing with multimodal techniques to generate titles for scientific charts, addressing the growing volume and diversity of research data.
Findings
Enhanced accuracy in chart title generation
Improved clarity and accessibility of research data
Effective integration of multimodal learning methods
Abstract
Scientific charts are essential tools for effectively communicating research findings, serving as a vital medium for conveying information and revealing data patterns. With the rapid advancement of science and technology, coupled with the advent of the big data era, the volume and diversity of scientific research data have surged, leading to an increase in the number and variety of charts. This trend presents new challenges for researchers, particularly in efficiently and accurately generating appropriate titles for these charts to better convey their information and results. Automatically generated chart titles can enhance information retrieval systems by providing precise data for detailed chart classification. As research in image captioning and text summarization matures, the automatic generation of scientific chart titles has gained significant attention. By leveraging natural…
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Video Analysis and Summarization
