Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning
Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji

TL;DR
This paper investigates factual inaccuracies in chart captioning by analyzing error patterns, creating a dataset, and proposing models for factual error detection and correction to improve reliability in visual data descriptions.
Contribution
The paper introduces a comprehensive typology of factual errors, a new dataset (CHOCOLATE), and models for factual error detection and correction in chart captioning.
Findings
State-of-the-art models often produce factual errors in captions.
The proposed C2TFEC framework effectively corrects factual inaccuracies.
CHARTVE outperforms existing models in factual evaluation.
Abstract
Recent advancements in large vision-language models (LVLMs) have led to significant progress in generating natural language descriptions for visual content and thus enhancing various applications. One issue with these powerful models is that they sometimes produce texts that are factually inconsistent with the visual input. While there has been some effort to mitigate such inconsistencies in natural image captioning, the factuality of generated captions for structured document images, such as charts, has not received as much scrutiny, posing a potential threat to information reliability in critical applications. This work delves into the factuality aspect by introducing a comprehensive typology of factual errors in generated chart captions. A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning…
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Natural Language Processing Techniques
