Advancing Chart Question Answering with Robust Chart Component Recognition
Hanwen Zheng, Sijia Wang, Chris Thomas, Lifu Huang

TL;DR
This paper introduces Chartformer, a unified framework for chart component recognition, together with a Question-guided Deformable Co-Attention (QDCAt) mechanism that fuses chart features with the question to improve chart question answering (ChartQA).
Contribution
The paper presents Chartformer, a new model that effectively recognizes chart components and integrates question guidance for better ChartQA performance.
Findings
Achieved a 3.2% improvement in mAP for chart component recognition.
Achieved a 15.4% improvement in accuracy for ChartQA.
Significantly outperformed baseline models on both tasks in experiments.
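For readers unfamiliar with mAP, the metric is typically built on an overlap score between predicted and ground-truth regions, most often intersection over union (IoU). The snippet below is a minimal sketch of box IoU only. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form; the summary does not say whether the paper scores boxes or masks, and the function name `box_iou` is hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max). This box format is an
    assumption for illustration, not a detail taken from the paper.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

A predicted component usually counts as correct when its IoU with a ground-truth component of the same class exceeds a chosen threshold; average precision is then computed per class and averaged to give mAP.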
Abstract
Chart comprehension presents significant challenges for machine learning models due to the diverse and intricate shapes of charts. Existing multimodal methods often overlook these visual features or fail to integrate them effectively for chart question answering (ChartQA). To address this, we introduce Chartformer, a unified framework that enhances chart component recognition by accurately identifying and classifying components such as bars, lines, pies, titles, legends, and axes. Additionally, we propose a novel Question-guided Deformable Co-Attention (QDCAt) mechanism, which fuses chart features encoded by Chartformer with the given question, leveraging the question's guidance to ground the correct answer. Extensive experiments demonstrate that the proposed approaches significantly outperform baseline models in chart component recognition and ChartQA tasks, achieving improvements of…
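The general idea of letting a question guide attention over chart features can be illustrated with plain scaled dot-product cross-attention. This is a simplified sketch, not the authors' QDCAt: it omits the deformable sampling and any learned projections, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def question_guided_attention(question_tokens, chart_regions):
    """Attend from each question token over chart region features.

    question_tokens: list of d-dimensional vectors (used as queries).
    chart_regions:   list of d-dimensional vectors (used as keys and values).
    Returns (fused, weights): one fused vector and one weight row per token.
    """
    dim = len(chart_regions[0])
    scale = math.sqrt(dim)
    fused, weights = [], []
    for query in question_tokens:
        # Similarity of this question token to every chart region.
        row = softmax([dot(query, region) / scale for region in chart_regions])
        weights.append(row)
        # Weighted sum of region features, guided by the question token.
        fused.append([
            sum(w * region[i] for w, region in zip(row, chart_regions))
            for i in range(dim)
        ])
    return fused, weights


# Toy example: two question tokens, three chart regions, d = 2.
question = [[1.0, 0.0], [0.0, 1.0]]
regions = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
fused, weights = question_guided_attention(question, regions)
```

In this toy setup the first question token puts most of its weight on the first region and the second token on the second region, which is the sense in which the question steers which chart features are read out.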
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques
