TL;DR
Chart-FR1 is a focus-driven reasoning model that improves fine-grained perception and adaptive reasoning on high information density (HID) charts, outperforming existing models on the new HID-Chart benchmark.
Contribution
The paper proposes a focus-driven reasoning framework that combines a visual focusing chain-of-thought (Focus-CoT) with reinforcement learning to address perception and reasoning challenges in high-density charts.
Findings
Outperforms state-of-the-art models on multiple chart benchmarks.
Effectively compresses redundant visual information for better focusing.
Achieves superior reasoning accuracy on the new HID-Chart benchmark.
Abstract
Multimodal large language models (MLLMs) have shown considerable potential in chart understanding and reasoning tasks. However, they still struggle with high information density (HID) charts characterized by multiple subplots, legends, and dense annotations due to three major challenges: (1) limited fine-grained perception results in the omission of critical visual cues; (2) redundant or noisy visual information undermines the performance of multimodal reasoning; (3) lack of adaptive deep reasoning relative to the amount of visual information. To tackle these challenges, we present a novel focus-driven fine-grained chart reasoning model, Chart-FR1, to improve perception, focusing efficiency, and adaptive deep reasoning on HID charts. Specifically, we propose Focus-CoT, a visual focusing chain-of-thought that enhances fine-grained perception by explicitly linking reasoning steps to key…
