CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

Qixian Huang; Hongqiang Lin; Tong Fu; Yingsen Wang; Zhenghui Fu; Qirui Wang; Yiding Sun; Dongxu Zhang

arXiv:2604.10973·cs.AI·April 14, 2026

TL;DR

CFMS introduces a hierarchical two-stage framework that combines multimodal perception and symbolic reasoning to improve tabular data understanding for question answering and fact verification.

Contribution

The paper proposes a coarse-to-fine paradigm that decouples visual perception from symbolic reasoning, with the aim of making tabular reasoning more robust and efficient.

Findings

01. CFMS achieves competitive accuracy on the WikiTQ and TabFact benchmarks.
02. The framework remains robust on large tables and when used with smaller models.
03. Extensive experiments validate its effectiveness and generalizability.

Abstract

Reasoning over tabular data is a crucial capability for tasks like question answering and fact verification, as it requires models to comprehend both free-form questions and semi-structured tables. However, while methods like Chain-of-Thought (CoT) introduce reasoning chains, purely symbolic methods are inherently limited by their blindness to holistic visual patterns. To address this, we propose the Coarse-to-Fine Multimodal Synthesis framework (CFMS), a novel two-stage paradigm that hierarchically decouples high-level visual perception from granular symbolic reasoning. In the Coarse Stage, CFMS leverages Multimodal Large Language Models (MLLMs) to perform a one-time synthesis of a multi-perspective knowledge tuple. This tuple subsequently serves as a dynamic reasoning map to guide the Fine Stage, where a symbolic engine executes a targeted and efficient sequence of iterative…
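
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the contents of the knowledge tuple or the symbolic operations, so the `KnowledgeTuple` fields, the stubbed `coarse_stage` (which stands in for the one-time MLLM call), and the three table operations in `fine_stage` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, str]
Table = List[Row]


@dataclass
class KnowledgeTuple:
    """Hypothetical coarse-stage output; the field names are assumptions."""
    relevant_columns: List[str]   # columns judged useful for the question
    row_hint: Tuple[str, str]     # (column, value) the target rows should match
    answer_column: str            # column to read the answer from


def coarse_stage(table: Table, question: str) -> KnowledgeTuple:
    """Stub for the one-time multimodal synthesis step.

    A real system would query an MLLM here; this demo returns a
    hard-coded tuple so the example runs without any model.
    """
    return KnowledgeTuple(
        relevant_columns=["country", "gold"],
        row_hint=("country", "Norway"),
        answer_column="gold",
    )


def fine_stage(table: Table, guide: KnowledgeTuple) -> List[str]:
    """Run a short sequence of symbolic table operations guided by the tuple."""
    # Operation 1: keep only the columns the guide marks as relevant.
    narrowed = [{c: row[c] for c in guide.relevant_columns} for row in table]
    # Operation 2: keep only the rows that match the row hint.
    column, value = guide.row_hint
    matched = [row for row in narrowed if row[column] == value]
    # Operation 3: read the answer cells from the remaining rows.
    return [row[guide.answer_column] for row in matched]


def answer(table: Table, question: str) -> List[str]:
    """Coarse stage once, then the guided fine stage."""
    guide = coarse_stage(table, question)
    return fine_stage(table, guide)


# Toy table invented for this example (not from the paper's benchmarks).
TABLE: Table = [
    {"country": "Norway", "gold": "16", "silver": "8"},
    {"country": "Germany", "gold": "12", "silver": "10"},
]

if __name__ == "__main__":
    print(answer(TABLE, "How many gold medals did Norway win?"))
```

Calling `answer(TABLE, "How many gold medals did Norway win?")` returns `["16"]`. The point of the sketch is the separation of concerns: the guide is produced once, and the symbolic stage only narrows the table according to it rather than re-reading the full table at every step.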
