CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning
Van-Quang Nguyen, Takayuki Okatani

TL;DR
CoReTab introduces a code-driven reasoning framework that enhances multimodal table understanding by providing scalable, interpretable, and verifiable reasoning annotations, leading to significant performance improvements across various benchmarks.
Contribution
The paper presents a novel framework that couples multi-step reasoning with executable Python code, which is used to build a large verified dataset and to improve model performance and interpretability.
Findings
Achieved +6.2%, +5.7%, and +25.6% improvements on key benchmarks.
Generated a dataset of 115K verified samples with detailed reasoning.
Produced models with transparent, verifiable reasoning traces.
Abstract
Existing datasets for multimodal table understanding, such as MMTab, primarily provide short factual answers without explicit multi-step reasoning supervision. Models trained on these datasets often generate brief responses that offer insufficient accuracy and limited insight into how the models arrive at the final answer. We introduce CoReTab, a code-driven reasoning framework that produces scalable, interpretable, and automatically verifiable annotations by coupling multi-step reasoning with executable Python code. Using the CoReTab framework, we curate a dataset of 115K verified samples averaging 529 tokens per response and fine-tune open-source MLLMs through a three-stage pipeline. We evaluate the resulting model trained on CoReTab across 17 MMTab benchmarks spanning table question answering, fact verification, and table structure understanding. Our model achieves…
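To make the idea of "automatically verifiable annotations" concrete, the following is a minimal sketch of how a reasoning annotation paired with executable Python might be checked against a gold answer. It is not the authors' pipeline: the toy table, the annotation fields (`question`, `reasoning`, `code`, `gold`), and the helper names `run_annotation_code` and `is_verified` are all assumptions made for illustration, and the table is given as structured rows rather than as an image.

```python
from typing import Any

# Hypothetical toy table; the actual framework targets table images.
TABLE = [
    {"team": "A", "wins": 10, "losses": 4},
    {"team": "B", "wins": 7, "losses": 7},
    {"team": "C", "wins": 12, "losses": 2},
]

# Hypothetical annotation: step-by-step reasoning plus code that must set `answer`.
ANNOTATION = {
    "question": "Which team has the most wins?",
    "reasoning": "Step 1: read the wins column. Step 2: pick the row with the maximum.",
    "code": "best = max(table, key=lambda row: row['wins'])\nanswer = best['team']",
    "gold": "C",
}


def run_annotation_code(code: str, table: list[dict[str, Any]]) -> Any:
    """Execute the annotation's code with `table` in scope and return `answer`."""
    namespace: dict[str, Any] = {"table": table}
    # Plain exec is acceptable for a sketch; untrusted code would need sandboxing.
    exec(code, namespace)
    return namespace.get("answer")


def is_verified(annotation: dict[str, Any], table: list[dict[str, Any]]) -> bool:
    """Keep a sample only if its code runs and reproduces the gold answer."""
    try:
        predicted = run_annotation_code(annotation["code"], table)
    except Exception:
        return False  # code that crashes cannot be verified
    return str(predicted).strip().lower() == str(annotation["gold"]).strip().lower()


if __name__ == "__main__":
    print(is_verified(ANNOTATION, TABLE))
```

Under this assumed design, a sample whose code fails to run or yields a different answer is discarded, so every retained reasoning trace is backed by code that reproduces the labeled answer.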
Taxonomy
Topics: Topic Modeling · Handwritten Text Recognition Techniques · Machine Learning in Materials Science
