CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning

Van-Quang Nguyen; Takayuki Okatani

arXiv:2601.19193·cs.AI·January 28, 2026

TL;DR

CoReTab introduces a code-driven reasoning framework that enhances multimodal table understanding by providing scalable, interpretable, and verifiable reasoning annotations, leading to significant performance improvements across various benchmarks.

Contribution

The paper presents a framework that couples multi-step reasoning with executable Python code, uses it to build a large verified dataset, and reports gains in both model performance and interpretability.

Findings

01 Achieved +6.2%, +5.7%, and +25.6% improvements on key benchmarks.

02 Generated a dataset of 115K verified samples with detailed reasoning.

03 Produced models with transparent, verifiable reasoning traces.

Abstract

Existing datasets for multimodal table understanding, such as MMTab, primarily provide short factual answers without explicit multi-step reasoning supervision. Models trained on these datasets often generate brief responses that offer insufficient accuracy and little insight into how the model arrives at its final answer. We introduce CoReTab, a code-driven reasoning framework that produces scalable, interpretable, and automatically verifiable annotations by coupling multi-step reasoning with executable Python code. Using the CoReTab framework, we curate a dataset of 115K verified samples averaging 529 tokens per response and fine-tune open-source MLLMs through a three-stage pipeline. We evaluate the resulting model trained on CoReTab across 17 MMTab benchmarks spanning table question answering, fact verification, and table structure understanding. Our model achieves…
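To make "automatically verifiable annotations" concrete, here is a minimal sketch of how an annotation that pairs reasoning with Python code could be checked by execution. It is an illustration under stated assumptions, not the authors' pipeline: the function name `verify_annotation`, the convention that the code stores its result in a variable called `answer`, the list-of-dicts table representation, and the toy table are all hypothetical.

```python
def verify_annotation(code: str, table: list[dict], expected: str) -> bool:
    """Run candidate code on the table and check its answer against the reference.

    Assumption (not from the paper): the code reads a variable `table`
    and stores its result in a variable named `answer`.
    """
    namespace = {"table": table}
    try:
        # exec is used only for illustration; untrusted code needs a real sandbox.
        exec(code, namespace)
    except Exception:
        # Code that fails to run cannot be verified, so the sample is rejected.
        return False
    produced = str(namespace.get("answer")).strip().lower()
    return produced == expected.strip().lower()


# Hypothetical toy table and candidate annotation code.
table = [
    {"city": "Sendai", "population": 1090000},
    {"city": "Tokyo", "population": 13960000},
]
candidate_code = "answer = max(table, key=lambda r: r['population'])['city']"

is_valid = verify_annotation(candidate_code, table, "Tokyo")
```

A sample would be kept only when `is_valid` is true. Code that raises an error or produces a different answer is rejected, which is one way a verified dataset could be filtered without manual inspection.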

Taxonomy

Topics: Topic Modeling · Handwritten Text Recognition Techniques · Machine Learning in Materials Science