TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing

Jongha Kim; Minseong Bae; Sanghyeok Lee; Jinsung Yoon; Hyunwoo J. Kim

arXiv:2511.13283 · cs.CV · November 18, 2025

TL;DR

TabFlash is a new approach to table image understanding that progressively conditions visual features on the question, prunes background tokens, and focuses on essential tokens. It achieves state-of-the-art performance at reduced computational cost.

Contribution

The paper presents TabFlash, a new multimodal model that enhances table understanding through progressive question conditioning, token pruning, and focusing strategies, improving efficiency and effectiveness.
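The paper's abstract describes injecting the question into Vision Transformer layers with gradually increasing frequency. The sketch below shows one possible schedule of that kind and a toy encoder loop that applies it. It assumes frequency increases with depth; the equal-size stages, power-of-two intervals, and the helper names `progressive_schedule`, `encode`, and `inject` are hypothetical illustrations, not taken from the paper.

```python
from typing import Callable, List, Sequence


def progressive_schedule(num_layers: int, num_stages: int = 4) -> List[int]:
    """Hypothetical schedule: split the ViT into equal stages and inject the
    question every 2**(num_stages - 1 - stage) layers, so deeper stages are
    conditioned more often and the last stage is conditioned at every layer."""
    if num_layers % num_stages:
        raise ValueError("num_layers must be divisible by num_stages")
    stage_len = num_layers // num_stages
    schedule = []
    for layer in range(num_layers):
        stage = layer // stage_len
        interval = 2 ** (num_stages - 1 - stage)
        if layer % interval == 0:
            schedule.append(layer)
    return schedule


def encode(visual, question, blocks: Sequence[Callable],
           inject: Callable, schedule: Sequence[int]):
    """Run ViT blocks in order, injecting the question before each scheduled layer.
    `inject` is a hypothetical fusion function supplied by the caller."""
    active = set(schedule)
    for i, block in enumerate(blocks):
        if i in active:
            visual = inject(visual, question)
        visual = block(visual)
    return visual


sched = progressive_schedule(24)
print(sched)  # [0, 8, 12, 14, 16, 18, 19, 20, 21, 22, 23]
```

With 24 layers and 4 stages, the stages receive 1, 1, 3, and 6 injections respectively. In a real model, `inject` would fuse question features into the visual tokens; which fusion mechanism TabFlash uses is not stated in this summary.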

Findings

1. Achieves state-of-the-art performance on table understanding tasks.
2. Uses 27% fewer FLOPs and 30% less memory than previous models.
3. Effectively reduces redundancy and retains essential information.

Abstract

Table images present unique challenges for effective and efficient understanding due to the need for question-specific focus and the presence of redundant background regions. Existing Multimodal Large Language Model (MLLM) approaches often overlook these characteristics, resulting in uninformative and redundant visual representations. To address these issues, we aim to generate visual features that are both informative and compact to improve table understanding. We first propose progressive question conditioning, which injects the question into Vision Transformer layers with gradually increasing frequency, considering each layer's capacity to handle additional information, to generate question-aware visual features. To reduce redundancy, we introduce a pruning strategy that discards background tokens, thereby improving efficiency. To mitigate information loss from pruning, we further…
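The abstract mentions discarding background tokens but, in the portion shown here, does not say how background is identified. As a minimal sketch, assuming a simple heuristic in which a patch counts as background when its pixel standard deviation falls below a threshold (an assumption, not the paper's criterion), pruning reduces to a boolean mask over patch tokens. `patchify`, `background_mask`, and `prune_tokens` are hypothetical names, and the patch size and threshold are illustrative.

```python
import numpy as np


def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) or (H, W, C) image into flattened non-overlapping patches."""
    h, w = img.shape[:2]
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    c = img.shape[2] if img.ndim == 3 else 1
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder to (patch_row, patch_col, y, x, c), then flatten each patch.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


def background_mask(img: np.ndarray, patch: int = 14,
                    std_thresh: float = 2.0) -> np.ndarray:
    """Hypothetical heuristic: near-uniform patches (low pixel std) are background."""
    patches = patchify(img.astype(np.float32), patch)
    return patches.std(axis=1) < std_thresh


def prune_tokens(tokens: np.ndarray, background: np.ndarray):
    """Drop background tokens; also return kept indices so positions can be tracked."""
    keep = ~background
    return tokens[keep], np.flatnonzero(keep)


# Toy table image: white 56x56 canvas with one dark "text" stroke.
img = np.full((56, 56), 255, dtype=np.uint8)
img[20:24, 20:40] = 0
tokens = np.random.default_rng(0).normal(size=(16, 8))  # 4x4 grid of 14px patches

bg = background_mask(img, patch=14)
pruned, kept = prune_tokens(tokens, bg)
print(pruned.shape, kept.tolist())  # (2, 8) [5, 6]
```

Only the two patches touched by the stroke survive, so 14 of 16 tokens are dropped. Returning the kept indices lets a model preserve each token's original position after pruning. As the abstract notes, pruning risks losing information, which the paper's additional step is meant to mitigate.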

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques