HighlightBench: Benchmarking Markup-Driven Table Reasoning in Scientific Documents

Lexin Wang; Shenghua Liu; Yiwei Wang; Yujun Cai; Yuyao Ge; Jiayu Yao; Jiafeng Guo; Xueqi Cheng

arXiv:2603.26784 · cs.CV · March 31, 2026

TL;DR

HighlightBench is a new benchmark designed to evaluate how well multimodal language models understand and reason with visual markups in scientific tables, revealing their limitations in handling explicit visual cues.

Contribution

The paper introduces HighlightBench, a diagnostic benchmark with a reference pipeline for detailed evaluation of markup-driven table understanding in multimodal models.

Findings

01

Strong models show instability when reasoning with visual cues.

02

The benchmark decomposes evaluation into five task families for detailed analysis.

03

Reproducible baselines and error attribution are enabled.

Abstract

Visual markups such as highlights, underlines, and bold text are common in table-centric documents. Although multimodal large language models (MLLMs) have made substantial progress in document understanding, their ability to treat such cues as explicit logical directives remains under-explored. More importantly, existing evaluations cannot distinguish whether a model fails to see the markup or fails to reason with it. This creates a key blind spot in assessing markup-conditioned behavior over tables. To address this gap, we introduce HighlightBench, a diagnostic benchmark for markup-driven table understanding that decomposes evaluation into five task families: Markup Grounding, Constrained Retrieval, Local Relations, Aggregation & Comparison, and Consistency & Missingness. We further provide a reference pipeline that makes intermediate decisions explicit, enabling reproducible…
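The abstract's distinction between failing to see a markup and failing to reason with it can be illustrated with a small sketch. This is not the paper's reference pipeline: the `Example` and `Prediction` types, the `attribute_error` function, and the label strings are all hypothetical, and the sketch assumes a model exposes the set of cells it believes are marked as an intermediate decision alongside its final answer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical representation: a cell is a (row, column) index into the table.
Cell = Tuple[int, int]


@dataclass(frozen=True)
class Example:
    """One benchmark item (assumed layout, not the paper's schema)."""
    gold_marked: FrozenSet[Cell]  # cells that actually carry the markup
    gold_answer: str              # expected final answer


@dataclass(frozen=True)
class Prediction:
    """A model output with its intermediate grounding decision made explicit."""
    marked: FrozenSet[Cell]  # cells the model reports as marked
    answer: str              # the model's final answer


def attribute_error(example: Example, prediction: Prediction) -> str:
    """Assign an outcome label by checking grounding and answer separately."""
    grounded = prediction.marked == example.gold_marked
    correct = (
        prediction.answer.strip().lower() == example.gold_answer.strip().lower()
    )
    if correct and grounded:
        return "correct"
    if correct:
        # Right answer on top of wrong grounding: worth flagging on its own.
        return "correct_despite_grounding_error"
    if not grounded:
        # The model did not identify the marked cells, so the failure is
        # attributed to perception rather than to reasoning.
        return "perception_error"
    # The marked cells were identified correctly but the answer is wrong.
    return "reasoning_error"
```

Under these assumptions, a prediction that misses a highlighted cell and answers wrongly is labeled `perception_error`, while one that recovers every highlighted cell but still answers wrongly is labeled `reasoning_error`. Exact set equality and case-insensitive string matching are simplifications chosen for the sketch; a real evaluation would likely need more tolerant matching.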
