HIPPO: Enhancing the Table Understanding Capability of LLMs through Hybrid-Modal Preference Optimization

Haolan Wang; Zhenghao Liu; Xinze Li; Xiaocui Yang; Yu Gu; Yukun Yan; Qi Shi; Fangfang Li; Chong Chen; Ge Yu

arXiv:2502.17315·cs.CL·February 17, 2026


TL;DR

HIPPO introduces a hybrid-modal approach combining text and image data to improve large language models' understanding and reasoning capabilities for tabular data, outperforming existing models.

Contribution

The paper proposes HIPPO, a novel hybrid-modal training method that enhances table understanding in LLMs by learning from combined text and image representations.

Findings

1. Achieves a 4% improvement on table reasoning tasks.
2. Enhances the extraction of complementary semantics across modalities.
3. Improves unimodal table reasoning capabilities.

Abstract

Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. Recent methods employ Multi-modal Large Language Models (MLLMs) to address table-related tasks across various modalities of table representations. However, existing studies mainly focus on exploring the table understanding ability of MLLMs using unimodal representations, which limits further exploration of multi-modal representations to enable more effective table reasoning. To better capture structural semantics from the tabular data, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and image, optimizing MLLMs by learning more comprehensive table information from these multiple modalities. Specifically, HIPPO samples MLLM responses from hybrid-modal table representations and designs a…
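HIPPO represents each table in both text and image form. The abstract does not say which textual serialization is used, so the sketch below assumes a Markdown grid, one common choice. It also assumes the image modality is a pre-rendered screenshot supplied by path. `table_to_markdown` and `build_hybrid_input` are hypothetical helpers written for illustration. They are not taken from the neuir/hippo repository.

```python
def table_to_markdown(header, rows):
    """Serialize a table as a Markdown grid (assumed text modality)."""
    def fmt(cells):
        # Join the cells of one row with pipe separators
        return "| " + " | ".join(str(c) for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row length does not match header")
        lines.append(fmt(row))
    return "\n".join(lines)


def build_hybrid_input(header, rows, image_path, question):
    """Hypothetical container pairing both table modalities with a question.

    image_path is assumed to point at an already rendered image of the same
    table; rendering itself is out of scope for this sketch.
    """
    text = table_to_markdown(header, rows) + "\n\nQuestion: " + question
    return {"image": image_path, "text": text}


if __name__ == "__main__":
    sample = build_hybrid_input(
        ["Year", "Team"], [[2020, "A"], [2021, "B"]],
        "table_0.png", "Which team played in 2021?",
    )
    print(sample["text"])
```

An MLLM would receive both fields in a single prompt, so the text grid supplies exact cell values while the image preserves visual layout.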


Code & Models

Repositories

neuir/hippo (PyTorch, official)

Datasets

HaolanWang/HIPPO (dataset, 16 downloads)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Direct Preference Optimization
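Direct Preference Optimization is listed as the underlying method. As a reference point, the sketch below computes the standard DPO loss for one preference pair: −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). How HIPPO builds its chosen and rejected responses from hybrid-modal samples is described in the truncated part of the abstract and is not reproduced here. The sketch also makes three further assumptions: sequence log-probabilities are precomputed, β = 0.1 is an illustrative default, and the function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair."""
    # Log-ratio of policy to frozen reference model for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


def batch_dpo_loss(pairs, beta=0.1):
    """Mean DPO loss over tuples of the four sequence log-probabilities."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy raises the chosen response and lowers the rejected one
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When the policy matches the reference model, the margin is zero and the loss equals log 2. Shifting probability mass toward the chosen response drives the loss below that value.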