Unlocking the Power of Large Language Models for Multi-table Entity Matching

Yingkai Tang; Taoyu Su; Wenyuan Zhang; Xiaoyang Guo; Tingwen Liu

arXiv:2604.21238·cs.CL·April 24, 2026

TL;DR

This paper introduces LLM4MEM, an LLM-based framework for multi-table entity matching. It targets three problems: semantic inconsistencies caused by numerical attribute variations, matching efficiency as the number of entities grows across data sources, and noise. It reports an average F1 improvement of 5.1% over baselines.

Contribution

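The paper proposes an LLM-based approach to multi-table entity matching built from three modules: a multi-style prompt-enhanced LLM attribute coordination module for semantic consistency, a transitive consensus embedding matching module for efficiency, and a density-aware pruning module for noise reduction. Together they yield an average F1 improvement of 5.1% over baselines.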

Findings

01

Achieved an average F1 improvement of 5.1% over baselines.

02

Developed a multi-style prompt-enhanced LLM attribute coordination module.

03

Implemented a density-aware pruning module for noise reduction.

Abstract

Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle to handle semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Then, to alleviate the matching efficiency problem caused by the surge in the number of entities brought by multiple data sources, we develop a transitive consensus embedding matching module to tackle entity embedding and…
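
To make the multi-table setting concrete, the sketch below shows one generic way to turn pairwise match decisions into entity groups that span several tables, using transitivity: if record a matches b and b matches c, all three are placed in the same group. This is a textbook union-find illustration, not the paper's transitive consensus embedding matching module, whose details are not given in the truncated abstract. The names `UnionFind` and `group_matches`, the `(table, row_id)` record keys, and the example pairs are all hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure over hashable record keys."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen keys as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walk directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_matches(pairs):
    """Merge pairwise matches into groups via transitive closure.

    `pairs` is an iterable of (record, record) tuples judged equivalent
    by some upstream matcher (hypothetical here). Returns a sorted list
    of sorted groups containing at least two records each.
    """
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = defaultdict(set)
    for record in list(uf.parent):
        groups[uf.find(record)].add(record)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical pairwise matches across three tables A, B, and C.
pairs = [
    (("A", 1), ("B", 7)),
    (("B", 7), ("C", 3)),
    (("A", 2), ("C", 9)),
]
groups = group_matches(pairs)
print(groups)
```

Here `("A", 1)` and `("C", 3)` end up in the same group even though they were never compared directly, which is why a multi-table matcher does not need to score every cross-table pair. The trade-off is that a single false-positive pair can merge two unrelated groups, which is one reason a pruning step for noisy matches is useful in this setting.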

Code & Models

Repositories

Ymeki/LLM4MEM
github
