TL;DR
This paper introduces LLM4MEM, a framework that uses large language models (LLMs) for multi-table entity matching. It targets three problems: semantic inconsistencies between attribute values, matching efficiency as the number of entities grows, and noise. It reports improved F1 scores on multiple datasets.
Contribution
The paper proposes an LLM-based approach to multi-table entity matching built from three modules, one each for semantic consistency, matching efficiency, and noise reduction, and reports an average F1 gain of 5.1% over baselines.
Findings
Achieved an average F1 improvement of 5.1% over baselines.
Developed a multi-style prompt-enhanced LLM attribute coordination module.
Implemented a density-aware pruning module for noise reduction.
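The summary names the density-aware pruning module but does not describe how it works, so the following is only a generic sketch of what density-aware pruning can mean, not LLM4MEM's method. It assumes candidate matches have already been grouped and that every entity has an embedding vector. `prune_group`, `local_density`, `sim_threshold`, and `min_density` are hypothetical names. Here an entity's local density is the fraction of other group members whose cosine similarity to it reaches `sim_threshold`, and low-density entities are treated as noise and removed.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def local_density(idx, vectors, sim_threshold):
    """Fraction of the other group members whose similarity to vectors[idx] reaches sim_threshold."""
    others = [v for j, v in enumerate(vectors) if j != idx]
    if not others:
        return 0.0
    close = sum(1 for v in others if cosine(vectors[idx], v) >= sim_threshold)
    return close / len(others)

def prune_group(ids, vectors, sim_threshold=0.9, min_density=0.5):
    """Hypothetical pruning rule: drop group members whose local density is below min_density."""
    kept = [eid for i, eid in enumerate(ids)
            if local_density(i, vectors, sim_threshold) >= min_density]
    # A group left with a single entity no longer represents a cross-table match.
    return kept if len(kept) >= 2 else []

# Two near-duplicate entities plus one unrelated record in the same candidate group.
print(prune_group(["a1", "b1", "c9"], [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))  # ['a1', 'b1']
```

Counting close neighbours instead of averaging similarities keeps a single outlier from lowering the density of every other member of a small group.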
Abstract
Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle to handle semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Then, to alleviate the matching efficiency problem caused by the surge in the number of entities brought by multiple data sources, we develop a transitive consensus embedding matching module to tackle entity embedding and…
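The abstract says only that a transitive consensus embedding matching module handles the surge in entity count across sources. As a rough illustration of the general idea, and not the paper's algorithm, the sketch below links entities from each pair of tables only when they are mutual nearest neighbours by cosine similarity above a threshold. It then merges those links transitively with a union-find structure, so that if a1 matches b1 and b1 matches c1, all three land in one group. `match_tables`, `mutual_nearest_pairs`, and `threshold` are hypothetical names, and the input is assumed to be precomputed embeddings keyed by table and entity ID.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def mutual_nearest_pairs(table_a, table_b, threshold):
    """Return (id_a, id_b) pairs that are each other's nearest neighbour and similar enough."""
    if not table_a or not table_b:
        return []
    best_b = {a: max(table_b, key=lambda b: cosine(table_a[a], table_b[b])) for a in table_a}
    best_a = {b: max(table_a, key=lambda a: cosine(table_a[a], table_b[b])) for b in table_b}
    return [(a, b) for a, b in best_b.items()
            if best_a[b] == a and cosine(table_a[a], table_b[b]) >= threshold]

def match_tables(tables, threshold=0.9):
    """tables: {table_name: {entity_id: embedding}} -> sorted list of matched groups."""
    uf = UnionFind()
    for name, entities in tables.items():
        for eid in entities:
            uf.find((name, eid))  # register every entity, including unmatched ones
    # Pairwise links between tables, merged transitively by the union-find.
    for (na, ta), (nb, tb) in combinations(tables.items(), 2):
        for a, b in mutual_nearest_pairs(ta, tb, threshold):
            uf.union((na, a), (nb, b))
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())

tables = {"A": {"a1": [1.0, 0.0]}, "B": {"b1": [0.95, 0.31]}, "C": {"c1": [0.81, 0.59]}}
print(match_tables(tables))  # [[('A', 'a1'), ('B', 'b1'), ('C', 'c1')]]
```

In the usage example, a1 and c1 have a cosine similarity of only about 0.81, below the 0.9 threshold, yet they are grouped together because each is linked to b1. Restricting links to mutual nearest neighbours keeps the number of pairwise edges small, which is one common way to limit the cost of comparing many sources.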
