Schema Matching on Graph: Iterative Graph Exploration for Efficient and Explainable Data Integration
Mingyu Jeon, Jaeyoung Suh, Suwan Cho

TL;DR
This paper presents SMoG, a novel graph-based schema matching framework that uses iterative 1-hop SPARQL queries to improve explainability, efficiency, and reliability in medical data integration tasks.
Contribution
SMoG introduces an iterative, simple-query approach for schema matching that reduces complexity and enhances explainability compared to existing KG-augmented LLM methods.
Findings
Achieves comparable performance to state-of-the-art methods
Reduces storage requirements by querying SPARQL endpoints directly
Enhances explainability through human-verifiable query paths
Abstract
Schema matching is a critical task in data integration, particularly in the medical domain where disparate Electronic Health Record (EHR) systems must be aligned to standard models like OMOP CDM. While Large Language Models (LLMs) have shown promise in schema matching, they suffer from hallucination and lack of up-to-date domain knowledge. Knowledge Graphs (KGs) offer a solution by providing structured, verifiable knowledge. However, existing KG-augmented LLM approaches often rely on inefficient complex multi-hop queries or storage-intensive vector-based retrieval methods. This paper introduces SMoG (Schema Matching on Graph), a novel framework that leverages iterative execution of simple 1-hop SPARQL queries, inspired by successful strategies in Knowledge Graph Question Answering (KGQA). SMoG enhances explainability and reliability by generating human-verifiable query paths while…
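The iterative 1-hop strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `ex:` identifiers, the toy graph, and the in-memory `toy_endpoint` that stands in for a real SPARQL endpoint are all hypothetical. Plain breadth-first expansion is used here in place of whatever selection logic SMoG applies at each step. The sketch only shows how repeated single-hop queries can yield a path of triples that a human can inspect.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]          # (predicate, object)
Triple = Tuple[str, str, str]   # (subject, predicate, object)


def one_hop_query(entity_iri: str, limit: int = 50) -> str:
    """Build a simple 1-hop SPARQL query: all outgoing edges of one entity."""
    return f"SELECT ?p ?o WHERE {{ <{entity_iri}> ?p ?o . }} LIMIT {limit}"


def explore(
    start: str,
    target: str,
    run_query: Callable[[str], List[Edge]],
    max_hops: int = 3,
) -> List[Triple]:
    """Expand the graph one hop at a time; return the triple path to target, or []."""
    frontier: List[Tuple[str, List[Triple]]] = [(start, [])]
    visited = {start}
    for _ in range(max_hops):
        next_frontier: List[Tuple[str, List[Triple]]] = []
        for entity, path in frontier:
            # Each lookup is a single 1-hop query, never a multi-hop pattern.
            for predicate, obj in run_query(one_hop_query(entity)):
                step = path + [(entity, predicate, obj)]
                if obj == target:
                    return step  # the path itself is the explanation
                if obj not in visited:
                    visited.add(obj)
                    next_frontier.append((obj, step))
        frontier = next_frontier
    return []


# Hypothetical toy graph standing in for a remote SPARQL endpoint.
TOY_GRAPH: Dict[str, List[Edge]] = {
    "ex:patient_dob": [("ex:describes", "ex:BirthDate")],
    "ex:BirthDate": [("ex:mapsTo", "omop:person.birth_datetime")],
}


def toy_endpoint(query: str) -> List[Edge]:
    """Answer a 1-hop query from TOY_GRAPH by reading the subject between < and >."""
    entity = query.split("<", 1)[1].split(">", 1)[0]
    return TOY_GRAPH.get(entity, [])


path = explore("ex:patient_dob", "omop:person.birth_datetime", toy_endpoint)
for triple in path:
    print(triple)
```

Against a real endpoint, `toy_endpoint` would be replaced by a function that sends the query string over HTTP and parses the result bindings; the exploration loop itself would not need to change.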
Taxonomy
Topics: Machine Learning in Healthcare · Advanced Graph Neural Networks · Data Quality and Management
