Improving LLM-based Ontology Matching with fine-tuning on synthetic data
Guilherme Sousa, Rinaldo Lima, Cassia Trojahn

TL;DR
This paper presents an approach that generates synthetic training data and fine-tunes Large Language Models for ontology matching, reporting gains over the base models on multiple datasets.
Contribution
It introduces a new method for automatically generating training data and fine-tuning LLMs specifically for ontology matching, enhancing zero-shot performance.
Findings
Fine-tuned LLMs outperform base models on ontology matching datasets.
Synthetic dataset generation effectively adapts LLMs for domain-specific tasks.
The approach improves matching accuracy across diverse ontology datasets.
Abstract
Large Language Models (LLMs) are increasingly being integrated into various components of Ontology Matching pipelines. This paper investigates the capability of LLMs to perform ontology matching directly on ontology modules and generate the corresponding alignments. It also explores how a dedicated fine-tuning strategy can enhance the model's matching performance in a zero-shot setting. The proposed method incorporates a search space reduction technique to select relevant subsets from both source and target ontologies, which are then used to automatically construct prompts. Because reference alignments for training are scarce, a novel LLM-based approach is introduced for generating a synthetic dataset. This process creates a corpus of ontology submodule pairs and their corresponding reference alignments, specifically designed to fine-tune an LLM for the ontology…
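The abstract does not say which search space reduction technique or prompt template the authors use. As a rough illustration of the general idea only, the sketch below assumes a simple token-overlap (Jaccard) ranking over entity labels and a made-up prompt layout; the function names `reduce_search_space` and `build_prompt`, the parameter `k`, and the prompt wording are all hypothetical and not taken from the paper.

```python
import re


def tokens(label):
    """Lowercase word tokens of a label, splitting camelCase and punctuation."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity between two labels (assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def reduce_search_space(source_labels, target_labels, k=2):
    """For each source label, keep the k most similar target labels.

    Ties are broken alphabetically so the result is deterministic.
    """
    selected = {}
    for s in source_labels:
        ranked = sorted(target_labels, key=lambda t: (-jaccard(s, t), t))
        selected[s] = ranked[:k]
    return selected


def build_prompt(source_subset, target_subset):
    """Assemble a matching prompt from two entity subsets (hypothetical template)."""
    lines = [
        "Match equivalent entities between the two ontology modules.",
        "Source module:",
    ]
    lines += [f"- {s}" for s in source_subset]
    lines.append("Target module:")
    lines += [f"- {t}" for t in target_subset]
    lines.append("Answer with one 'source = target' pair per line.")
    return "\n".join(lines)


# Toy labels, invented for this example.
source = ["ConferencePaper", "Author"]
target = ["Paper", "Person", "PaperAuthor", "Review"]

candidates = reduce_search_space(source, target, k=2)
target_subset = sorted({t for ts in candidates.values() for t in ts})
prompt = build_prompt(source, target_subset)
```

A real pipeline would likely rank candidates with embeddings or ontology structure rather than plain label overlap; the point here is only that the reduced subsets, not the full ontologies, are what end up in the prompt.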
Taxonomy
Topics: Semantic Web and Ontologies · Environmental Monitoring and Data Management · Biomedical Text Mining and Ontologies
