OAEI Machine Learning Dataset for Online Model Generation

Sven Hertling; Ebrahim Norouzi; Harald Sack

arXiv:2404.18542 · cs.IR · April 30, 2024

TL;DR

This paper introduces a comprehensive dataset for OAEI that enables online model learning, allowing fair comparison of machine learning-based ontology matching systems by facilitating adaptive training and validation.

Contribution

The paper presents a new dataset for OAEI that supports online model learning, addressing fairness issues caused by offline training and subset sampling in existing evaluations.

Findings

1. Dataset enables online adaptation of models
2. Fine-tuning confidence thresholds improves system performance
3. Supports fair comparison of ML-based ontology matchers

Abstract

Ontology and knowledge graph matching systems are evaluated annually by the Ontology Alignment Evaluation Initiative (OAEI). More and more systems use machine learning-based approaches, including large language models. The training and validation datasets are usually determined by the system developer, and often a subset of the reference alignments is used. This sampling is against the OAEI rules and makes a fair comparison impossible. Furthermore, those models are trained offline (a trained and optimized model is packaged into the matcher), and therefore the systems are specifically trained for those tasks. In this paper, we introduce a dataset that contains training, validation, and test sets for most of the OAEI tracks. Thus, online model learning (the systems must adapt to the given input alignment without human intervention) is made possible to enable a fair comparison for ML-based…
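To make the idea of adapting to a given input alignment concrete, here is a minimal sketch of one simple form of online adaptation: choosing a confidence threshold on a validation alignment without human intervention. It is an illustration only, not the authors' method and not the MELT API. All names (`f1_score`, `filter_by_threshold`, `tune_threshold`) and the toy data are hypothetical, and the sketch assumes the scored candidates are restricted to entities covered by the validation split.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (source entity, target entity); hypothetical representation


def f1_score(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """F1 of a predicted alignment against a reference alignment."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def filter_by_threshold(scored: Dict[Pair, float], threshold: float) -> Set[Pair]:
    """Keep only correspondences whose confidence is at least the threshold."""
    return {pair for pair, confidence in scored.items() if confidence >= threshold}


def tune_threshold(scored: Dict[Pair, float], validation: Set[Pair]) -> float:
    """Return the lowest observed confidence that maximizes F1 on the validation set."""
    best_threshold, best_f1 = 0.0, -1.0
    for candidate in sorted(set(scored.values())):
        score = f1_score(filter_by_threshold(scored, candidate), validation)
        if score > best_f1:  # strict comparison keeps the lowest best threshold
            best_threshold, best_f1 = candidate, score
    return best_threshold


# Toy data (invented for illustration): matcher confidences and a validation alignment.
scored = {
    ("a1", "b1"): 0.9,
    ("a2", "b2"): 0.8,
    ("a4", "b4"): 0.6,
    ("a3", "b9"): 0.4,  # incorrect candidate with low confidence
}
validation = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}

threshold = tune_threshold(scored, validation)
final_alignment = filter_by_threshold(scored, threshold)
```

In an evaluation of this kind, the threshold chosen on the validation alignment would then be applied unchanged to the test portion, so the test set plays no role in tuning.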

Code & Models

Repositories

dwslab/melt (PyTorch, official)

Taxonomy

Topics: Advanced Data Processing Techniques · Machine Learning and Data Classification

Methods: Ontology