Domain Knowledge in Artificial Intelligence: Using Conceptual Modeling to Increase Machine Learning Accuracy and Explainability

V.C. Storey; J. Parsons; A. Castellanos; M. Tremblay; R. Lukyanenko; W. Maass; A. Castillo

arXiv:2507.02922·cs.LG·July 8, 2025

TL;DR

This paper introduces CMML, a method that uses the domain knowledge captured in conceptual models to guide data preparation, with the aim of improving machine learning accuracy and transparency in real-world applications.

Contribution

The paper presents CMML, a method that integrates conceptual modeling into data preparation for machine learning in order to address performance and transparency issues.

Findings

01

CMML improved model performance in two real-world case studies.

02

Data scientists found CMML applicable and beneficial for data preparation.

03

CMML enhanced the transparency and understanding of machine learning models.

Abstract

Machine learning enables the extraction of useful information from large, diverse datasets. However, despite many successful applications, machine learning continues to suffer from performance and transparency issues. These challenges can be partially attributed to the limited use of domain knowledge by machine learning models. This research proposes using the domain knowledge represented in conceptual models to improve the preparation of the data used to train machine learning models. We develop and demonstrate a method, called Conceptual Modeling for Machine Learning (CMML), which comprises guidelines for data preparation in machine learning and is based on conceptual modeling constructs and principles. To assess the impact of CMML on machine learning outcomes, we first applied it to two real-world problems to evaluate its impact on model performance. We then solicited an…
