Medical Documents Classification Based on the Domain Ontology MeSH
Zakaria Elberrichi, Belaggoun Amel, Taibi Malika

TL;DR
This paper proposes a novel method for classifying medical documents by leveraging the MeSH domain ontology to enhance document representation, resulting in significantly improved classification accuracy over traditional stem-based methods.
Contribution
It introduces a new ontology-based representation technique for medical document classification using MeSH, tested with C4.5 and KNN algorithms, showing substantial performance gains.
Findings
Ontology-based representation improved classification accuracy by 30%.
Enrichment with concepts and hyperonyms enhances document vector quality.
The approach outperforms traditional stem-based methods on biomedical datasets.
Abstract
This paper addresses the problem of classifying web documents using a domain ontology. Our goal is to provide a method for improving the classification of medical documents by exploiting the MeSH (Medical Subject Headings) thesaurus, which allows us to generate a new representation based on concepts. This approach was tested with two well-known data mining algorithms, C4.5 and KNN, and compared with the usual representation using stems. Enriching the vectors with the concepts and hyperonyms drawn from the domain ontology significantly improved the document representation, which is essential for good classification. The results of our experiments on the benchmark biomedical collection Ohsumed confirm the value of the approach: the ontology-based classification performs very significantly better than the classical representation (Stems)…
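The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon `TERM_TO_CONCEPT`, the parent links in `HYPERNYM`, the `C:` prefix, and the function names are all hypothetical and chosen for illustration. The entries are a toy stand-in, not data loaded from MeSH, and lowercase word tokens stand in for stems.

```python
from collections import Counter

# Hypothetical toy lexicon: surface term -> concept label.
# Illustrative entries only; not loaded from MeSH.
TERM_TO_CONCEPT = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "stroke": "Stroke",
}

# Hypothetical hyperonym (parent) links: child concept -> parent concept.
HYPERNYM = {
    "Myocardial Infarction": "Heart Diseases",
    "Heart Diseases": "Cardiovascular Diseases",
    "Stroke": "Cerebrovascular Disorders",
    "Cerebrovascular Disorders": "Cardiovascular Diseases",
}


def find_concepts(text, max_len=3):
    """Greedy longest-match lookup of lexicon terms in the text."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in TERM_TO_CONCEPT:
                found.append(TERM_TO_CONCEPT[phrase])
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found


def hypernyms(concept, depth):
    """Return up to `depth` ancestors of a concept, nearest first."""
    ancestors = []
    current = concept
    for _ in range(depth):
        current = HYPERNYM.get(current)
        if current is None:
            break
        ancestors.append(current)
    return ancestors


def enrich(text, depth=1):
    """Bag of word tokens plus matched concepts and their hyperonyms."""
    vector = Counter(text.lower().split())
    for concept in find_concepts(text):
        vector["C:" + concept] += 1
        for parent in hypernyms(concept, depth):
            vector["C:" + parent] += 1
    return vector
```

Under these assumptions, `enrich("patient had a heart attack")` and `enrich("acute myocardial infarction")` share the feature `C:Myocardial Infarction` even though they have no word in common, which is the kind of overlap a stem-only representation would miss. The resulting vectors could then be passed to any classifier, such as KNN or a decision tree.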
Taxonomy
Topics: Text and Document Classification Technologies · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques
