Semantic Content Filtering with Wikipedia and Ontologies
Pekka Malo, Pyry Siitari, Oskar Ahlgren, Jyrki Wallenius, and Pekka Korhonen

TL;DR
This paper presents a framework that combines Wikipedia's semantic relatedness data with domain ontologies to improve document filtering, demonstrating superior performance over traditional machine learning classifiers on standard datasets.
Contribution
It introduces a novel approach integrating Wikipedia and ontologies for semantic content filtering, reducing reliance on manually built domain knowledge sources.
Findings
Outperforms SVM and C4.5 classifiers in filtering tasks
Shows robust performance on Reuters RCV1 and TREC-11 datasets
Utilizes Wikipedia's semantic relatedness to enhance content classification
Abstract
The use of domain knowledge is generally found to improve query efficiency in content filtering applications. In particular, tangible benefits have been achieved when using knowledge-based approaches within more specialized fields, such as medical free texts or legal documents. However, the problem is that sources of domain knowledge are time-consuming to build and equally costly to maintain. As a potential remedy, recent studies on Wikipedia suggest that this large body of socially constructed knowledge can be effectively harnessed to provide not only facts but also accurate information about semantic concept-similarities. This paper describes a framework for document filtering, where Wikipedia's concept-relatedness information is combined with a domain ontology to produce semantic content classifiers. The approach is evaluated using Reuters RCV1 corpus and TREC-11 filtering task…
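As a rough illustration of the idea in the abstract, the sketch below scores a document against a topic by combining a set of ontology concepts with pairwise concept-relatedness values. It is a minimal sketch under stated assumptions, not the authors' method: the relatedness table, the concept names, the averaging rule, and the threshold are all hypothetical, and in a real system the relatedness values would be derived from Wikipedia rather than hard-coded.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical relatedness scores in [0, 1] between concept pairs.
# Assumption: in practice these would be computed from Wikipedia.
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"interest rate", "central bank"}): 0.8,
    frozenset({"interest rate", "inflation"}): 0.7,
    frozenset({"football", "central bank"}): 0.05,
}


def relatedness(a: str, b: str) -> float:
    """Symmetric lookup; identical concepts score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def topic_score(doc_concepts: Iterable[str], topic_concepts: Set[str]) -> float:
    """Average, over the topic's ontology concepts, of the best-matching
    document concept (a hypothetical aggregation rule)."""
    doc = list(doc_concepts)
    if not doc or not topic_concepts:
        return 0.0
    total = sum(max(relatedness(t, d) for d in doc) for t in topic_concepts)
    return total / len(topic_concepts)


def accept(doc_concepts: Iterable[str], topic_concepts: Set[str],
           threshold: float = 0.5) -> bool:
    """Filter decision: keep the document if its score reaches the threshold."""
    return topic_score(doc_concepts, topic_concepts) >= threshold


# Hypothetical topic profile taken from a domain ontology.
monetary_policy = {"central bank", "inflation"}
on_topic = accept(["interest rate"], monetary_policy)
off_topic = accept(["football"], monetary_policy)
```

Here a document that mentions only "interest rate" is accepted for the hypothetical monetary-policy topic because it is closely related to both ontology concepts, while a document about "football" is rejected.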
Taxonomy
Topics: Wikis in Education and Collaboration · Text and Document Classification Technologies · Topic Modeling
