Information filtering based on wiki index database
A. V. Smirnov, A. A. Krizhanovsky

TL;DR
This paper introduces a profile-based information filtering method that uses a Wikipedia index database to classify incoming texts, such as emails and news, into topics of user interest.
Contribution
It presents an approach that automatically generates a user profile from the user's documents and builds topic-specific Wikipedia subcorpora, whose index databases are then used to filter incoming texts.
Findings
Classification of texts into the topics of user interest listed in the user profile
Evaluation of the indexing method on the Russian and Simple English Wikipedia
Demonstrated potential for personalized information filtering
Abstract
In this paper we present a profile-based approach to information filtering that relies on analysing the content of text documents. A Wikipedia index database is built and used to automatically generate a user profile from the user's document collection. Using knowledge extracted from this profile, a problem-oriented Wikipedia subcorpus is created for each topic of user interest. The index databases of these subcorpora are then applied to filter the incoming information flow (e.g., mail, news), so that the analysed texts are classified into the topics explicitly represented in the user profile. The paper concentrates on the indexing part of the approach. It describes the architecture of an application that implements Wikipedia indexing and evaluates the indexing method on the Russian and Simple English Wikipedia.
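The pipeline in the abstract can be illustrated with a small Python sketch. The following points are assumptions made for illustration and are not the authors' implementation:

- **Index.** The index is a toy in-memory TF-IDF index, not the paper's index database. Plain lowercase tokens stand in for lemmas.
- **Profile.** Each topic's profile is taken to be the top-k indexed terms of the user documents for that topic. The user documents are assumed to be already grouped by topic.
- **Subcorpora.** A wiki page joins a topic's subcorpus if it contains any of that topic's profile keywords.
- **Filtering.** A text is scored against each topic by cosine similarity to that subcorpus's summed TF-IDF vector, weighted with a per-subcorpus index.

All names (`WikiIndex`, `build_profile`, `build_subcorpora`, `classify`) and the sample data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplification: lowercase Unicode word tokens (works for Cyrillic too).
    # The paper's indexer works with lemmas; no lemmatizer is used here.
    return re.findall(r"\w+", text.lower())


class WikiIndex:
    """Hypothetical toy index: term -> {page title: term frequency}."""

    def __init__(self, pages):
        self.n_pages = len(pages)
        self.postings = defaultdict(dict)
        for title, text in pages.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][title] = tf

    def idf(self, term):
        # Smoothed IDF; terms unseen in the indexed pages get the highest weight.
        df = len(self.postings.get(term, {}))
        return math.log((1 + self.n_pages) / (1 + df)) + 1.0

    def tfidf(self, text):
        return {t: c * self.idf(t) for t, c in Counter(tokenize(text)).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_profile(index, docs_by_topic, k=3):
    # Assumed rule: top-k TF-IDF terms per topic, restricted to terms that
    # occur in the wiki index (ties broken alphabetically).
    profile = {}
    for topic, docs in docs_by_topic.items():
        weights = {t: w for t, w in index.tfidf(" ".join(docs)).items()
                   if t in index.postings}
        profile[topic] = set(sorted(weights, key=lambda t: (-weights[t], t))[:k])
    return profile


def build_subcorpora(pages, profile):
    # Assumed rule: a page joins a topic's subcorpus if it contains
    # at least one of that topic's profile keywords.
    return {
        topic: {title: text for title, text in pages.items()
                if keywords & set(tokenize(text))}
        for topic, keywords in profile.items()
    }


def classify(text, subcorpora):
    # Build one index per topic subcorpus, then compare the text with the
    # subcorpus's summed TF-IDF vector; the best cosine score wins.
    scores = {}
    for topic, sub in subcorpora.items():
        sub_index = WikiIndex(sub)
        centroid = Counter()
        for page_text in sub.values():
            centroid.update(sub_index.tfidf(page_text))
        scores[topic] = cosine(sub_index.tfidf(text), centroid)
    return max(scores, key=scores.get), scores


# Hypothetical sample data standing in for wiki pages and user documents.
pages = {
    "Football": "football is a team sport played with a ball and two goals",
    "Tennis": "tennis is a racket sport played with a ball on a court",
    "Python": "python is a programming language used for software development",
    "Compiler": "a compiler translates a programming language into machine code",
}
docs_by_topic = {
    "sport": ["our team won the football match", "the tennis court was wet"],
    "programming": ["the compiler reported an error", "python software"],
}

index = WikiIndex(pages)
profile = build_profile(index, docs_by_topic, k=3)
subcorpora = build_subcorpora(pages, profile)
best, scores = classify("the football team needs a new ball", subcorpora)
print(best, scores)
```

The sketch builds a separate index for each subcorpus so that term weights reflect each topic's own vocabulary. A real system would persist these indexes in a database rather than rebuilding them for every incoming text.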
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Topic Modeling
