Improving Persian Document Classification Using Semantic Relations between Words
Saeed Parseh, Ahmad Baraani

TL;DR
This paper introduces a new term weighting method based on semantic relations between words, significantly improving Persian document classification accuracy by 2-4% over previous systems.
Contribution
It proposes a novel semantic-relation-based term weighting approach for Persian document classification, enhancing accuracy beyond existing statistical methods.
Findings
Achieved 2-4% improvement in classification accuracy
Validated on three standard Persian corpora
Outperforms previous weighting methods in Persian text classification
Abstract
With the growth of available information, document classification, as one of the methods of text mining, plays a vital role in managing and organizing information. Document classification is the process of assigning a document to one or more predefined category labels. It comprises several stages, such as text processing, term selection, term weighting, and final classification. Because the accuracy of document classification is very important, improving any of these stages should lead to better results and higher precision. Term weighting has a great impact on classification accuracy. Most existing weighting methods exploit the statistical information of terms in documents and do not consider semantic relations between words. In this paper, an automated document classification system is presented that uses a novel term weighting method based…
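The abstract describes a four-stage pipeline (text processing, term selection, term weighting, final classification) and contrasts purely statistical weighting with weighting that accounts for semantic relations. The sketch below illustrates that pipeline in plain Python under loudly stated assumptions: TF-IDF stands in for the "statistical" weighting, a nearest-centroid cosine classifier stands in for the final classifier, and the `related` map with its `alpha` boost is a hypothetical placeholder for semantic relations. It is not the authors' weighting formula, which the truncated abstract does not give. Whitespace tokenization and the English toy documents are also placeholders; real Persian text needs proper normalization and tokenization.

```python
import math
from collections import Counter, defaultdict


def preprocess(text):
    # Text processing (placeholder): lowercase and split on whitespace.
    # Real Persian text would need dedicated normalization and tokenization.
    return text.lower().split()


def fit_idf(docs, min_df=1):
    # Term selection: keep terms that occur in at least `min_df` training documents.
    # Statistical weighting: idf(t) = log(N / df(t)).
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / d) for t, d in df.items() if d >= min_df}


def weigh(doc, idf, related=None, alpha=0.5):
    # Statistical TF-IDF weights for the selected (known) terms only.
    w = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    if not related:
        return w
    # HYPOTHETICAL semantic step (not the paper's formula): add `alpha` times the
    # weights of semantically related terms that also occur in the same document.
    return {
        t: v + alpha * sum(w.get(r, 0.0) for r in related.get(t, ()))
        for t, v in w.items()
    }


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(vectors, labels):
    # Final classification (stand-in): average the weighted vectors of each class.
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, v in vec.items():
            sums[label][t] += v
    return {lab: {t: v / counts[lab] for t, v in s.items()} for lab, s in sums.items()}


def classify(vec, centroids):
    # Assign the class whose centroid is most cosine-similar to the document vector.
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


if __name__ == "__main__":
    # Toy illustrative data, not from the paper's corpora.
    train = [preprocess(t) for t in [
        "football match goal team",
        "team wins football league",
        "stock market shares price",
        "market price inflation bank",
    ]]
    labels = ["sport", "sport", "economy", "economy"]
    idf = fit_idf(train)
    centroids = train_centroids([weigh(d, idf) for d in train], labels)
    print(classify(weigh(preprocess("goal in the league"), idf), centroids))  # → sport
```

Calling `weigh` with `related=None` gives plain TF-IDF, so the same pipeline can compare a statistical baseline against a semantically adjusted variant. Where the semantic relations would come from (a thesaurus, a lexical database, or word embeddings) is deliberately left open in this sketch.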
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Web Data Mining and Analysis
