A Novel Term_Class Relevance Measure for Text Categorization
D S Guru, Mahamad Suhil

TL;DR
This paper proposes a new Term_Class relevance measure for text categorization that considers term participation across classes, demonstrating improved classification performance on the 20 Newsgroups dataset.
Contribution
The paper introduces a Term_Class relevance measure, defined as the product of Class_Term weight and Class_Term density, as an alternative to term weighting schemes such as TF-IDF for text classification.
Findings
Outperforms TF-IDF in classification accuracy
Effective on the 20 Newsgroups dataset
Reports improvement over existing term weighting schemes
Abstract
In this paper, we introduce a new measure called Term_Class relevance to compute the relevancy of a term in classifying a document into a particular class. The proposed measure estimates the degree of relevance of a given term, in placing an unlabeled document to be a member of a known class, as a product of Class_Term weight and Class_Term density; where the Class_Term weight is the ratio of the number of documents of the class containing the term to the total number of documents containing the term and the Class_Term density is the relative density of occurrence of the term in the class to the total occurrence of the term in the entire population. Unlike the other existing term weighting schemes such as TF-IDF and its variants, the proposed relevance measure takes into account the degree of relative participation of the term across all documents of the class to the entire population.…
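The definition in the abstract can be illustrated with a minimal sketch. It follows only the wording above: the Class_Term weight is read as a ratio of document counts, and the Class_Term density as a ratio of occurrence counts. The exact formula in the full paper may include further terms or normalization, so this is an assumption and not the authors' implementation. The function name `term_class_relevance` and the toy corpus are hypothetical.

```python
def term_class_relevance(term, cls, docs):
    """Sketch of Term_Class relevance as read from the abstract (assumption).

    docs is a list of (class_label, tokens) pairs.
    """
    docs_with_term = 0        # documents in the whole population containing the term
    class_docs_with_term = 0  # documents of the class containing the term
    total_occurrences = 0     # occurrences of the term in the whole population
    class_occurrences = 0     # occurrences of the term within the class

    for label, tokens in docs:
        count = tokens.count(term)
        if count == 0:
            continue
        docs_with_term += 1
        total_occurrences += count
        if label == cls:
            class_docs_with_term += 1
            class_occurrences += count

    if docs_with_term == 0:
        # Term never occurs: treat its relevance as zero (a choice made here).
        return 0.0

    class_term_weight = class_docs_with_term / docs_with_term
    class_term_density = class_occurrences / total_occurrences
    return class_term_weight * class_term_density


# Hypothetical toy corpus, not data from the paper.
toy_docs = [
    ("sport", ["goal", "goal", "team"]),
    ("sport", ["team", "match"]),
    ("tech", ["goal", "code"]),
]

relevance = term_class_relevance("goal", "sport", toy_docs)
print(relevance)
```

In this toy corpus, "goal" appears in two documents, one of which belongs to "sport", so the weight is 1/2. It occurs three times in total, twice within "sport", so the density is 2/3, giving a relevance of 1/3. A term confined to a single class, such as "code" in "tech", scores 1.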
