Inverse-Category-Frequency based supervised term weighting scheme for text categorization
Deqing Wang, Hui Zhang

TL;DR
This paper introduces term weighting schemes for text categorization based on inverse category frequency (icf) and, through extensive experiments, shows that they are superior or comparable to existing methods.
Contribution
It proposes two novel term weighting schemes, tf.icf and an icf-based supervised scheme, that improve text classification performance.
Findings
Proposed schemes outperform or match existing methods in macro-F1 and micro-F1.
Experiments were conducted across multiple classifiers and datasets.
Incorporating icf enhances term weighting effectiveness.
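The findings report results in macro-F1 and micro-F1, which can rank the same classifier differently. Below is a minimal sketch of both metrics, assuming a single-label, multi-class setting (the paper's exact evaluation protocol, e.g. multi-label handling on its corpora, is not stated here). Macro-F1 averages per-category F1 scores, so every category counts equally; micro-F1 pools true positives, false positives and false negatives across categories first, so frequent categories dominate.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: 2TP / (2TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (macro-F1, micro-F1) for single-label predictions (assumed setting)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted category gains a false positive
            fn[true] += 1  # true category gains a false negative
    # Macro: average the per-category F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: sum the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    print(macro_micro_f1(["a", "a", "a", "b"], ["a", "a", "a", "a"]))
```

In the toy call, the rare category "b" is never predicted, so its F1 is 0 and macro-F1 drops to 3/7, while micro-F1 stays at 0.75 (equal to accuracy in the single-label case). This is why reporting both metrics gives a fuller picture of a weighting scheme's effect on small categories.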
Abstract
Term weighting schemes often dominate the performance of many classifiers, such as kNN, the centroid-based classifier and SVMs. The widely used term weighting scheme in text categorization, tf.idf, originated in the information retrieval (IR) field. The intuition behind idf seems less reasonable for text categorization than for IR. In this paper, we introduce inverse category frequency (icf) into term weighting and propose two novel approaches: tf.icf and icf-based supervised term weighting schemes. tf.icf substitutes icf for the idf factor and favors terms that occur in fewer categories, rather than in fewer documents. The icf-based approach combines icf and relevance frequency (rf) to weight terms in a supervised way. Our cross-classifier and cross-corpus experiments have shown that our proposed approaches are superior or comparable to six supervised term weighting…
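The abstract describes tf.icf (icf replacing idf) and an icf-based supervised scheme that combines icf with rf, but gives no formulas. The sketch below is illustrative only and rests on stated assumptions: icf is taken as log(1 + |C| / cf(t)), with cf(t) the number of categories containing term t; rf uses the commonly cited form log2(2 + a / max(1, c)); and the supervised weight is a hypothetical plain product tf × rf × icf. The paper's exact definitions and combination may differ, and all function names here are hypothetical.

```python
import math
from collections import Counter

# Corpus layout (assumption): category name -> list of tokenized documents.
CategoryDocs = dict[str, list[list[str]]]


def icf(term: str, category_docs: CategoryDocs) -> float:
    """Assumed icf form: log(1 + |C| / cf(t)), where cf(t) is the number of
    categories with at least one document containing `term`."""
    cf = sum(1 for docs in category_docs.values() if any(term in d for d in docs))
    if cf == 0:
        return 0.0  # term never seen in the training data: give it no weight
    return math.log(1 + len(category_docs) / cf)


def rf(term: str, positive: str, category_docs: CategoryDocs) -> float:
    """Assumed rf form: log2(2 + a / max(1, c)), where a counts documents of the
    positive category containing `term` and c counts such documents elsewhere."""
    a = sum(1 for d in category_docs[positive] if term in d)
    c = sum(
        1
        for cat, docs in category_docs.items()
        if cat != positive
        for d in docs
        if term in d
    )
    return math.log2(2 + a / max(1, c))


def tf_icf(doc: list[str], category_docs: CategoryDocs) -> dict[str, float]:
    """tf.icf: raw term frequency scaled by icf instead of idf."""
    return {t: n * icf(t, category_docs) for t, n in Counter(doc).items()}


def tf_rf_icf(doc: list[str], positive: str, category_docs: CategoryDocs) -> dict[str, float]:
    """Hypothetical icf-based supervised weight: the product tf * rf * icf.
    The paper's actual combination of rf and icf may differ."""
    return {
        t: n * rf(t, positive, category_docs) * icf(t, category_docs)
        for t, n in Counter(doc).items()
    }


if __name__ == "__main__":
    corpus = {
        "sports": [["ball", "game", "team"], ["game", "score"]],
        "tech": [["code", "game"], ["code", "cpu"]],
        "finance": [["stock", "score"], ["stock", "bank"]],
    }
    print(tf_icf(["game", "game", "code"], corpus))
    print(tf_rf_icf(["game", "code"], "sports", corpus))
```

In the toy corpus, "code" appears in one category and gets icf = log 4, while "game" appears in two and gets only log 2.5, matching the abstract's point that tf.icf favors terms occurring in fewer categories rather than fewer documents. The rf factor then adds category-specific, supervised information: "game" is concentrated in "sports", so its weight for that category is boosted.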
Taxonomy
Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Image Retrieval and Classification Techniques
