Balancing Between Over-Weighting and Under-Weighting in Supervised Term Weighting
Haibing Wu, Xiaodong Gu

TL;DR
This paper introduces a balanced supervised term weighting scheme that controls over-weighting and under-weighting through regularization techniques and entropy-based measures, improving text classification performance.
Contribution
It proposes a novel regularized entropy (re) scheme for supervised term weighting, addressing over- and under-weighting issues with new regularization methods.
Findings
The choice of regularization technique significantly affects classification performance.
The re scheme outperforms existing supervised term weighting schemes.
Balancing over-weighting against under-weighting improves classification accuracy.
Abstract
Supervised term weighting could improve the performance of text categorization. A way proven to be effective is to give more weight to terms with more imbalanced distributions across categories. This paper shows that supervised term weighting should not just assign large weights to imbalanced terms, but should also control the trade-off between over-weighting and under-weighting. Over-weighting, a new concept proposed in this paper, is caused by the improper handling of singular terms and too large ratios between term weights. To prevent over-weighting, we present three regularization techniques: add-one smoothing, sublinear scaling and bias term. Add-one smoothing is used to handle singular terms. Sublinear scaling and bias term shrink the ratios between term weights. However, if sublinear functions scale down term weights too much, or the bias term is too large, under-weighting would…
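The three regularization techniques named in the abstract can be illustrated with a small sketch. This is not the paper's re scheme or any formula taken from it: the function `ratio_weight`, its parameters, and the count-ratio weight it starts from are hypothetical, chosen only to show how add-one smoothing removes the infinite weight of a singular term and how a sublinear function plus a bias term shrink the ratio between two term weights.

```python
import math

def ratio_weight(pos_count, neg_count, smooth=True, sublinear=True, bias=1.0):
    """Hypothetical weight of one term from its counts in two categories.

    Assumption: the unregularized weight is the ratio of the larger count
    to the smaller one, so more imbalanced terms get larger weights.
    """
    if smooth:
        # Add-one smoothing: a term absent from one category no longer
        # produces a division by zero.
        pos_count += 1
        neg_count += 1
    low, high = sorted((pos_count, neg_count))
    if low == 0:
        # Singular term without smoothing: the weight is unbounded.
        return math.inf
    ratio = high / low
    # Sublinear scaling: the logarithm grows far more slowly than the ratio.
    weight = math.log(ratio) if sublinear else ratio
    # Bias term: a constant offset that shrinks ratios between weights.
    return bias + weight

# A singular term (never seen in the second category).
singular_raw = ratio_weight(9, 0, smooth=False, sublinear=False, bias=0.0)
singular_reg = ratio_weight(9, 0)

# An imbalanced term versus a perfectly balanced one.
imbalanced_raw = ratio_weight(100, 10, smooth=False, sublinear=False, bias=0.0)
balanced_raw = ratio_weight(10, 10, smooth=False, sublinear=False, bias=0.0)
imbalanced_reg = ratio_weight(100, 10, smooth=False)
balanced_reg = ratio_weight(10, 10, smooth=False)

print(singular_raw, singular_reg)
print(imbalanced_raw / balanced_raw, imbalanced_reg / balanced_reg)
```

In this toy setting the unregularized singular term gets an infinite weight, while the smoothed version gets a finite one, and the gap between the imbalanced and balanced terms narrows once the logarithm and bias are applied. Raising `bias` further pushes the two weights toward equality, which is the under-weighting side of the trade-off the abstract describes.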
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Advanced Text Analysis Techniques
