Re-evaluating the need for Modelling Term-Dependence in Text Classification Problems
Sounak Banerjee, Prasenjit Majumder, Mandar Mitra

TL;DR
This paper critically evaluates the necessity of modeling term dependence in text classification, showing that dependence models offer limited benefits relative to their high computational costs in real-world applications.
Contribution
It provides a comparative analysis demonstrating that dependence models do not significantly outperform independence models, questioning their practicality given resource demands.
Findings
Dependence models perform only marginally better than independence models.
Dependence models require substantially more computational resources.
Independence models are more practical for real-world text classification.
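The resource gap noted in the findings can be illustrated with a minimal sketch. It rests on assumptions not taken from the paper: the toy corpus is hypothetical, and adjacent term pairs (bigrams) stand in as a very simple proxy for term dependence, which is not the dependence model the authors evaluate. It only shows how the feature space grows once pairs of terms are tracked instead of single terms.

```python
from itertools import chain

# Hypothetical toy corpus, not taken from the paper.
docs = [
    "the match ended in a draw",
    "the market ended the day higher",
    "the team won the match",
]

def unigram_features(tokens):
    """Independence view: every term is a feature on its own."""
    return set(tokens)

def bigram_features(tokens):
    """Simple dependence proxy (an assumption): adjacent term pairs."""
    return set(zip(tokens, tokens[1:]))

tokenized = [d.split() for d in docs]
unigrams = set(chain.from_iterable(unigram_features(t) for t in tokenized))
bigrams = set(chain.from_iterable(bigram_features(t) for t in tokenized))

vocab_size = len(unigrams)
# Upper bound on the ordered term pairs a pairwise model may need to track.
max_pairs = vocab_size * vocab_size

print("distinct terms:", vocab_size)
print("distinct adjacent pairs observed:", len(bigrams))
print("possible ordered pairs:", max_pairs)
```

Even on three short sentences the observed pairs outnumber the terms, and the number of possible pairs grows quadratically with vocabulary size, which is one plausible source of the extra cost attributed to dependence models.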
Abstract
A substantial amount of research has been carried out in developing machine learning algorithms that account for term dependence in text classification. These algorithms offer acceptable performance in most cases, but they come at a substantial cost: they require significantly greater resources to operate. This paper argues that the performance of these algorithms in text classification problems does not justify their higher costs. To test this conjecture, the performance of one of the best dependence models is compared with that of several well-established text classification algorithms. A specific collection of datasets has been designed to best reflect the disparity in the nature of text data found in real-world applications. The results show that even one of the best term dependence models performs decently at best when…
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Data Mining Algorithms and Applications
