Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure
D S Guru, Mahamad Suhil

TL;DR
This paper introduces a simple, efficient text document categorization method using term-class relevance measures, leveraging a novel weighting scheme and B-tree indexing for fast classification.
Contribution
It builds on the Term_Class Relevance (TCR) measure and proposes a quick classification approach with logarithmic-complexity lookup at test time, which is simpler and faster than existing methods.
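The logarithmic test-time cost comes from indexing the stored term weights so that each term lookup needs only O(log n) comparisons. The paper uses a B-tree for this. The sketch below swaps in a simpler technique with the same lookup bound: a sorted array searched with Python's `bisect` module. The class name `SortedTermIndex` and the toy weights are hypothetical and only illustrate the idea.

```python
from bisect import bisect_left
from typing import Dict, Optional


class SortedTermIndex:
    """Sorted-array stand-in for a B-tree: O(log n) term lookup via binary search."""

    def __init__(self, weights: Dict[str, Dict[str, float]]) -> None:
        # Store terms in sorted order so lookups can binary-search them.
        self._terms = sorted(weights)
        # Parallel list: _rows[i] holds the per-class weights of _terms[i].
        self._rows = [weights[t] for t in self._terms]

    def lookup(self, term: str) -> Optional[Dict[str, float]]:
        # bisect_left locates the candidate position in O(log n) comparisons.
        i = bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return self._rows[i]
        return None  # term was never seen during training


# Toy, made-up weights purely for illustration.
index = SortedTermIndex({
    "goal": {"sport": 0.9, "politics": 0.1},
    "vote": {"sport": 0.05, "politics": 0.8},
})
print(index.lookup("vote"))  # {'sport': 0.05, 'politics': 0.8}
print(index.lookup("tax"))   # None
```

A sorted array suits a knowledge base that is built once and then only queried. A B-tree gives the same logarithmic search bound but also supports efficient insertions and disk-friendly node sizes, which matters when the vocabulary is large or changes over time.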
Findings
Satisfactory performance on benchmark datasets
Logarithmic complexity in testing time
Simple implementation compared to existing techniques
Abstract
In this paper, a simple text categorization method using term-class relevance measures is proposed. Initially, text documents are processed to extract significant terms present in them. For every term extracted from a document, we compute its importance in preserving the content of a class through a novel term-weighting scheme known as the Term_Class Relevance (TCR) measure, proposed by Guru and Suhil (2015) [1]. In this way, for every term, its relevance for all the classes present in the corpus is computed and stored in the knowledge base. During testing, the terms present in the test document are extracted and the term-class relevance of each term is obtained from the stored knowledge base. To achieve quick search of term weights, a B-tree indexing data structure has been adopted. Finally, the class which receives maximum support in terms of term-class relevance is decided to be the class of…
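The train-then-look-up pipeline in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. First, the TCR formula is not given here, so `train` uses a hypothetical relevance proxy: the share of a class's documents that contain the term, multiplied by the share of the term's occurrences that fall in that class. Second, "maximum support" is assumed to mean the class with the largest summed relevance. Third, documents are assumed to be already tokenized into significant terms. Finally, a plain Python dict stands in for the B-tree index. The names `train`, `classify` and `KnowledgeBase` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

KnowledgeBase = Dict[str, Dict[str, float]]  # term -> {class: relevance}


def train(docs: List[List[str]], labels: List[str]) -> KnowledgeBase:
    """Build a knowledge base mapping each term to its relevance for every class."""
    docs_per_class = Counter(labels)
    doc_freq: Dict[str, Counter] = defaultdict(Counter)    # class -> term -> #docs containing it
    term_count: Dict[str, Counter] = defaultdict(Counter)  # class -> term -> #occurrences
    total: Counter = Counter()                             # term -> #occurrences in corpus
    for tokens, label in zip(docs, labels):
        doc_freq[label].update(set(tokens))
        term_count[label].update(tokens)
        total.update(tokens)

    kb: KnowledgeBase = {}
    for term, n_total in total.items():
        # HYPOTHETICAL relevance proxy, NOT the TCR formula of Guru and Suhil (2015):
        # (share of class documents containing the term) *
        # (share of the term's occurrences that fall in the class)
        kb[term] = {
            c: (doc_freq[c][term] / docs_per_class[c]) * (term_count[c][term] / n_total)
            for c in docs_per_class
        }
    return kb


def classify(kb: KnowledgeBase, tokens: List[str]) -> Optional[str]:
    """Return the class with maximum accumulated relevance (summation is assumed)."""
    support: Counter = Counter()
    for term in tokens:
        # A dict lookup stands in for the B-tree search described in the paper.
        for c, weight in kb.get(term, {}).items():
            support[c] += weight
    # None when the test document contains no term seen during training.
    return max(support, key=support.__getitem__) if support else None


docs = [
    ["goal", "match", "team"],
    ["team", "win", "goal"],
    ["vote", "party", "election"],
    ["party", "policy", "vote"],
]
labels = ["sport", "sport", "politics", "politics"]
kb = train(docs, labels)
print(classify(kb, ["goal", "team", "vote"]))  # sport
print(classify(kb, ["election", "policy"]))    # politics
```

In the first query, "goal" and "team" each contribute 1.0 to sport and "vote" contributes 1.0 to politics, so sport wins with a support of 2.0. Unknown terms are simply skipped, which mirrors a failed lookup in the stored knowledge base.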
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning in Bioinformatics · Advanced Text Analysis Techniques
