Text Classification using Data Mining
S. M. Kamruzzaman, Farhana Haider, and Ahmed Ryadh Hasan

TL;DR
This paper introduces a novel text classification algorithm that leverages data mining, association rules, Naive Bayes, and genetic algorithms to improve accuracy with fewer training documents.
Contribution
It proposes a new method combining association rules, Naive Bayes, and genetic algorithms for effective text classification requiring less training data.
Findings
The system classifies texts successfully with fewer training documents
Uses association rules for feature extraction
Integrates Naive Bayes and genetic algorithms
Abstract
Text classification is the automated assignment of natural language texts to predefined categories based on their content. It is a primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions, or extracting data. Existing supervised learning algorithms for automatic text classification need a sufficient number of documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of individual words, word relations, i.e., association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of the Naive Bayes classifier is then used on derived…
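The two steps named in the abstract, deriving features from word relations and then applying Naive Bayes to them, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract is truncated, so the sketch assumes that frequent word pairs (a simplification of full association rules) serve as features, uses a Bernoulli-style Naive Bayes with Laplace smoothing, and omits the genetic algorithm step entirely. All function names, the `min_support` parameter, and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def frequent_word_sets(docs, min_support=0.5, size=2):
    """Return word sets of the given size found in at least min_support of docs.

    Assumption: frequent word pairs stand in for association rules here.
    """
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, size))
    threshold = min_support * len(docs)
    return {itemset for itemset, count in counts.items() if count >= threshold}


def train(labeled_docs, min_support=0.5):
    """labeled_docs maps a class label to a list of document strings."""
    total = sum(len(docs) for docs in labeled_docs.values())
    # Pool the frequent word sets mined separately from each class.
    features = set()
    for docs in labeled_docs.values():
        features |= frequent_word_sets(docs, min_support)
    model = {}
    for label, docs in labeled_docs.items():
        word_sets = [set(doc.lower().split()) for doc in docs]
        # Laplace-smoothed estimate of P(feature present | class).
        probs = {
            f: (sum(1 for ws in word_sets if set(f) <= ws) + 1) / (len(docs) + 2)
            for f in features
        }
        model[label] = (math.log(len(docs) / total), probs)
    return features, model


def classify(text, features, model):
    """Return the label with the highest Naive Bayes log-posterior."""
    words = set(text.lower().split())
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for f in features:
            p = probs[f]
            score += math.log(p if set(f) <= words else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
training = {
    "sports": [
        "the team won the match",
        "the team lost the match",
        "the match was won by the team",
    ],
    "finance": [
        "the bank raised the interest rate",
        "the bank cut the interest rate",
        "interest rate set by the bank",
    ],
}
features, model = train(training)
prediction = classify("team played a match", features, model)
```

Because each feature is a co-occurring word pair rather than a single word, a shared word such as "the" contributes evidence only through the pairs it forms, which is one plausible reading of why relation-based features might need fewer training documents.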
Taxonomy
Topics: Text and Document Classification Technologies · Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques
