Text Classification using Data Mining

S. M. Kamruzzaman, Farhana Haider, and Ahmed Ryadh Hasan

arXiv:1009.4987·cs.IR·September 28, 2010·27 cites

Open Access

TL;DR

This paper introduces a text classification algorithm that derives its features from association rules among words, then applies a Naive Bayes classifier and a genetic algorithm, so that fewer training documents are needed.

Contribution

It proposes a new method combining association rules, Naive Bayes, and genetic algorithms for effective text classification requiring less training data.

Findings

01

The system classifies texts successfully with fewer training documents

02

Uses association rules for feature extraction

03

Integrates Naive Bayes and genetic algorithms
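
As a rough illustration of the second finding, the sketch below mines frequent word sets from a handful of documents and turns frequent pairs into association rules. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the restriction to word pairs, and the `min_support` and `min_confidence` thresholds are all hypothetical choices made for this illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_sets(docs, min_support=0.5, max_size=2):
    """Return {word set: support} for word sets found in at least
    min_support of the documents (hypothetical helper, not from the paper)."""
    n = len(docs)
    counts = Counter()
    for doc in docs:
        # Each document counts a word set at most once.
        words = sorted(set(doc.lower().split()))
        for size in range(1, max_size + 1):
            counts.update(combinations(words, size))
    return {ws: c / n for ws, c in counts.items() if c / n >= min_support}


def association_rules(supports, min_confidence=0.8):
    """Derive rules a -> b from frequent pairs, where
    confidence(a -> b) = support(a, b) / support(a)."""
    rules = []
    for ws, supp in supports.items():
        if len(ws) != 2:
            continue
        for a, b in (ws, ws[::-1]):
            # A frequent pair implies both single words are frequent,
            # so supports[(a,)] is always present here.
            conf = supp / supports[(a,)]
            if conf >= min_confidence:
                rules.append((a, b, supp, conf))
    return rules


docs = [
    "cheap loan offer",
    "cheap loan today",
    "football match today",
    "cheap loan",
]
supports = frequent_word_sets(docs)
rules = association_rules(supports)
```

With these four toy documents, the only frequent pair is ("cheap", "loan"), and both directions of the rule pass the confidence threshold. Counting every pair in every document is affordable only for small inputs; a real Apriori-style miner would prune candidates level by level.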

Abstract

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of using words, word relations, i.e. association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of Naive Bayes classifier is then used on derived…
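
The idea of applying Naive Bayes to word-set features rather than single words can be sketched as follows. This is an illustrative approximation only: the abstract is truncated before the remaining steps are described, so the genetic algorithm stage is not shown, and every name here (`word_sets`, `train`, `classify`), the add-one smoothing, and the choice to score only the word sets present in the test text are assumptions rather than details taken from the paper.

```python
import math
from itertools import combinations


def word_sets(text, max_size=2):
    """All word sets of size 1..max_size in a text (hypothetical feature map)."""
    words = sorted(set(text.lower().split()))
    return {ws for k in range(1, max_size + 1) for ws in combinations(words, k)}


def train(labelled_docs, max_size=2):
    """labelled_docs is a list of (text, label) pairs from pre-classified text."""
    doc_counts = {}   # label -> number of training documents
    feat_counts = {}  # label -> {word set: documents containing it}
    vocab = set()
    for text, label in labelled_docs:
        doc_counts[label] = doc_counts.get(label, 0) + 1
        per_label = feat_counts.setdefault(label, {})
        for ws in word_sets(text, max_size):
            per_label[ws] = per_label.get(ws, 0) + 1
            vocab.add(ws)
    return {"docs": doc_counts, "feats": feat_counts,
            "vocab": vocab, "max_size": max_size}


def classify(model, text):
    """Pick the label with the highest log prior plus log likelihoods."""
    total = sum(model["docs"].values())
    # Assumption: only word sets seen in training and present in the text count.
    present = word_sets(text, model["max_size"]) & model["vocab"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        score = math.log(n_docs / total)
        for ws in present:
            count = model["feats"][label].get(ws, 0)
            # Add-one smoothing over the two outcomes (present / absent).
            score += math.log((count + 1) / (n_docs + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("cheap loan offer", "spam"),
    ("cheap loan today", "spam"),
    ("football match today", "sport"),
    ("football score", "sport"),
])
spam_guess = classify(model, "cheap loan")
sport_guess = classify(model, "football match")
```

The sketch keeps every word set it sees; in the approach the abstract describes, the feature set would instead be restricted to word relations captured by association rules.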

Taxonomy

Topics: Text and Document Classification Technologies · Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques