Machine learning approach for text and document mining

Vishwanath Bijalwan; Pinki Kumari; Jordan Pascual; Vijay Bhaskar Semwal

arXiv:1406.1580·cs.IR·June 9, 2014·37 cites

Open Access

TL;DR

This paper explores machine learning techniques, specifically KNN, for text categorization and document retrieval, highlighting their effectiveness in classifying documents into predefined categories.

Contribution

It introduces a KNN-based approach for text categorization and document retrieval, combining information retrieval tools with machine learning methods.

Findings

01. KNN effectively classifies documents into categories.

02. The approach improves document retrieval relevance.

03. The method demonstrates practical applicability in text mining.

Abstract

Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into categories from a predefined set. If a document belongs to exactly one of the categories, it is a single-label classification task; otherwise, it is a multi-label classification task. TC uses several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most relevant documents.
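The two-step idea in the abstract (categorize with KNN, then return the most relevant documents) can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF weighting, the cosine similarity measure, the whitespace tokenizer, the toy `CORPUS`, and the function names (`classify_and_retrieve` and its helpers) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
CORPUS = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal in the match", "sports"),
    ("the coach praised the team after the game", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell as interest rates rose", "finance"),
    ("investors bought bank shares", "finance"),
]

def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()

def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    return {t: count * idf[t] for t, count in Counter(tokens).items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify_and_retrieve(query, corpus, k=3):
    """Step 1: majority vote over the k most similar labelled documents.
    Step 2: return documents of the predicted category, most similar first."""
    token_lists = [tokenize(text) for text, _ in corpus]
    idf = build_idf(token_lists)
    vectors = [tfidf(tokens, idf) for tokens in token_lists]
    query_vec = tfidf(tokenize(query), idf)
    ranked = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(vectors)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(corpus[i][1] for _, i in ranked[:k])
    label = votes.most_common(1)[0][0]
    relevant = [corpus[i][0] for _, i in ranked if corpus[i][1] == label]
    return label, relevant

if __name__ == "__main__":
    label, docs = classify_and_retrieve("goal scored in the football match", CORPUS)
    print(label)
    for doc in docs:
        print(" ", doc)
```

This sketch covers only the single-label case described above. A multi-label variant could, for example, return every category whose share of the k votes exceeds a threshold, but that is a design choice the abstract leaves open.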

Taxonomy

Topics: Text and Document Classification Technologies · Spam and Phishing Detection · Algorithms and Data Compression