Machine Learning in Automated Text Categorization

Fabrizio Sebastiani

arXiv:cs/0110053·cs.IR·September 21, 2021

TL;DR

This paper surveys machine learning methods for automated text categorization, highlighting their advantages over manual approaches and discussing key issues in document representation, classifier construction, and evaluation.

Contribution

It provides a comprehensive overview of machine learning techniques applied to text classification, emphasizing their effectiveness and adaptability across domains.

Findings

01

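Compared with the knowledge engineering approach, in which domain experts define a classifier manually, machine learning approaches offer very good effectiveness, considerable savings in expert manpower, and straightforward portability to different domains.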

02

Effective document representation is crucial for accurate categorization.

03

Classifier evaluation is one of the three key issues the survey discusses, alongside document representation and classifier construction.

Abstract

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We…
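The inductive process described in the abstract (building a classifier from a set of preclassified documents) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a bag-of-words representation and a multinomial naive Bayes learner with add-one smoothing, chosen here for brevity and not presented as the survey's recommended method. The function names (`tokenize`, `train`, `classify`) and the toy training set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed representation: lowercase, whitespace-split bag of words.
    return text.lower().split()


def train(labeled_docs):
    """Learn per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior estimated from category frequencies in the training set.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                count = word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical preclassified documents (toy data, for illustration only).
training_set = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the final match", "sports"),
    ("the striker scored a late goal", "sports"),
]
model = train(training_set)
print(classify("interest rates and markets", model))
print(classify("a late goal won the match", model))
```

No expert writes category rules here; the category characteristics come entirely from the labeled examples, which is the sense in which the approach ports to a new domain by swapping the training set.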
