The Text Classification Pipeline: Starting Shallow going Deeper

Marco Siino; Ilenia Tinnirello; Marco La Cascia

arXiv:2501.00174 · cs.CL · April 23, 2025

Open Access

TL;DR

This paper reviews the evolution of text classification across the entire pipeline, from shallow traditional methods to deep models and modern large language models, with the aim of improving performance on NLP tasks.

Contribution

The paper provides a comprehensive overview of the entire text classification pipeline, covering both traditional and modern deep learning approaches.

Findings

01. Deep learning has revolutionized text classification.

02. Large Language Models effectively capture semantic information.

03. A holistic approach improves NLP task performance.

Abstract

Text classification stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through computer science and engineering. The past decade has seen deep learning revolutionize text classification, propelling advancements in text retrieval, categorization, information extraction, and summarization. The scholarly literature includes datasets, models, and evaluation criteria, with English being the predominant language of focus, despite studies involving Arabic, Chinese, Hindi, and others. The efficacy of text classification models relies heavily on their ability to capture intricate textual relationships and non-linear correlations, necessitating a comprehensive examination of the entire text classification pipeline. In the NLP domain, a plethora of text representation techniques and model architectures have emerged, with Large Language Models…

Taxonomy

Topics: Text and Document Classification Technologies