The Text Classification Pipeline: Starting Shallow going Deeper
Marco Siino, Ilenia Tinnirello, Marco La Cascia

TL;DR
This paper reviews the evolution of text classification from shallow to deep models, covering traditional methods and modern large language models, and argues that attention to the entire pipeline improves performance on NLP tasks.
Contribution
The paper provides a comprehensive overview of the entire text classification pipeline, covering both traditional and modern deep learning approaches.
Findings
Deep learning has revolutionized text classification.
Large Language Models effectively capture semantic information.
A holistic approach improves NLP task performance.
Abstract
Text classification stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through computer science and engineering. The past decade has seen deep learning revolutionize text classification, propelling advancements in text retrieval, categorization, information extraction, and summarization. The scholarly literature includes datasets, models, and evaluation criteria, with English being the predominant language of focus, despite studies involving Arabic, Chinese, Hindi, and others. The efficacy of text classification models relies heavily on their ability to capture intricate textual relationships and non-linear correlations, necessitating a comprehensive examination of the entire text classification pipeline. In the NLP domain, a plethora of text representation techniques and model architectures have emerged, with Large Language Models…
Taxonomy
Topics: Text and Document Classification Technologies
