Small Language Models in the Real World: Insights from Industrial Text Classification
Lujun Li, Lama Sleem, Niccolo' Gentile, Geoffrey Nichil, Radu State

TL;DR
This paper evaluates the effectiveness and efficiency of small transformer-based language models for industrial text classification tasks, comparing prompt engineering and fine-tuning methods in real-world scenarios.
Contribution
It provides a comprehensive analysis of small models' performance and resource usage, guiding practical deployment in industrial applications.
Findings
Small models can achieve competitive accuracy with proper fine-tuning.
Prompt engineering's effectiveness varies across tasks and models.
Resource efficiency of small models makes them suitable for industrial deployment.
Abstract
With the emergence of ChatGPT, Transformer models have significantly advanced text classification and related tasks. Decoder-only models such as Llama exhibit strong performance and flexibility, yet they suffer from inefficiency on inference due to token-by-token generation, and their effectiveness in text classification tasks heavily depends on prompt quality. Moreover, their substantial GPU resource requirements often limit widespread adoption. Thus, the question of whether smaller language models are capable of effectively handling text classification tasks emerges as a topic of significant interest. However, the selection of appropriate models and methodologies remains largely underexplored. In this paper, we conduct a comprehensive evaluation of prompt engineering and supervised fine-tuning methods for transformer-based text classification. Specifically, we focus on practical…
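As a minimal sketch of why prompt-based classification with a decoder-only model depends on prompt quality, the snippet below builds a zero-shot prompt and maps the model's free-form output back onto a fixed label set. The template wording, the label names, and the `generate` callable (a stand-in for any text-generation model call) are assumptions made for illustration; none of them come from the paper.

```python
from typing import Callable, Sequence


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Format a zero-shot classification prompt (hypothetical template)."""
    options = ", ".join(labels)
    return (
        f"Classify the following text into one of these categories: {options}.\n"
        f"Text: {text}\n"
        "Category:"
    )


def parse_label(generated: str, labels: Sequence[str], fallback: str = "unknown") -> str:
    """Return the label mentioned earliest in the generated text, else the fallback."""
    lowered = generated.lower()
    # Record (position, label) for every label that appears in the output.
    hits = [(lowered.find(label.lower()), label) for label in labels if label.lower() in lowered]
    return min(hits)[1] if hits else fallback


def classify(text: str, labels: Sequence[str], generate: Callable[[str], str]) -> str:
    """Prompt a generative model (assumed interface: str -> str) and parse its answer."""
    return parse_label(generate(build_prompt(text, labels)), labels)
```

Because the answer is recovered by matching substrings in generated text, a small change to the template or the label names can change what the model emits and therefore what gets parsed. A fine-tuned classifier with a fixed output head avoids that parsing step.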
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Machine Learning and Data Classification
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Focus · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · LLaMA
