Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks
Adit Krishnan, Chu Wang, Chris Kong

TL;DR
This paper introduces a token-driven sparse finetuning method for small language models. It improves specialized semantic classification by focusing on task-relevant tokens without adding extra parameters, and it outperforms end-to-end finetuning, LoRA, layer selection, and prefix tuning.
Contribution
The work presents a novel token-based sparse finetuning strategy that improves efficiency and performance for specialized classification tasks without increasing model complexity.
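To make the idea of "sparse finetuning without extra parameters" concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm, which this summary does not spell out. It assumes, purely for illustration, that sparsity is applied to an embedding table and that a set of task-relevant token ids is already known. The names `sparse_sgd_step` and `relevant_token_ids` and all numeric values are hypothetical.

```python
def sparse_sgd_step(embeddings, grads, relevant_token_ids, lr=0.1):
    """Apply one SGD step only to embedding rows of the selected tokens.

    Assumption (not from the paper): sparsity is realized by updating
    only the rows that belong to task-relevant tokens. No parameters
    are added; all other rows are left exactly as they were.
    """
    updated = [row[:] for row in embeddings]  # copy so the input stays untouched
    for tok in relevant_token_ids:
        updated[tok] = [w - lr * g for w, g in zip(embeddings[tok], grads[tok])]
    return updated


# Toy 4-token vocabulary with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
grads = [[1.0, 1.0] for _ in embeddings]
relevant = {1, 3}  # hypothetical task-relevant token ids

new_embeddings = sparse_sgd_step(embeddings, grads, relevant, lr=0.5)

# Only the selected rows count as trainable; the table size is unchanged.
trainable = len(relevant) * len(embeddings[0])
total = len(embeddings) * len(embeddings[0])
```

In this toy setup only half of the embedding weights ever receive an update. That is the general appeal of sparse updates over end-to-end finetuning. How the paper actually selects tokens and parameters may differ.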
Findings
Outperforms end-to-end finetuning, LoRA, layer selection, and prefix tuning.
Achieves greater stability and halves training costs.
Effective across five diverse semantic classification tasks.
Abstract
Semantic text classification requires understanding the contextual significance of specific tokens rather than surface-level patterns or keywords (as in rule-based or statistical text classification), making large language models (LLMs) well-suited for this task. However, semantic classification applications in industry, such as customer intent detection or semantic role labeling, tend to be highly specialized. They require annotation by domain experts, unlike the general-purpose corpora used for pretraining. Further, they typically require high inference throughput, which limits model size from both latency and cost perspectives. Thus, for a range of specialized classification tasks, the preferred solution is to develop customized classifiers by finetuning smaller language models (e.g., mini-encoders, small language models). In this work, we develop a token-driven sparse finetuning…
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
