Cost-Aware Model Selection for Text Classification: Multi-Objective Trade-offs Between Fine-Tuned Encoders and LLM Prompting in Production
Alberto Andres Valdes Gonzalez

TL;DR
This paper compares prompt-based large language models and fine-tuned encoders for text classification, highlighting cost, latency, and performance trade-offs to guide operational decision-making in production systems.
Contribution
It provides a systematic, multi-objective evaluation of LLM prompting versus fine-tuned encoders across benchmarks, emphasizing operational efficiency and cost-effectiveness.
Findings
Fine-tuned encoders often outperform LLM prompting in accuracy, cost, and latency.
LLMs are better suited to hybrid architectures than to standalone deployment.
Cost-aware evaluation can guide deployment decisions for NLP models.
Abstract
Large language models (LLMs) such as GPT-4o and Claude Sonnet 4.5 have demonstrated strong capabilities in open-ended reasoning and generative language tasks, leading to their widespread adoption across a broad range of NLP applications. However, for structured text classification problems with fixed label spaces, model selection is often driven by predictive performance alone, overlooking operational constraints encountered in production systems. In this work, we present a systematic comparison of two contrasting paradigms for text classification: zero- and few-shot prompt-based large language models, and fully fine-tuned encoder-only architectures. We evaluate these approaches across four canonical benchmarks (IMDB, SST-2, AG News, and DBPedia), measuring predictive quality (macro F1), inference latency, and monetary cost. We frame model evaluation as a multi-objective decision…
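One common way to make a multi-objective comparison over macro F1, latency, and cost concrete is a Pareto-dominance filter. The abstract is truncated here, so this is only a minimal sketch under that assumption, not necessarily the procedure the paper uses. The `Candidate` class, the field names, and every number below are hypothetical placeholders chosen for illustration; none of them are results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One model configuration with its three measured objectives."""
    name: str
    macro_f1: float         # higher is better
    latency_ms: float       # lower is better
    cost_usd_per_1k: float  # lower is better (cost per 1,000 requests)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    no_worse = (
        a.macro_f1 >= b.macro_f1
        and a.latency_ms <= b.latency_ms
        and a.cost_usd_per_1k <= b.cost_usd_per_1k
    )
    strictly_better = (
        a.macro_f1 > b.macro_f1
        or a.latency_ms < b.latency_ms
        or a.cost_usd_per_1k < b.cost_usd_per_1k
    )
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Placeholder values for illustration only; NOT measurements from the paper.
candidates = [
    Candidate("encoder_placeholder", macro_f1=0.90, latency_ms=10.0, cost_usd_per_1k=0.01),
    Candidate("llm_zero_shot_placeholder", macro_f1=0.85, latency_ms=800.0, cost_usd_per_1k=1.00),
    Candidate("llm_few_shot_placeholder", macro_f1=0.92, latency_ms=1200.0, cost_usd_per_1k=2.00),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

With these placeholder inputs, the zero-shot entry is dropped because the encoder entry matches or beats it on all three objectives. The few-shot entry stays on the front because it has the highest macro F1, despite its higher latency and cost. Choosing among the surviving candidates still requires an operational preference, such as a latency budget or a cost ceiling.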
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Artificial Intelligence in Healthcare and Education
