Cost-Aware Model Selection for Text Classification: Multi-Objective Trade-offs Between Fine-Tuned Encoders and LLM Prompting in Production

Alberto Andres Valdes Gonzalez

arXiv:2602.06370 · cs.CL · February 9, 2026

TL;DR

This paper compares prompt-based large language models and fine-tuned encoders for text classification, highlighting cost, latency, and performance trade-offs to guide operational decision-making in production systems.

Contribution

It provides a systematic, multi-objective evaluation of LLM prompting versus fine-tuned encoders across four benchmarks (IMDB, SST-2, AG News, and DBPedia), weighing predictive quality (macro F1) against inference latency and monetary cost.

Findings

01

Fine-tuned encoders often outperform LLM prompting in accuracy, cost, and latency.

02

LLMs are better suited to a role within hybrid architectures than to use as standalone classifiers.

03

Evaluating cost and latency alongside accuracy can guide deployment decisions for NLP models.

Abstract

Large language models (LLMs) such as GPT-4o and Claude Sonnet 4.5 have demonstrated strong capabilities in open-ended reasoning and generative language tasks, leading to their widespread adoption across a broad range of NLP applications. However, for structured text classification problems with fixed label spaces, model selection is often driven by predictive performance alone, overlooking operational constraints encountered in production systems. In this work, we present a systematic comparison of two contrasting paradigms for text classification: zero- and few-shot prompt-based large language models, and fully fine-tuned encoder-only architectures. We evaluate these approaches across four canonical benchmarks (IMDB, SST-2, AG News, and DBPedia), measuring predictive quality (macro F1), inference latency, and monetary cost. We frame model evaluation as a multi-objective decision…
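The abstract is cut off at the point where it introduces the multi-objective framing, so the paper's exact formulation is not visible here. One common way to compare models on several objectives at once is Pareto dominance, and the sketch below illustrates that idea under stated assumptions. The `Candidate` class, the function names, the candidate names, and every number are hypothetical placeholders invented for illustration; none of them are results or methods taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One model configuration scored on three objectives (hypothetical)."""
    name: str
    macro_f1: float         # higher is better
    latency_ms: float       # lower is better
    cost_usd_per_1k: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and better on one."""
    no_worse = (
        a.macro_f1 >= b.macro_f1
        and a.latency_ms <= b.latency_ms
        and a.cost_usd_per_1k <= b.cost_usd_per_1k
    )
    strictly_better = (
        a.macro_f1 > b.macro_f1
        or a.latency_ms < b.latency_ms
        or a.cost_usd_per_1k < b.cost_usd_per_1k
    )
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up numbers for illustration only; NOT measurements from the paper.
candidates = [
    Candidate("encoder_a", macro_f1=0.94, latency_ms=12.0, cost_usd_per_1k=0.02),
    Candidate("encoder_b", macro_f1=0.92, latency_ms=8.0, cost_usd_per_1k=0.01),
    Candidate("llm_zero_shot", macro_f1=0.90, latency_ms=900.0, cost_usd_per_1k=1.50),
    Candidate("llm_few_shot", macro_f1=0.95, latency_ms=1400.0, cost_usd_per_1k=3.00),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

With these placeholder values, `llm_zero_shot` drops out because `encoder_a` is at least as good on all three objectives, while `llm_few_shot` stays on the front only because it has the highest macro F1. Choosing among the remaining candidates still requires a deployment-specific preference, such as a latency budget or a cost ceiling.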

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Artificial Intelligence in Healthcare and Education