Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?

Neelabh Sinha; Vinija Jain; Aman Chadha

arXiv:2406.11402·cs.CL·March 13, 2025·2 cites

TL;DR

This paper introduces a framework for evaluating small open language models in practical applications, demonstrating that with proper selection, they can outperform some large models like GPT-4o in certain tasks.

Contribution

The work proposes a novel evaluation framework for small LMs across various practical aspects and compares 10 models to identify optimal choices for specific applications.

Findings

01

Small LMs can outperform SOTA large models with proper selection.

02

The framework effectively measures semantic correctness across different tasks and domains.

03

Certain small LMs match or surpass GPT-4o performance in specific scenarios.

Abstract

The rapid rise of Language Models (LMs) has expanded their use in several applications. Yet, due to constraints of model size, associated cost, or proprietary restrictions, utilizing state-of-the-art (SOTA) LLMs is not always feasible. With open, smaller LMs emerging, more applications can leverage their capabilities, but selecting the right LM can be challenging as smaller LMs do not perform well universally. This work tries to bridge this gap by proposing a framework to experimentally evaluate small, open LMs in practical settings through measuring semantic correctness of outputs across three practical aspects: task types, application domains, and reasoning types, using diverse prompt styles. It also conducts an in-depth comparison of 10 small, open LMs to identify the best LM and prompt style depending on specific application requirements using the proposed framework. We also show…
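The selection step described in the abstract (choosing the best LM and prompt style for a given task type, application domain, or reasoning type) can be illustrated with a minimal sketch. It assumes each evaluation run has already been reduced to a single semantic-correctness score. The record layout, the function name `best_per_group`, and the placeholder model names and scores are all hypothetical; they are not taken from the paper or its repository.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (model, aspect, group, prompt_style, score).
# Model names and scores are placeholders, not results from the paper.
records = [
    ("lm_a", "task_type", "classification", "zero_shot", 0.60),
    ("lm_a", "task_type", "classification", "zero_shot", 0.70),
    ("lm_b", "task_type", "classification", "few_shot", 0.80),
    ("lm_a", "task_type", "summarization", "few_shot", 0.50),
    ("lm_b", "task_type", "summarization", "zero_shot", 0.40),
    ("lm_b", "domain", "health", "zero_shot", 0.90),
]


def best_per_group(records, aspect):
    """For one aspect, return {group: (model, prompt_style, mean_score)}."""
    # Collect scores for every (group, model, prompt_style) combination.
    buckets = defaultdict(list)
    for model, rec_aspect, group, style, score in records:
        if rec_aspect == aspect:
            buckets[(group, model, style)].append(score)

    # Keep the combination with the highest mean score in each group.
    best = {}
    for (group, model, style), scores in buckets.items():
        avg = mean(scores)
        if group not in best or avg > best[group][2]:
            best[group] = (model, style, avg)
    return best


if __name__ == "__main__":
    for group, (model, style, avg) in best_per_group(records, "task_type").items():
        print(f"{group}: {model} with {style} (mean score {avg:.2f})")
```

Averaging per (group, model, prompt style) before comparing is one simple design choice; a real comparison might also need to account for variance across prompts or for the number of samples per group.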

Code & Models

Repositories

neelabhsinha/lm-application-eval-kit
PyTorch · Official

Videos

Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Adam