Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production

Chandra Irugalbandara; Ashish Mahendra; Roland Daynauth; Tharuka Kasthuri Arachchige; Jayanaka Dantanarayana; Krisztian Flautner; Lingjia Tang; Yiping Kang; Jason Mars

arXiv:2312.14972 · cs.SE · May 22, 2024 · 1 citation

TL;DR

This paper systematically evaluates open-source small language models (SLMs) as cost-effective and reliable alternatives to proprietary large language models (LLMs) like GPT-4 in real-world applications.

Contribution

It introduces SLaM, an open-source tool for comprehensive evaluation of SLMs, and provides a thorough comparison of SLMs versus GPT-4 in a product setting.

Findings

01. SLMs achieve a 5x to 29x cost reduction compared to GPT-4.

02. SLMs show competitive quality and improved performance consistency.

03. The paper provides a systematic evaluation methodology for SLMs in production environments.
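
To make the "Nx cost reduction" figure concrete, here is a minimal sketch of the arithmetic behind such a ratio. The function name and the per-1,000-request costs are hypothetical and chosen only for illustration; they are not taken from the paper or from SLaM.

```python
def cost_reduction_factor(baseline_cost: float, replacement_cost: float) -> float:
    """Return how many times cheaper the replacement is than the baseline."""
    if baseline_cost <= 0 or replacement_cost <= 0:
        raise ValueError("costs must be positive")
    return baseline_cost / replacement_cost


# Hypothetical costs in USD per 1,000 requests (illustrative values only,
# not measurements reported by the paper).
baseline_cost_per_1k = 2.00      # proprietary LLM service
replacement_cost_per_1k = 0.25   # self-hosted SLM

factor = cost_reduction_factor(baseline_cost_per_1k, replacement_cost_per_1k)
print(f"{factor:.1f}x cost reduction")  # prints "8.0x cost reduction"
```

A factor computed this way is only meaningful when both costs cover the same workload and the same unit, for example cost per 1,000 requests of the same product feature.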

Abstract

Many companies use large language models (LLMs) offered as a service, like OpenAI's GPT-4, to create AI-enabled product experiences. Along with the benefits of ease-of-use and shortened time-to-solution, this reliance on proprietary services has downsides in model control, performance reliability, uptime predictability, and cost. At the same time, a flurry of open-source small language models (SLMs) has been made available for commercial use. However, their readiness to replace existing capabilities remains unclear, and a systematic approach to holistically evaluate these SLMs is not readily available. This paper presents a systematic evaluation methodology and a characterization of modern open-source SLMs and their trade-offs when replacing proprietary LLMs for a real-world product feature. We have designed SLaM, an open-source automated analysis tool that enables the quantitative and…

Code & Models

Repositories

jaseci-labs/slam (Official)

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Scientific Computing and Data Management

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Layer Normalization · Residual Connection