Deploying Open-Source Large Language Models: A Performance Analysis
Yannis Bendi-Ouis, Dan Dutartre, Xavier Hinaut

TL;DR
This paper evaluates the performance of various open-source large language models like Mistral and LLaMa on different hardware setups using vLLM, aiding deployment decisions.
Contribution
It provides a comprehensive performance comparison of open-source LLMs across hardware configurations, facilitating easier deployment decisions.
Findings
Performance varies significantly with hardware and model size.
vLLM effectively optimizes inference for different models.
The results offer guidelines for deploying LLMs based on available resources.
Abstract
Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, with many open-weight models available. However, the requirements to deploy such a service are often unknown and difficult to evaluate in advance. To facilitate this process, we conducted numerous tests at the Centre Inria de l'Université de Bordeaux. In this article, we propose a comparison of the performance of several models of different sizes (mainly Mistral and LLaMa) depending on the available GPUs, using vLLM, a Python library designed to optimize the inference of these models. Our results provide valuable information for private and public groups wishing to deploy LLMs, allowing them to evaluate the performance of different models based on their available hardware. This study thus contributes to facilitating the adoption and use of…
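As a rough illustration of why deployment requirements depend on both model size and available GPUs, the sketch below estimates the memory taken by model weights alone and checks it against a GPU budget. It is not the paper's methodology: the function names, the 2-bytes-per-parameter default (16-bit weights), and the 0.9 usable-memory fraction are assumptions chosen for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights only, in GiB.

    Assumes every parameter is stored with the same width
    (2 bytes for 16-bit weights). KV cache and activations are NOT included.
    """
    return n_params * bytes_per_param / GIB


def weights_fit(
    n_params: float,
    gpu_mem_gib: float,
    n_gpus: int = 1,
    bytes_per_param: int = 2,
    usable_fraction: float = 0.9,  # hypothetical share of GPU memory we allow ourselves
) -> bool:
    """Return True if the weights alone fit in the combined usable GPU memory.

    Assumes the weights can be split evenly across the GPUs.
    """
    budget_gib = gpu_mem_gib * n_gpus * usable_fraction
    return weight_memory_gib(n_params, bytes_per_param) <= budget_gib


if __name__ == "__main__":
    # Hypothetical model sizes and GPU capacities, for illustration only.
    for n_params, gpu_mem, n_gpus in [(7e9, 16, 1), (70e9, 80, 1), (70e9, 80, 2)]:
        need = weight_memory_gib(n_params)
        ok = weights_fit(n_params, gpu_mem, n_gpus)
        print(f"{n_params / 1e9:.0f}B params: {need:.1f} GiB of weights, "
              f"{n_gpus} x {gpu_mem} GiB GPU(s) -> fits: {ok}")
```

A check like this only rules configurations out; a configuration that passes still needs to be measured under realistic load, which is the kind of evaluation the abstract describes.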
