Large Language Models (LLMs): Deployment, Tokenomics and Sustainability
Haiwei Dong, Shuang Xie

TL;DR
This paper provides a comprehensive overview of the deployment strategies, economic factors, and sustainability challenges of large language models, emphasizing their operational considerations and future environmental impacts.
Contribution
It offers a detailed analysis of deployment methods, tokenomics, and sustainability issues, including quantitative assessments and future architecture visions for LLMs.
Findings
RAG and fine-tuning have distinct advantages and limitations.
Quantitative analysis of xPU requirements for training and inference.
Discussion of the environmental carbon footprint of LLM deployment.
Abstract
The rapid advancement of Large Language Models (LLMs) has significantly impacted human-computer interaction, epitomized by the release of GPT-4o, which introduced comprehensive multi-modality capabilities. In this paper, we first explored the deployment strategies, economic considerations, and sustainability challenges associated with state-of-the-art LLMs. More specifically, we discussed the deployment debate between Retrieval-Augmented Generation (RAG) and fine-tuning, highlighting their respective advantages and limitations. After that, we quantitatively analyzed the xPU requirements for training and inference. Additionally, for the tokenomics of LLM services, we examined the balance between performance and cost from the perspective of end users' quality of experience (QoE). Lastly, we envisioned the future hybrid architecture of LLM processing and its corresponding…
Taxonomy
Topics: Topic Modeling · Advanced Data Processing Techniques
