Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs

Guangba Yu; Zirui Wang; Yujie Huang; Renyi Zhong; Yuedong Zhong; Yilun Wang; Michael R. Lyu

arXiv:2601.13655 · cs.SE · January 21, 2026



TL;DR

This study empirically analyzes 705 real-world, user-reported failures from the open-source DeepSeek, Llama, and Qwen ecosystems. It reveals systemic fragility in deployment stacks and offers insights for improving reliability in user-managed environments.

Contribution

The paper presents the first large-scale empirical investigation of open-source LLM failures. It highlights systemic issues in the deployment stack and offers actionable guidance for improving robustness.

Findings

1. Runtime crashes indicate infrastructure issues.
2. Incorrect outputs often stem from tokenizer defects.
3. Reliability barriers are ecosystem-wide, not architecture-specific.

Abstract

The democratization of open-source Large Language Models (LLMs) allows users to fine-tune and deploy models on local infrastructure but exposes them to a First Mile deployment landscape. Compared with black-box API consumption, user-managed orchestration remains a critical reliability blind spot. To bridge this gap, we conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems. Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack. We identify three key phenomena: (1) Diagnostic Divergence: runtime crashes distinctively signal infrastructure friction, whereas incorrect functionality serves as a signature for internal tokenizer defects. (2) Systemic Homogeneity: Root causes converge across…
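As a rough illustration of the Diagnostic Divergence finding, the sketch below turns the symptom-to-cause signal into a first-pass triage helper for a user-reported failure. It is not the authors' classification procedure. The names `CRASH_PATTERNS`, `Triage`, and `triage` are hypothetical, and the crash patterns are assumed example error strings rather than criteria taken from the study.

```python
import re
from dataclasses import dataclass

# Hypothetical crash signatures (assumed examples, not the study's criteria).
CRASH_PATTERNS = [
    r"Traceback \(most recent call last\)",
    r"Segmentation fault",
    r"CUDA out of memory",
    r"core dumped",
]


@dataclass
class Triage:
    symptom: str         # observed failure symptom
    first_suspect: str   # layer to inspect first


def triage(report: str, output_is_wrong: bool) -> Triage:
    """Suggest which layer to inspect first for a user-reported failure.

    Following the Diagnostic Divergence finding, a runtime crash points
    to the deployment infrastructure, while incorrect output without a
    crash points to the tokenizer. Anything else goes to manual review.
    """
    # A crash signature anywhere in the report takes precedence.
    if any(re.search(p, report) for p in CRASH_PATTERNS):
        return Triage("runtime crash", "deployment infrastructure")
    # No crash, but the model produced wrong output.
    if output_is_wrong:
        return Triage("incorrect functionality", "tokenizer")
    return Triage("unclassified", "manual review")


print(triage("RuntimeError: CUDA out of memory. Tried to allocate ...", False))
# → Triage(symptom='runtime crash', first_suspect='deployment infrastructure')
```

A crash signature is checked before the correctness flag, so a report that both crashes and shows wrong output is routed to the infrastructure first. This ordering is an assumption of the sketch.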


Taxonomy

Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education