Stalled, Biased, and Confused: Uncovering Reasoning Failures in LLMs for Cloud-Based Root Cause Analysis

Evelien Riddell; James Riddell; Gengyi Sun; Michał Antkiewicz; Krzysztof Czarnecki

arXiv:2601.22208·cs.SE·February 2, 2026

Open Access

TL;DR

This paper empirically evaluates the reasoning capabilities of large language models in cloud-based root cause analysis, identifying common failures and providing a taxonomy to guide future improvements.

Contribution

The paper introduces a controlled experimental framework that isolates LLM reasoning, evaluates multiple models and workflows, and offers a detailed taxonomy of reasoning failures in RCA.

Findings

01. LLMs show specific reasoning failures in multi-hop RCA

02. Sensitivity of LLM performance to input data modalities

03. Identification of reasoning failures that predict correctness

Abstract

Root cause analysis (RCA) is essential for diagnosing failures within complex software systems to ensure system reliability. The highly distributed and interdependent nature of modern cloud-based systems often complicates RCA efforts, particularly for multi-hop fault propagation, where symptoms appear far from their true causes. Recent advancements in Large Language Models (LLMs) present new opportunities to enhance automated RCA. However, their practical value for RCA depends on the fidelity of reasoning and decision-making. Existing work relies on historical incident corpora, operates directly on high-volume telemetry beyond current LLM capacity, or embeds reasoning inside complex multi-agent pipelines -- conditions that obscure whether failures arise from reasoning itself or from peripheral design choices. We present a focused empirical evaluation that isolates an LLM's reasoning…
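
To make the abstract's notion of multi-hop fault propagation concrete, here is a minimal sketch. It is not the paper's framework: the service names, the `DEPENDENTS` graph, and the `propagation_path` helper are all hypothetical and chosen only for illustration. It shows how a fault in one service can surface as a symptom several dependency hops away.

```python
from collections import deque

# Hypothetical dependency graph (an assumption, not from the paper):
# each service maps to the services that depend on it, so a fault in a
# key can propagate to the services listed as its value.
DEPENDENTS = {
    "db": ["orders"],
    "orders": ["checkout"],
    "checkout": ["frontend"],
    "frontend": [],
}


def propagation_path(graph, root_cause, symptom):
    """Return the shortest propagation path from root_cause to symptom.

    Uses breadth-first search; returns None if the symptom is unreachable.
    """
    queue = deque([[root_cause]])
    seen = {root_cause}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == symptom:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# A fault in "db" is observed as a symptom at "frontend".
path = propagation_path(DEPENDENTS, "db", "frontend")
hops = len(path) - 1  # number of dependency edges the fault crossed
```

In this toy graph the symptom at `frontend` is three hops from the faulty `db` service, which is the kind of distance between symptom and cause that the abstract says complicates RCA.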

Taxonomy

Topics: Software System Performance and Reliability · Software Engineering Research · Advanced Software Engineering Methodologies