Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs

Xing Zi; Xinying Zhou; Jinghao Xiao; Catarina Moreira; Mukesh Prasad

arXiv:2603.12458 · cs.CL · March 16, 2026

Open Access

TL;DR

This paper introduces ShatterMed-QA, a challenging multi-hop medical reasoning benchmark that exposes the reasoning limitations of LLMs and demonstrates that retrieval-augmented methods can significantly improve performance.

Contribution

The paper presents a novel topology-regularized knowledge graph and a multi-hop benchmark for medical reasoning, along with a $k$-Shattering algorithm to remove shortcut biases in LLM evaluation.

Findings

01. LLMs show significant performance drops on multi-hop medical questions.

02. Retrieval-Augmented Generation (RAG) restores near-perfect performance.

03. The benchmark effectively exposes reasoning deficits in current medical LLMs.

Abstract

While Large Language Models (LLMs) achieve expert-level performance on standard medical benchmarks through single-hop factual recall, they severely struggle with the complex, multi-hop diagnostic reasoning required in real-world clinical settings. A primary obstacle is "shortcut learning", where models exploit highly connected, generic hub nodes (e.g., "inflammation") in knowledge graphs to bypass authentic micro-pathological cascades. To address this, we introduce ShatterMed-QA, a bilingual benchmark of 10,558 multi-hop clinical questions designed to rigorously evaluate deep diagnostic reasoning. Our framework constructs a topology-regularized medical Knowledge Graph using a novel $k$-Shattering algorithm, which physically prunes generic hubs to explicitly sever logical shortcuts. We synthesize the evaluation vignettes by applying implicit bridge entity masking and topology-driven hard…
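The abstract describes pruning generic hub nodes so that shortcut paths through them disappear. The listing does not give the details of the $k$-Shattering algorithm, so the following is only a minimal sketch of the general idea, assuming a simple stand-in rule: a node counts as a hub when its degree exceeds a threshold `k`. The function name `prune_hubs`, the threshold rule, and the toy edge list are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def prune_hubs(edges, k):
    """Remove every node whose degree exceeds k, along with its edges.

    Assumption: a plain degree threshold stands in for the paper's
    k-Shattering criterion, which is not specified in this listing.
    """
    # Build an undirected adjacency map to count each node's neighbours.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Any node with more than k distinct neighbours is treated as a hub.
    hubs = {node for node, nbrs in adjacency.items() if len(nbrs) > k}

    # Keep only the edges that do not touch a hub.
    kept = [(u, v) for u, v in edges if u not in hubs and v not in hubs]
    return kept, hubs


# Toy graph (hypothetical): "inflammation" links to almost everything,
# offering a shortcut from "infection" to "fever".
edges = [
    ("infection", "inflammation"),
    ("inflammation", "fever"),
    ("inflammation", "pain"),
    ("inflammation", "swelling"),
    ("infection", "cytokine release"),
    ("cytokine release", "fever"),
]

kept, hubs = prune_hubs(edges, k=3)
print(hubs)
print(kept)
```

In this toy graph, the only remaining route from "infection" to "fever" runs through the more specific intermediate node, which is the kind of path a multi-hop question would then have to follow.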

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning in Healthcare · Topological and Geometric Data Analysis