Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs
Raj Krishnan Vijayaraj

TL;DR
This paper introduces LAPD, a geometry-aware framework to evaluate the latent robustness of clinical LLMs under structured adversarial perturbations, revealing latent fragility not detected by standard metrics.
Contribution
We propose LAPD and LDFR to systematically diagnose latent instability in clinical LLMs, highlighting vulnerabilities under minimal input changes.
Findings
Latent fragility exists even with minor input modifications.
LDFR effectively captures representational instability.
Findings generalize to real clinical notes from the DiReCT benchmark.
Abstract
LLMs for clinical decision support often fail under small but clinically meaningful input shifts such as masking a symptom or negating a finding, despite high performance on static benchmarks. These reasoning failures frequently go undetected by standard NLP metrics, which are insensitive to latent representation shifts that drive diagnosis instability. We propose a geometry-aware evaluation framework, LAPD (Latent Agentic Perturbation Diagnostics), which systematically probes the latent robustness of clinical LLMs under structured adversarial edits. Within this framework, we introduce Latent Diagnosis Flip Rate (LDFR), a model-agnostic diagnostic signal that captures representational instability when embeddings cross decision boundaries in PCA-reduced latent space. Clinical notes are generated using a structured prompting pipeline grounded in diagnostic reasoning, then perturbed along…
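To make the LDFR idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract only states that LDFR captures embeddings crossing decision boundaries in PCA-reduced latent space. Everything else here is an assumption made for illustration: the nearest-centroid decision rule, the choice of two principal components, and the hypothetical function names (`pca_fit`, `latent_diagnosis_flip_rate`, and so on). Embeddings are assumed to be given as NumPy arrays, one row per clinical note, with perturbed notes aligned row by row with their originals.

```python
import numpy as np


def pca_fit(X, n_components=2):
    """Fit PCA via SVD on centered data; returns the mean and principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project rows of X onto the fitted principal directions."""
    return (X - mean) @ components.T


def nearest_centroid_labels(Z, centroids):
    """Assign each projected point to the index of its closest class centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def latent_diagnosis_flip_rate(orig_emb, pert_emb, labels, n_components=2):
    """Fraction of notes whose projected embedding changes decision region.

    ASSUMPTION: decision regions are defined by nearest class centroid in
    the PCA space fitted on the original embeddings. The source does not
    specify which classifier defines the decision boundary.
    """
    labels = np.asarray(labels)
    mean, comps = pca_fit(orig_emb, n_components)
    z_orig = pca_transform(orig_emb, mean, comps)
    # Perturbed embeddings are projected with the SAME PCA basis.
    z_pert = pca_transform(pert_emb, mean, comps)

    classes = np.unique(labels)
    centroids = np.stack([z_orig[labels == c].mean(axis=0) for c in classes])

    region_orig = nearest_centroid_labels(z_orig, centroids)
    region_pert = nearest_centroid_labels(z_pert, centroids)
    # A "flip" is a change of decision region between original and perturbed note.
    return float(np.mean(region_orig != region_pert))
```

Under these assumptions, a value of 0 means no perturbed note left its original decision region, and a value of 1 means every note crossed a boundary. Fitting PCA on the original embeddings only, then reusing that basis for the perturbed ones, keeps the two sets comparable in the same coordinate system; a different classifier or dimensionality would change the absolute numbers.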
