Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection

Cameron Cagan; Pedram Fard; Jiazi Tian; Jingya Cheng; Shawn N. Murphy; Hossein Estiri

arXiv:2602.16037 · cs.AI · February 19, 2026

Open Access

TL;DR

This paper investigates optimization instability in autonomous agentic workflows for clinical symptom detection. It identifies a failure mode in which continued self-optimization degrades classifier performance, and it reports that retrospective selection stabilizes and improves results.

Contribution

The paper characterizes optimization instability as a critical failure mode of autonomous AI systems and shows that retrospective selection outperforms active intervention as a stabilizer in low-prevalence tasks.
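As a rough illustration of what retrospective selection could look like, the sketch below keeps the metrics of every optimization iteration and, once the run is over, picks the iteration with the best balanced score instead of trusting the final one. This is an assumption-laden sketch: the `IterationResult` type, the `select_retrospectively` helper, the use of balanced accuracy as the criterion, and the sample history are all hypothetical and are not taken from the paper or from Pythia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterationResult:
    """Validation metrics recorded for one optimization iteration (hypothetical)."""
    iteration: int
    sensitivity: float
    specificity: float


def balanced_accuracy(result: IterationResult) -> float:
    # Mean of sensitivity and specificity; not inflated by a dominant negative class.
    return (result.sensitivity + result.specificity) / 2


def select_retrospectively(history: list[IterationResult]) -> IterationResult:
    """Return the iteration with the highest balanced accuracy across the whole run."""
    if not history:
        raise ValueError("history must contain at least one iteration")
    return max(history, key=balanced_accuracy)


# Made-up history in which sensitivity swings between 1.0 and 0.0.
history = [
    IterationResult(iteration=1, sensitivity=1.0, specificity=0.10),
    IterationResult(iteration=2, sensitivity=0.0, specificity=1.00),
    IterationResult(iteration=3, sensitivity=0.80, specificity=0.85),
    IterationResult(iteration=4, sensitivity=0.0, specificity=0.98),
]

best = select_retrospectively(history)
final = history[-1]
print(f"final iteration {final.iteration}: sensitivity={final.sensitivity}")
print(f"selected iteration {best.iteration}: sensitivity={best.sensitivity}")
```

In this made-up run the final iteration has collapsed to zero sensitivity, while selecting after the fact recovers an earlier, more balanced iteration. The choice of selection criterion is a design decision; any metric that penalizes missed positives would serve the same purpose.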

Findings

01  Validation sensitivity oscillated between 1.0 and 0.0 across iterations.

02  At 3% prevalence, the system achieved 95% accuracy while detecting zero positive cases.

03  Retrospective selection outperformed active guidance in preventing catastrophic failure.
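The second finding follows from simple arithmetic on the confusion matrix, which the sketch below works through. The cohort size of 1,000 and the individual counts are illustrative assumptions, not figures from the paper; only the 3% prevalence, the 95% accuracy, and the zero detected positives come from the text above.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against empty classes so the helper never divides by zero.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical cohort: 1,000 notes at 3% prevalence (30 positive, 970 negative).
# Case A: the classifier labels every note negative.
all_negative = confusion_metrics(tp=0, fp=0, tn=970, fn=30)

# Case B: zero true positives plus 20 false alarms, one set of counts
# that is consistent with 95% accuracy at this prevalence.
with_false_alarms = confusion_metrics(tp=0, fp=20, tn=950, fn=30)

for name, m in [("all-negative", all_negative), ("with false alarms", with_false_alarms)]:
    print(f"{name}: accuracy={m['accuracy']:.2f}, sensitivity={m['sensitivity']:.2f}")
```

Both cases miss every positive note yet score at least 95% accuracy, which is why sensitivity needs to be reported alongside accuracy when the positive class is rare.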

Abstract

Autonomous agentic workflows that iteratively refine their own behavior hold considerable promise, yet their failure modes remain poorly characterized. We investigate optimization instability, a phenomenon in which continued autonomous improvement paradoxically degrades classifier performance, using Pythia, an open-source framework for automated prompt optimization. Evaluating three clinical symptoms with varying prevalence (shortness of breath at 23%, chest pain at 12%, and Long COVID brain fog at 3%), we observed that validation sensitivity oscillated between 1.0 and 0.0 across iterations, with severity inversely proportional to class prevalence. At 3% prevalence, the system achieved 95% accuracy while detecting zero positive cases, a failure mode obscured by standard evaluation metrics. We evaluated two interventions: a guiding agent that actively redirected optimization, amplifying…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare