Green Shielding: A User-Centric Approach Towards Trustworthy AI

Aaron J. Li; Nicolas Sanchez; Hao Huang; Ruijiang Dong; Jaskaran Bains; Katrin Jaradeh; Zhen Xiang; Bo Li; Feng Liu; Aaron Kornblith; Bin Yu

arXiv:2604.24700 · cs.CL · April 28, 2026

TL;DR

Green Shielding introduces a user-centric framework for evaluating and guiding the deployment of trustworthy AI, focusing on how routine input variations affect model behavior, especially in high-stakes medical diagnosis.

Contribution

The paper proposes the Green Shielding agenda and the CUE criteria, operationalized through the HCM-Dx benchmark, to systematically study benign input variations and their impact on LLMs in medical diagnosis.

Findings

01. Prompt-level factors cause meaningful shifts in model behavior.

02. Neutralization improves plausibility but reduces coverage of critical conditions.

03. Tradeoffs exist between model plausibility, conciseness, and safety-critical coverage.
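
The tradeoff in the findings above can be made concrete with a small sketch. The metric definitions here are assumptions chosen for illustration, not the paper's: plausibility is taken as the share of a model's listed diagnoses that fall in a reference set of plausible diagnoses, and critical coverage as the share of safety-critical reference conditions the model lists. The function names and the toy diagnosis lists are hypothetical and are not drawn from HCM-Dx.

```python
def plausibility(predicted, plausible_reference):
    """Assumed metric: fraction of predicted diagnoses found in the plausible reference set."""
    unique = list(dict.fromkeys(predicted))  # drop duplicates, keep order
    if not unique:
        return 0.0
    return sum(d in plausible_reference for d in unique) / len(unique)


def critical_coverage(predicted, critical_reference):
    """Assumed metric: fraction of safety-critical reference conditions that were predicted."""
    if not critical_reference:
        return 1.0  # nothing critical to cover
    return len(set(predicted) & set(critical_reference)) / len(critical_reference)


# Invented toy data for illustration only (not from the paper or the benchmark).
plausible_ref = {"migraine", "tension headache", "subarachnoid hemorrhage", "meningitis"}
critical_ref = {"subarachnoid hemorrhage", "meningitis"}

# Hypothetical model output for the original patient-authored query.
original_output = ["migraine", "tension headache", "subarachnoid hemorrhage",
                   "sinusitis", "brain tumor"]
# Hypothetical model output for a neutralized rewrite of the same query.
neutralized_output = ["migraine", "tension headache"]

for name, output in [("original", original_output), ("neutralized", neutralized_output)]:
    print(name,
          "plausibility=%.2f" % plausibility(output, plausible_ref),
          "critical_coverage=%.2f" % critical_coverage(output, critical_ref),
          "length=%d" % len(output))
```

In this toy case the shorter, neutralized list scores higher on the assumed plausibility metric while covering none of the critical conditions, which is the shape of tradeoff the findings describe; the actual magnitudes in the paper depend on its own metrics and data.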

Abstract

Large language models (LLMs) are increasingly deployed, yet their outputs can be highly sensitive to routine, non-adversarial variation in how users phrase queries, a gap not well addressed by existing red-teaming efforts. We propose Green Shielding, a user-centric agenda for building evidence-backed deployment guidance by characterizing how benign input variation shifts model behavior. We operationalize this agenda through the CUE criteria: benchmarks with authentic Context, reference standards and metrics that capture true Utility, and perturbations that reflect realistic variations in the Elicitation of model behavior. Guided by the PCS framework and developed with practicing physicians, we instantiate Green Shielding in medical diagnosis through HealthCareMagic-Diagnosis (HCM-Dx), a benchmark of patient-authored queries, together with structured reference diagnosis sets and…
