Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI