Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification
Tsai-Ning Wang, Herman Teun den Dekker, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed

TL;DR
TRIAGE is a tiered zero-shot framework for respiratory audio classification that adaptively allocates computation, improving accuracy on uncertain cases while maintaining efficiency on easy ones.
Contribution
TRIAGE introduces an adaptive test-time scaling method that routes each audio sample through progressively richer reasoning stages, with no task-specific training.
Findings
Achieves a mean AUROC of 0.744 across nine tasks without task-specific training.
Outperforms prior zero-shot methods and matches supervised baselines.
Uncertain cases see up to 19% relative improvement with test-time scaling.
Abstract
Automated respiratory audio analysis promises scalable, non-invasive disease screening, yet progress is limited by scarce labeled data and costly expert annotation. Zero-shot inference eliminates task-specific supervision, but existing methods apply uniform computation to every input regardless of difficulty. We introduce TRIAGE, a tiered zero-shot framework that adaptively scales test-time compute by routing each audio sample through progressively richer reasoning stages: fast label-cosine scoring in a joint audio-text embedding space (Tier-L), structured matching with clinician-style descriptors (Tier-M), and retrieval-augmented large language model reasoning (Tier-H). A confidence-based router finalizes easy predictions early while allocating additional computation to ambiguous inputs, enabling nearly half of all samples to exit at the cheapest tier. Across nine respiratory…
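The tiered routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that confidence is the maximum softmax probability over label-cosine scores, that each tier has its own exit threshold, and that Tier-M and Tier-H are supplied as callables returning class probabilities. The names `route`, `tier_m`, `tier_h`, `threshold_l`, `threshold_m`, and `temperature` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_scores(audio_emb, label_embs):
    """Cosine similarity between one audio embedding and each label embedding."""
    a = audio_emb / np.linalg.norm(audio_emb)
    l = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return l @ a


def softmax(x, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed, not from the paper)."""
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()


def route(audio_emb, label_embs, tier_m, tier_h,
          threshold_l=0.8, threshold_m=0.8, temperature=0.1):
    """Return (predicted_class, exit_tier) for one sample.

    Tier-L: label-cosine scoring in the joint embedding space.
    Tier-M / Tier-H: placeholder callables standing in for descriptor
    matching and retrieval-augmented LLM reasoning.
    """
    # Tier-L: cheapest stage, exit early when the prediction is confident.
    probs = softmax(cosine_scores(audio_emb, label_embs), temperature)
    if probs.max() >= threshold_l:
        return int(probs.argmax()), "L"

    # Tier-M: escalate ambiguous inputs to the richer matcher.
    probs = np.asarray(tier_m(audio_emb))
    if probs.max() >= threshold_m:
        return int(probs.argmax()), "M"

    # Tier-H: most expensive stage, always produces the final answer.
    probs = np.asarray(tier_h(audio_emb))
    return int(probs.argmax()), "H"
```

In this sketch an input whose embedding aligns clearly with one label exits at Tier-L, while an input equidistant from two labels falls through to the later tiers. How the thresholds are actually chosen in TRIAGE is not specified in the excerpt above.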
