Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding

Tsai-Ning Wang; Lin-Lin Chen; Neil Zeghidour; Aaqib Saeed

arXiv:2512.04847·cs.SD·April 20, 2026

TL;DR

AcuLa is a post-training framework that aligns audio encoders with a medical language model to instill clinical understanding, significantly improving diagnostic performance on medical audio tasks.

Contribution

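The paper introduces AcuLa, a method for aligning audio encoders with a medical language model. The alignment is trained on clinical reports that off-the-shelf large language models generate at scale from the metadata of existing recordings, which strengthens semantic understanding in medical audio analysis.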

Findings

01. Achieved state-of-the-art results on 18 cardio-respiratory tasks.

02. Improved AUROC from 0.68 to 0.79 on classification benchmarks.

03. Boosted COVID-19 cough detection AUROC from 0.55 to 0.89.
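The findings above are reported as AUROC, the probability that a randomly chosen positive recording receives a higher score than a randomly chosen negative one. The sketch below computes it with the pairwise (Mann-Whitney) formulation. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking, which is why a baseline of 0.55 on cough detection is close to uninformative.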

Abstract

Pre-trained audio models excel at detecting acoustic patterns in auscultation sounds but often fail to grasp their clinical significance, limiting their use and performance in diagnostic tasks. To bridge this gap, we introduce AcuLa (Audio-Clinical Understanding via Language Alignment), a lightweight post-training framework that instills semantic understanding into any audio encoder by aligning it with a medical language model, which acts as a "semantic teacher." To enable alignment at scale, we construct a large-scale dataset by leveraging off-the-shelf large language models to translate the rich, structured metadata accompanying existing audio recordings into coherent clinical reports. Our alignment strategy combines a representation-level contrastive objective with a self-supervised modeling objective, ensuring that the model learns clinical semantics while preserving fine-grained temporal…
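To make the idea of a representation-level contrastive objective concrete, here is a minimal NumPy sketch of a symmetric InfoNCE-style loss over paired audio and report embeddings, combined with a placeholder self-supervised term. This is an assumed, generic formulation rather than the authors' implementation: the function names, the temperature of 0.07, the weighting `ssl_weight`, and the exact form of both losses are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_alignment_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of audio_emb is paired with row i of text_emb."""
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(a))
    # Audio-to-text: each audio clip should pick out its own report.
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-audio: each report should pick out its own audio clip.
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2t + loss_t2a)

def combined_loss(audio_emb, text_emb, ssl_loss, ssl_weight=1.0):
    """Assumed combination: contrastive term plus a weighted self-supervised term.

    ssl_loss stands in for whatever self-supervised modeling loss the audio
    encoder is trained with; its value is supplied by the caller.
    """
    return contrastive_alignment_loss(audio_emb, text_emb) + ssl_weight * ssl_loss

# Toy check with random embeddings (not real data): matched pairs score a lower loss.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
audio = text + 0.01 * rng.normal(size=(8, 16))
aligned = contrastive_alignment_loss(audio, text)
mismatched = contrastive_alignment_loss(audio, np.roll(text, 1, axis=0))
print(aligned < mismatched)  # True
```

In this sketch the language model's embeddings play the teacher role simply by being the targets that audio embeddings are pulled toward; whether the teacher is frozen during alignment is not stated in the truncated abstract.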

Code & Models

Model (Hugging Face): tsnngw/AcuLa
