CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching
Jiajun Yuan, Xiaochen Wang, Yuhang Xiao, Yulin Wu, Chenhao Hu, Xueyang Lv

TL;DR
CogSR is a novel speech super-resolution framework that uses semantic reasoning and acoustic priors to accurately restore severely degraded audio, surpassing existing models in fidelity and intelligibility.
Contribution
It introduces a Chain-of-Thought guided flow matching approach combined with semantic and acoustic priors for high-precision speech restoration.
Findings
Effectively eliminates ambiguity in severely degraded audio
Restores high-frequency details with linguistic accuracy
Robustly improves speech quality in legacy and surveillance recordings
Abstract
Applying speech super-resolution (SR) to recordings with severely low sampling rates is a critical challenge in digital archiving and investigative audio recovery. In these scenarios, the input lacks essential acoustic cues. Consequently, existing generative models often fail; without sufficient context, they hallucinate phonetic content, guessing words based on probability rather than meaning. To address this, we propose CogSR, a framework designed specifically for high-precision, offline restoration. Our approach shifts the focus from simple signal mapping to cognitive reconstruction. By integrating a Large Audio-Language Model, we employ Chain-of-Thought reasoning to act as a semantic anchor, while explicit acoustic priors ensure the speaker's identity remains consistent. This guides a Rectified Flow backbone to synthesize high-frequency details that are not only realistic but…
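The abstract names a Rectified Flow backbone steered by semantic and acoustic conditioning, but this excerpt does not give the network, the conditioning format, or the exact loss. The sketch below therefore illustrates only the generic rectified-flow recipe, not CogSR's implementation: a straight-line path between a source sample and a target, a velocity-regression loss, and Euler integration at inference. All names here (`interpolate`, `flow_matching_loss`, `euler_sample`, `make_oracle_velocity`, the `cond` argument) are hypothetical, and the "oracle" velocity function stands in for a trained, conditioned network.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from source x0 to target x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Rectified flow regresses onto the constant velocity x1 - x0 of the straight path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity at one time t.

    `cond` is a placeholder for conditioning (e.g. semantic or speaker priors);
    how CogSR encodes and injects it is not specified in the abstract.
    """
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)


def euler_sample(velocity_fn, x0, cond, steps=8):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(x0, x1):
    """Toy stand-in for a trained model: returns the exact straight-path velocity."""
    v = target_velocity(x0, x1)

    def velocity_fn(x_t, t, cond):
        # A real model would predict this from x_t, t and cond.
        return list(v)

    return velocity_fn


if __name__ == "__main__":
    source = [0.0, 1.0, -2.0]   # e.g. noise or low-resolution features (assumption)
    target = [1.0, 3.0, 0.5]    # e.g. high-resolution features (assumption)
    oracle = make_oracle_velocity(source, target)
    print(flow_matching_loss(oracle, source, target, t=0.3, cond=None))
    print(euler_sample(oracle, source, cond=None, steps=8))
```

With a perfect velocity field the loss is zero and a few Euler steps land on the target, which is the property that makes straight paths attractive for few-step sampling. In a real system, `velocity_fn` would be a neural network and `cond` would carry whatever priors the method supplies.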
Taxonomy
Topics: Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
