CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching

Jiajun Yuan; Xiaochen Wang; Yuhang Xiao; Yulin Wu; Chenhao Hu; Xueyang Lv

arXiv:2512.16304·cs.SD·December 19, 2025

Open Access

TL;DR

CogSR is a novel speech super-resolution framework that uses semantic reasoning and acoustic priors to accurately restore severely degraded audio, surpassing existing models in fidelity and intelligibility.

Contribution

It introduces a Chain-of-Thought guided flow matching approach combined with semantic and acoustic priors for high-precision speech restoration.

Findings

01. Effectively eliminates ambiguity in severely degraded audio

02. Restores high-frequency details with linguistic accuracy

03. Robustly improves speech quality in legacy and surveillance recordings

Abstract

Applying speech super-resolution (SR) to recordings with severely low sampling rates is a critical challenge in digital archiving and investigative audio recovery. In these scenarios, the input lacks essential acoustic cues. Consequently, existing generative models often fail; without sufficient context, they hallucinate phonetic content, guessing words based on probability rather than meaning. To address this, we propose CogSR, a framework designed specifically for high-precision, offline restoration. Our approach shifts the focus from simple signal mapping to cognitive reconstruction. By integrating a Large Audio-Language Model, we employ Chain-of-Thought reasoning to act as a semantic anchor, while explicit acoustic priors ensure the speaker's identity remains consistent. This guides a Rectified Flow backbone to synthesize high-frequency details that are not only realistic but…
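The abstract names a Rectified Flow backbone but this page gives no implementation details. As a rough illustration of the general technique only (not the authors' model), the sketch below shows the straight-line interpolation and velocity target that rectified flow trains on, plus a plain Euler sampler. All function names are hypothetical, and `oracle_velocity` is a toy stand-in for a trained network: here `cond` is simply the known target, whereas in CogSR the conditioning would presumably be the semantic and acoustic priors.

```python
def interpolate(x0, x1, t):
    """Straight-line path used by rectified flow: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target along the straight path: d(x_t)/dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, cond, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x, t, cond):
    # ASSUMPTION: toy replacement for a learned velocity network.
    # It treats `cond` as the known clean target and returns the exact
    # straight-line velocity toward it; a real model would predict this
    # from the degraded input and its conditioning signals.
    return [(c - xi) / (1.0 - t) for c, xi in zip(cond, x)]


if __name__ == "__main__":
    noise = [0.3, -1.2, 0.5, 2.0]    # starting sample (t = 0)
    target = [1.0, 2.0, 3.0, 4.0]    # toy "clean" signal (t = 1)
    print(euler_sample(oracle_velocity, noise, target))
```

Because the oracle velocity is constant along a straight path, the Euler sampler reaches the target regardless of step count in this toy setting; with a learned, imperfect velocity field the number of steps would trade speed against accuracy.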

Taxonomy

Topics: Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques