ConceptBeam: Concept Driven Target Speech Extraction
Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, and Kunio Kashino

TL;DR
ConceptBeam is a semantic, concept-driven approach to target speech extraction: it uses a shared embedding space to extract the speech of speakers talking about a topic of interest, and it outperforms traditional methods based on acoustic properties or keywords.
Contribution
The paper presents a novel semantic embedding framework for target speech extraction that leverages modality-independent representations to focus on concepts rather than keywords or acoustic cues.
Findings
Outperforms keyword-based and sound source separation methods
Effective in extracting speech based on semantic concept representations
Demonstrates robustness across different concept specifiers
Abstract
We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam. Target speech extraction means extracting the speech of a target speaker in a mixture. Typical approaches have been exploiting properties of audio signals, such as harmonic structure and direction of arrival. In contrast, ConceptBeam tackles the problem with semantic clues. Specifically, we extract the speech of speakers speaking about a concept, i.e., a topic of interest, using a concept specifier such as an image or speech. Solving this novel problem would open the door to innovative applications such as listening systems that focus on a particular topic discussed in a conversation. Unlike keywords, concepts are abstract notions, making it challenging to directly represent a target concept. In our scheme, a concept is encoded as a semantic embedding by mapping the concept…
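To make the idea of a shared embedding space concrete, here is a minimal sketch. It is not the paper's extraction network, which the truncated abstract above does not describe. It only illustrates the general principle that a concept specifier and candidate speech streams can be compared once both are mapped to vectors in the same space. The function names, the stream labels, and the three-dimensional toy vectors are all hypothetical, and cosine similarity is an assumed choice of comparison.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_matching_stream(concept_embedding, stream_embeddings):
    """Return the stream whose embedding is closest to the concept.

    concept_embedding: vector for the concept specifier (hypothetical;
        in practice it would come from an image or speech encoder).
    stream_embeddings: dict mapping a stream label to its vector in the
        same (assumed) shared space.
    """
    scores = {
        name: cosine_similarity(concept_embedding, emb)
        for name, emb in stream_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up vectors purely for illustration.
concept = [1.0, 0.0, 0.0]
streams = {
    "speaker_a": [0.9, 0.1, 0.0],  # roughly aligned with the concept
    "speaker_b": [0.0, 1.0, 0.2],  # roughly orthogonal to the concept
}
best, scores = pick_matching_stream(concept, streams)
```

In this toy setup the stream whose vector points in nearly the same direction as the concept vector receives the higher score. A real system would have to operate on the mixture itself instead of on already separated streams.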
