wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

Khai Le-Duc; Quy-Anh Dang; Tan-Hanh Pham; Truong-Son Hy

arXiv:2408.04174 · cs.CL · August 9, 2024

TL;DR

This paper introduces wav2graph, a novel framework for constructing and learning knowledge graphs directly from speech data, enabling multimodal reasoning and improving language understanding.

Contribution

It presents the first supervised learning framework for knowledge graphs derived from speech, integrating speech transcription, embedding, and graph neural network training.

Findings

1. Baseline results for node classification and link prediction on speech transcripts.
2. Error analysis highlighting challenges in speech-based knowledge graph learning.
3. Evaluation of different embedding methods and multilingual models.

Abstract

Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, KGs focus only on text data, thereby neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning of knowledge graphs from speech data. Our pipeline is straightforward: (1) constructing a KG based on transcribed spoken utterances and a named entity database, (2) converting the KG into embedding vectors, and (3) training graph neural networks (GNNs) for node classification and link prediction tasks. Through extensive experiments conducted in inductive and transductive learning contexts using state-of-the-art GNN models, we provide baseline results and error analysis for node classification and link prediction tasks on human transcripts…
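
The three pipeline steps in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. The graph schema (utterance nodes linked to the entity nodes they mention), the substring matching rule, the `hashed_bow` feature function, and the example sentences are all assumptions made for illustration. A single round of mean-neighbour aggregation stands in for a trained GNN layer, and no training is performed.

```python
from collections import defaultdict


def build_graph(utterances, entities):
    """Step 1 (assumed schema): link each utterance node to every
    entity node whose surface form appears in the utterance text."""
    edges = []
    for u_idx, text in enumerate(utterances):
        lowered = text.lower()
        for e_idx, entity in enumerate(entities):
            if entity.lower() in lowered:
                edges.append((("utt", u_idx), ("ent", e_idx)))
    return edges


def hashed_bow(text, dim=8):
    """Step 2 (toy stand-in for a real text embedding): a hashed
    bag-of-words count vector with a deterministic hash."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec


def mean_aggregate(features, edges):
    """Step 3 (untrained): one round of mean-neighbour message
    passing, where each node averages itself with its neighbours."""
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    hidden = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in neighbours[node]]
        hidden[node] = [sum(col) / len(group) for col in zip(*group)]
    return hidden


# Invented example data, not taken from the paper's dataset.
utterances = [
    "the patient has diabetes",
    "she takes insulin for diabetes",
    "no fever today",
]
entities = ["diabetes", "insulin", "fever"]

edges = build_graph(utterances, entities)
features = {("utt", i): hashed_bow(t) for i, t in enumerate(utterances)}
features.update({("ent", i): hashed_bow(t) for i, t in enumerate(entities)})
hidden = mean_aggregate(features, edges)
```

In a real setup, the aggregated vectors in `hidden` would feed a classifier head for node classification, or a pairwise scorer over node pairs for link prediction, with the aggregation weights learned rather than fixed.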

Code & Models

Repositories

leduckhai/wav2graph (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
