DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition

HongYu Liu, Junxin Li, Changxi Guo, Hao Chen, Yaqian Huang, Yifu Guo, Huan Yang, and Lihua Cai

arXiv:2511.11000 · cs.SD · November 18, 2025

Open Access

TL;DR

DialogGraph-LLM introduces a novel graph-informed end-to-end framework combining multimodal foundation models and semi-supervised learning to improve speaker intent recognition in complex audio dialogues with limited labeled data.

Contribution

It presents a new Multi-Relational Dialogue Attention Network architecture integrated with foundation models and a confidence-aware semi-supervised learning strategy for audio dialogue intent recognition.

Findings

01. Outperforms strong audio and text baselines on proprietary and public datasets.

02. Demonstrates high accuracy and efficiency in real-world audio dialogue scenarios.

03. Effective semi-supervised learning reduces the need for extensive labeled data.

Abstract

Recognizing speaker intent in long multi-speaker audio dialogues has a wide range of applications, but it is a non-trivial AI task because of complex inter-dependencies among speaker utterances and scarce annotated data. To address these challenges, this work proposes an end-to-end framework, DialogGraph-LLM. DialogGraph-LLM combines a novel Multi-Relational Dialogue Attention Network (MR-DAN) architecture with multimodal foundation models (e.g., Qwen2.5-Omni-7B) for direct acoustic-to-intent inference. An adaptive semi-supervised learning strategy uses the LLM with a confidence-aware pseudo-label generation mechanism, based on dual-threshold filtering with both global and class confidences, and an entropy-based sample selection process that prioritizes high-information unlabeled instances. Extensive evaluations on the proprietary MarketCalls corpus and the publicly…
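The abstract names two ingredients of the semi-supervised strategy: dual-threshold filtering on global and class confidences, and entropy-based selection of high-information unlabeled samples. The sketch below is one possible reading of that description, not the authors' implementation. The function names, the threshold values, the rule that a prediction must clear both thresholds, and the choice to rank surviving samples by descending entropy are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_pseudo_labels(prob_rows, global_tau, class_tau, top_k):
    """Return (sample_index, label) pairs chosen as pseudo-labels.

    All parameter names are hypothetical:
      prob_rows  : per-class probability lists, one per unlabeled sample
      global_tau : single confidence threshold shared by all classes
      class_tau  : list of class-specific thresholds, one per class
      top_k      : maximum number of pseudo-labels to return
    """
    candidates = []
    for i, probs in enumerate(prob_rows):
        # Predicted class is the arg-max of the probability vector.
        label = max(range(len(probs)), key=probs.__getitem__)
        conf = probs[label]
        # Assumed dual-threshold rule: clear the global AND the class bar.
        if conf >= global_tau and conf >= class_tau[label]:
            candidates.append((entropy(probs), i, label))
    # Assumed selection rule: among confident samples, prefer higher
    # entropy (more informative); ties are broken by sample index.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(i, label) for _, i, label in candidates[:top_k]]


if __name__ == "__main__":
    rows = [
        [0.90, 0.05, 0.05],  # confident in class 0
        [0.40, 0.35, 0.25],  # fails the global threshold
        [0.70, 0.20, 0.10],  # passes global, fails the class threshold
        [0.10, 0.85, 0.05],  # confident in class 1
    ]
    print(select_pseudo_labels(rows, 0.6, [0.8, 0.8, 0.8], top_k=2))
```

In this reading, the thresholds guard label quality while the entropy ranking decides which of the accepted samples are used first. How the paper actually combines the two steps, and how its thresholds adapt during training, is not stated in the abstract.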

Taxonomy

Topics: Emotion and Mood Recognition · Speech and dialogue systems · Speech Recognition and Synthesis