TECO: Improving Multimodal Intent Recognition with Text Enhancement through Commonsense Knowledge Extraction

Quynh-Mai Thi Nguyen; Lan-Nhi Thi Nguyen; Cam-Van Thi Nguyen

arXiv:2412.08529·cs.CL·December 12, 2024

Open Access

TL;DR

This paper introduces TECO, a novel method that enhances multimodal intent recognition by extracting and integrating commonsense knowledge to improve textual and non-verbal modality fusion.

Contribution

The paper proposes a new approach, TECO, for enriching textual features with commonsense knowledge and better aligning multimodal data for improved intent recognition.

Findings

01 Significant performance improvements over baseline methods

02 Effective extraction of relations from generated and retrieved knowledge

03 Enhanced fusion of visual, acoustic, and textual modalities

Abstract

The objective of multimodal intent recognition (MIR) is to leverage various modalities, such as text, video, and audio, to detect user intentions, which is crucial for understanding human language and context in dialogue systems. Despite advances in this field, two main challenges persist: (1) effectively extracting and utilizing semantic information from robust textual features; (2) aligning and fusing non-verbal modalities with verbal ones effectively. This paper proposes a Text Enhancement with CommOnsense Knowledge Extractor (TECO) to address these challenges. We begin by extracting relations from both generated and retrieved knowledge to enrich the contextual information in the text modality. Subsequently, we align and integrate visual and acoustic representations with these enhanced text features to form a cohesive multimodal representation. Our experimental results show substantial…
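The two-stage flow described in the abstract (enrich the text with knowledge relations, then align the visual and acoustic features to the enriched text) can be illustrated with a minimal sketch. This is not the TECO architecture: the abstract does not specify the layers, so generic scaled dot-product cross-attention is swapped in here as an assumption, and every name (`cross_attend`, `fuse`, the feature shapes, the random inputs) is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, context):
    # Scaled dot-product attention: each query row becomes a weighted
    # average of the context rows. Assumed stand-in, not the paper's layer.
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)
    return weights @ context

def fuse(text, knowledge, visual, acoustic):
    # Stage 1 (assumed form): enrich text tokens with knowledge-relation
    # embeddings through a residual cross-attention step.
    enhanced = text + cross_attend(text, knowledge)
    # Stage 2 (assumed form): align each non-verbal stream to the
    # enhanced text tokens.
    vis_aligned = cross_attend(enhanced, visual)
    aud_aligned = cross_attend(enhanced, acoustic)
    # Concatenate per token, then mean-pool into one utterance vector.
    fused = np.concatenate([enhanced, vis_aligned, aud_aligned], axis=-1)
    return fused.mean(axis=0)

# Hypothetical toy inputs: all modalities already projected to dimension d.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))       # 5 text tokens
knowledge = rng.normal(size=(3, d))  # 3 relation embeddings
visual = rng.normal(size=(7, d))     # 7 video frames
acoustic = rng.normal(size=(6, d))   # 6 audio frames

rep = fuse(text, knowledge, visual, acoustic)
print(rep.shape)
```

In this sketch the pooled vector would feed an intent classifier. Using the enhanced text as the attention query keeps the text modality as the anchor, which is one plausible reading of "align and integrate visual and acoustic representations with these enhanced text features".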

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: ALIGN