Cascaded Cross-Modal Transformer for Request and Complaint Detection

Nicolae-Catalin Ristea; Radu Tudor Ionescu

arXiv:2307.15097 · cs.CL · July 31, 2023

TL;DR

This paper introduces a cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations, reaching unweighted average recalls of 65.41% for complaints and 85.87% for requests.

Contribution

The paper presents a cascaded cross-modal transformer that fuses language-specific BERT-based text representations with Wav2Vec2.0 audio features through cascaded cross-attention to detect requests and complaints.

Findings

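01 Reached 65.41% unweighted average recall (UAR) for the complaint class.

02 Reached 85.87% UAR for the request class.

03 Combining speech with transcripts in several languages proved effective for detecting requests and complaints in customer phone calls.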
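
The UAR figures above are means of per-class recalls, so each class counts equally no matter how many samples it has. The sketch below shows one way to compute it; the function name, the label strings, and the toy data are illustrative assumptions, not taken from the paper or the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class has the same weight."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # all samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example (hypothetical labels): 2 complaints, 8 non-complaints.
y_true = ["complaint"] * 2 + ["no_complaint"] * 8
# One of the two complaints is missed; every non-complaint is correct.
y_pred = ["complaint", "no_complaint"] + ["no_complaint"] * 8

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(uar, accuracy)  # 0.75 0.9
```

In this toy case accuracy is 0.9 while UAR is 0.75, because the missed minority-class sample costs half of that class's recall. UAR is the same quantity as macro-averaged recall.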

Abstract

We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations. Our approach leverages a multimodal paradigm by transcribing the speech using automatic speech recognition (ASR) models and translating the transcripts into different languages. Subsequently, we combine language-specific BERT-based models with Wav2Vec2.0 audio features in a novel cascaded cross-attention transformer model. We apply our system to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, reaching unweighted average recalls (UAR) of 65.41% and 85.87% for the complaint and request classes, respectively.
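
The cascaded cross-attention idea from the abstract can be pictured with a minimal NumPy sketch. This is not the authors' implementation: the order of fusion (two text streams first, audio second), the single attention head, the residual connection, the mean pooling, the shared feature size `d`, and the random matrices standing in for BERT and Wav2Vec2.0 token features are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` tokens attend to `context` tokens."""
    q = queries @ w_q   # (n_q, d)
    k = context @ w_k   # (n_c, d)
    v = context @ w_v   # (n_c, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_q, n_c), rows sum to 1
    return queries + weights @ v  # residual connection keeps the query stream


def cascaded_fusion(text_a, text_b, audio, params):
    """Assumed cascade: fuse the two text streams, then fuse the result with audio."""
    fused_text = cross_attention(text_a, text_b, *params["text"])
    fused_all = cross_attention(fused_text, audio, *params["audio"])
    return fused_all.mean(axis=0)  # mean-pool tokens into one vector for a classifier


rng = np.random.default_rng(0)
d = 8  # hypothetical shared feature size; real encoders would need projecting to it
params = {
    "text": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
    "audio": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
}
text_a = rng.normal(size=(5, d))   # stand-in for tokens from one language's BERT model
text_b = rng.normal(size=(6, d))   # stand-in for tokens from a second language's model
audio = rng.normal(size=(12, d))   # stand-in for Wav2Vec2.0 frame features

pooled = cascaded_fusion(text_a, text_b, audio, params)
print(pooled.shape)  # (8,)
```

The point of the cascade in this sketch is that the second attention step queries the audio with tokens that already mix both text streams, rather than concatenating all modalities at once.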

Taxonomy

Topics: Public Relations and Crisis Communication · Sentiment Analysis and Opinion Mining · Speech and dialogue systems