Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
Hanlei Zhang, Hua Xu, Fei Long, Xin Wang, Kai Gao

TL;DR
This paper presents an unsupervised multimodal clustering method (UMC) that effectively leverages nonverbal cues for semantic discovery in multimodal utterances, significantly outperforming existing approaches on benchmark datasets.
Contribution
The paper introduces a novel unsupervised clustering approach for multimodal data that combines dynamic sample selection with automatic parameter tuning, which the authors describe as a pioneering contribution to this field.
Findings
Achieved 2-6% improvements in clustering metrics over state-of-the-art methods.
First successful application of unsupervised multimodal clustering for semantics discovery.
Demonstrated effectiveness on benchmark intent and dialogue act datasets.
Abstract
Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information for discerning complex semantics in unsupervised scenarios. This paper introduces a novel unsupervised multimodal clustering method (UMC), making a pioneering contribution to this field. UMC introduces a unique approach to constructing augmentation views for multimodal data, which are then used to perform pre-training to establish well-initialized representations for subsequent clustering. An innovative strategy is proposed to dynamically select high-quality samples as guidance for representation learning, gauged by the density of each sample's nearest neighbors. Besides, it is equipped to automatically determine the optimal value for the top-K parameter in each cluster to…
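The density-based sample selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cluster labels are already available, defines density as the reciprocal of the mean distance to a sample's nearest neighbors within its cluster, and uses hypothetical fixed parameters `k` (neighborhood size) and `keep_ratio` (fraction of samples kept per cluster). The abstract states that UMC determines its neighborhood parameter automatically per cluster; that step is not attempted here.

```python
import math


def knn_density(points, k):
    """Density of each point: 1 / mean distance to its k nearest neighbours.

    Assumption for illustration only; the paper's exact density
    definition is not given in the abstract.
    """
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dist = sum(nearest) / len(nearest) if nearest else 0.0
        densities.append(1.0 / (mean_dist + 1e-12))
    return densities


def select_high_quality(features, labels, k=2, keep_ratio=0.5):
    """Return sorted indices of the densest samples in each cluster.

    `k` and `keep_ratio` are hypothetical fixed values, not taken
    from the paper.
    """
    # Group sample indices by their (precomputed) cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    selected = []
    for members in clusters.values():
        pts = [features[i] for i in members]
        dens = knn_density(pts, min(k, len(pts) - 1))
        n_keep = max(1, math.ceil(keep_ratio * len(members)))
        # Rank cluster members from densest to sparsest neighbourhood.
        ranked = sorted(range(len(members)), key=lambda m: dens[m], reverse=True)
        selected.extend(members[m] for m in ranked[:n_keep])
    return sorted(selected)


# Two tight groups, each with one distant outlier (indices 3 and 7).
features = [(0, 0), (0.1, 0), (0, 0.1), (5, 5),
            (10, 10), (10.1, 10), (10, 10.1), (3, 12)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(select_high_quality(features, labels, k=2, keep_ratio=0.75))
# prints [0, 1, 2, 4, 5, 6]
```

In this toy example the two outliers sit far from their cluster's other members, so they have the lowest density and are left out of the selected set that would guide representation learning.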
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
