Conjugate Mixture Models for Clustering Multimodal Data
Vasil Khalidov, Florence Forbes, Radu Horaud

TL;DR
This paper introduces conjugate mixture models for multimodal clustering. The models keep clusterings consistent across sensors by exploiting explicit transformations between a shared object parameter space and each observation space. They are fit with an EM algorithm and demonstrated on 3D speaker localization.
Contribution
The paper presents a novel conjugate mixture model framework for multimodal clustering, with a tailored EM algorithm and model selection criteria, addressing the challenge of aligning data from different sensors.
Findings
The proposed algorithm converges reliably in multimodal clustering tasks.
The method improves 3D speaker localization accuracy using auditory and visual data.
Multiple initialization and optimization strategies enhance convergence speed.
Abstract
The problem of multimodal clustering arises whenever the data are gathered with several physically different sensors. Observations from different modalities are not necessarily aligned in the sense that there is no obvious way to associate or to compare them in some common space. A solution may consist in considering multiple clustering tasks independently for each modality. The main difficulty with such an approach is to guarantee that the unimodal clusterings are mutually consistent. In this paper we show that multimodal clustering can be addressed within a novel framework, namely conjugate mixture models. These models exploit the explicit transformations that are often available between an unobserved parameter space (objects) and each one of the observation spaces (sensors). We formulate the problem as a likelihood maximization task and we derive the associated conjugate…
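The idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a one-dimensional object parameter, two linear sensor mappings (the hypothetical `a_f * s + b_f` and `a_g * s + b_g`), fixed known variances, and no outlier class, so that the M-step has a closed form. The function names and default values are illustrative only. What it shows is the key coupling: both mixtures share the same object parameters `s`, so each EM update pools evidence from both modalities.

```python
import numpy as np

def responsibilities(x, means, var, weights):
    """Posterior probability of each component for each 1D observation."""
    # Log of weight * Gaussian density; the shared normalizer cancels out.
    log_p = -0.5 * (x[:, None] - means[None, :]) ** 2 / var
    log_p = log_p + np.log(weights + 1e-12)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def conjugate_em(f, g, s_init, a_f=1.0, b_f=0.0, a_g=2.0, b_g=1.0,
                 var_f=0.05, var_g=0.05, n_iter=50):
    """Toy EM for two mixtures tied through shared object parameters s.

    Assumption: modality F observes a_f * s + b_f and modality G observes
    a_g * s + b_g, each with Gaussian noise of known variance.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    s = np.asarray(s_init, dtype=float).copy()
    n = s.size
    pi = np.full(n, 1.0 / n)   # mixing weights for modality F
    lam = np.full(n, 1.0 / n)  # mixing weights for modality G
    for _ in range(n_iter):
        # E-step: each modality is assigned against its own image of s.
        alpha = responsibilities(f, a_f * s + b_f, var_f, pi)
        beta = responsibilities(g, a_g * s + b_g, var_g, lam)
        # M-step: mixing weights are updated per modality.
        pi = alpha.mean(axis=0)
        lam = beta.mean(axis=0)
        # M-step: s minimizes the weighted squared error of both modalities.
        num = (a_f / var_f) * (alpha * (f[:, None] - b_f)).sum(axis=0) \
            + (a_g / var_g) * (beta * (g[:, None] - b_g)).sum(axis=0)
        den = (a_f ** 2 / var_f) * alpha.sum(axis=0) \
            + (a_g ** 2 / var_g) * beta.sum(axis=0)
        s = num / (den + 1e-12)
    return s, pi, lam
```

Because `f` and `g` need not have the same length or any pairing between them, the two sets of observations are only linked through `s`, which mirrors the non-aligned setting described above. Nonlinear mappings would generally require a numerical optimizer in place of the closed-form update for `s`.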
