TL;DR
This paper introduces CNN-based multimodal subspace clustering methods that fuse multiple data modalities into a shared latent space, enforce self-expressiveness, and outperform existing techniques on multiple datasets.
Contribution
It proposes a unified deep framework for multimodal subspace clustering (multimodal encoder, self-expressive layer, multimodal decoder) and three encoder designs that implement early, late, and intermediate spatial fusion.
Findings
Significant performance improvements over existing methods.
Effective fusion strategies for multimodal data.
Robust clustering across diverse datasets.
Abstract
We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages: a multimodal encoder, a self-expressive layer, and a multimodal decoder. The encoder takes multimodal data as input and fuses them into a latent space representation. The self-expressive layer is responsible for enforcing the self-expressiveness property and acquiring an affinity matrix corresponding to the data points. The decoder reconstructs the original input data. The network is trained using the distance between the decoder's reconstruction and the original input. We investigate early, late and intermediate fusion techniques and propose three different encoders corresponding to them for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same for different spatial fusion-based…
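The self-expressiveness property mentioned in the abstract can be illustrated with a small, framework-free sketch. It is not the paper's implementation: it assumes latent codes are stored as rows of Z, that each point is rewritten as a combination of the other points through a coefficient matrix C with a zero diagonal, and that the affinity is formed as |C| + |C|^T, a common choice in the subspace clustering literature that the abstract does not confirm. The function names and the toy values are hypothetical, and C is hand-picked here, whereas in the network it would be learned.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_expression_loss(z, c):
    """Squared Frobenius distance between Z and its self-expression C @ Z."""
    z_hat = matmul(c, z)
    return sum((p - q) ** 2 for r, rh in zip(z, z_hat) for p, q in zip(r, rh))


def affinity(c):
    """Symmetric, non-negative affinity W = |C| + |C|^T (assumed convention)."""
    n = len(c)
    return [[abs(c[i][j]) + abs(c[j][i]) for j in range(n)] for i in range(n)]


# Toy latent codes (one row per data point): points 0 and 1 lie on one
# line through the origin, points 2 and 3 on another.
Z = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 4.0]]

# Hand-picked coefficients with a zero diagonal: each point is expressed
# using only the other point from its own subspace.
C = [
    [0.0, 0.5, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.25],
    [0.0, 0.0, 4.0, 0.0],
]

loss = self_expression_loss(Z, C)  # zero when C reproduces Z exactly
W = affinity(C)                    # block-diagonal: one block per subspace
```

In this toy case the affinity matrix has nonzero entries only between points of the same subspace, which is the structure a downstream clustering step (for example spectral clustering on W) would rely on.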
