Latent Code-Based Fusion: A Volterra Neural Network Approach
Sally Ghanem, Siddharth Roheda, and Hamid Krim

TL;DR
This paper introduces a Volterra Neural Network-based deep encoder for multi-modal data fusion, demonstrating improved clustering, sample efficiency, and robustness over traditional CNN auto-encoders.
Contribution
It presents a novel VNN-based auto-encoder architecture that reduces parameter complexity and enhances multi-modal data fusion capabilities.
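This page does not describe the layer itself. The block below is a minimal sketch of a generic second-order Volterra filter on a 1-D signal, written in Python with NumPy. It shows how the quadratic term (products of pairs of inputs inside the receptive field) adds non-linearity within the filter itself rather than through an activation function.

Several parts of the sketch are illustrative assumptions, not the authors' implementation:
- the rank-Q factorization of the quadratic kernel;
- the function names `volterra2_full`, `volterra2_lowrank`, and `param_counts`;
- the single-channel, valid-mode setup.

```python
import numpy as np


def volterra2_full(x, w1, w2):
    """Second-order Volterra filter over a 1-D signal (valid mode, no padding).

    y[n] = sum_k w1[k] x[n+k] + sum_{k1,k2} w2[k1,k2] x[n+k1] x[n+k2]
    (indexing as cross-correlation, the usual deep-learning "convolution").
    """
    k = w1.shape[0]
    n_out = x.shape[0] - k + 1
    y = np.empty(n_out)
    for n in range(n_out):
        patch = x[n:n + k]              # local receptive field
        linear = w1 @ patch             # first-order term: an ordinary linear filter
        quadratic = patch @ w2 @ patch  # second-order term: weighted pairwise products
        y[n] = linear + quadratic
    return y


def volterra2_lowrank(x, w1, a, b):
    """Same filter with the quadratic kernel factored as w2 = sum_q outer(a[q], b[q]).

    a, b: arrays of shape (Q, k). Hypothetical rank-Q parameterization.
    """
    k = w1.shape[0]
    n_out = x.shape[0] - k + 1
    y = np.empty(n_out)
    for n in range(n_out):
        patch = x[n:n + k]
        # sum_q (a_q . p)(b_q . p) equals p^T (sum_q a_q b_q^T) p
        y[n] = w1 @ patch + np.sum((a @ patch) * (b @ patch))
    return y


def param_counts(k, q):
    """Weights per single-channel filter: (full second-order, rank-q second-order)."""
    return k + k * k, k + 2 * q * k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, q = 5, 2
    x = rng.standard_normal(32)
    w1 = rng.standard_normal(k)
    a = rng.standard_normal((q, k))
    b = rng.standard_normal((q, k))
    w2 = a.T @ b  # builds the full kernel from the factors
    print(np.allclose(volterra2_full(x, w1, w2), volterra2_lowrank(x, w1, a, b)))
    print(param_counts(k, q))
```

With k = 5 and Q = 2, `param_counts(5, 2)` gives (30, 25). The gap widens for larger kernels, because the full quadratic kernel grows as k² while the factored one grows as 2Qk.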
Findings
Significant improvement in clustering performance over CNN auto-encoders
Enhanced sample efficiency compared to CNN-based auto-encoders
Robust classification performance across datasets
Abstract
We propose a deep structure encoder using the recently introduced Volterra Neural Networks (VNNs) to seek a latent representation of multi-modal data whose features are jointly captured by a union of subspaces. The so-called self-representation embedding of the latent codes leads to a simplified fusion, which is driven by a similarly constructed decoding. The reduction in parameter complexity achieved by the Volterra filter architecture is primarily due to the controlled non-linearities introduced by higher-order convolutions, in contrast to generalized activation functions. Experimental results on two different datasets show a significant improvement in clustering performance for the VNN auto-encoder over a conventional Convolutional Neural Network (CNN) auto-encoder. We also show that the proposed approach demonstrates a much-improved sample complexity over CNN-based…
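The block below is a rough illustration of how a shared self-representation can fuse modalities. It is a sketch under stated assumptions, not the paper's formulation.

It assumes the following:
- Each modality m yields latent codes Z_m, with one column per sample.
- All modalities share a single coefficient matrix C such that Z_m ≈ Z_m C.
- C is fitted with a Frobenius-norm (ridge) penalty. Under this penalty the problem min_C Σ_m ||Z_m − Z_m C||² + λ||C||² has the closed form C = (G + λI)⁻¹ G, where G = Σ_m Z_mᵀ Z_m.

The function names, the ridge penalty, and the value of λ are hypothetical choices. Deep subspace-clustering models usually learn C as a trainable layer and often constrain diag(C) = 0; this sketch omits both.

```python
import numpy as np


def shared_self_representation(latents, lam=1e-2):
    """Closed-form shared self-expressive coefficients for several modalities.

    latents: list of arrays, each of shape (d_m, N); column j of every array is sample j.
    Solves  min_C  sum_m ||Z_m - Z_m C||_F^2 + lam * ||C||_F^2,
    whose minimizer satisfies (G + lam I) C = G with G = sum_m Z_m^T Z_m.
    """
    n = latents[0].shape[1]
    gram = sum(z.T @ z for z in latents)  # N x N Gram matrices, summed over modalities
    return np.linalg.solve(gram + lam * np.eye(n), gram)


def affinity(c):
    """Symmetric, non-negative affinity matrix built from the coefficients."""
    return 0.5 * (np.abs(c) + np.abs(c.T))


if __name__ == "__main__":
    rng = np.random.default_rng(1)

    def make_modality(dim):
        # Toy latents: samples 0-4 lie in span(e0, e1), samples 5-9 in span(e2, e3).
        z = np.zeros((dim, 10))
        z[0:2, 0:5] = rng.standard_normal((2, 5))
        z[2:4, 5:10] = rng.standard_normal((2, 5))
        return z

    c = shared_self_representation([make_modality(6), make_modality(4)], lam=1e-3)
    w = affinity(c)
    print(np.round(w, 2))  # cross-cluster block should be numerically zero
```

The symmetric matrix returned by `affinity` would then typically be passed to a spectral clustering step, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`.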
Taxonomy
Topics: Speech and Audio Processing · Neural Networks and Applications · Animal Vocal Communication and Behavior
