Multi-scale Geometric Summaries for Similarity-based Sensor Fusion
Christopher J. Tralie, Paul Bendich, John Harer

TL;DR
This paper introduces a general, training-free method for sensor data fusion using wavelet-based summaries of self-similarity matrices, effectively differentiating multimodal sequences like speech even in noisy conditions.
Contribution
It develops a novel fusion pipeline that combines similarity network fusion with the scattering transform to differentiate multimodal sequences without domain-specific knowledge or training.
Findings
Outperforms unsupervised raw data techniques.
Surpasses modality-specific SSM methods.
Maintains effectiveness in low SNR scenarios.
Abstract
In this work, we address fusion of heterogeneous sensor data using wavelet-based summaries of fused self-similarity information from each sensor. The technique we develop is quite general, does not require domain-specific knowledge or physical models, and requires no training. Nonetheless, it can perform surprisingly well at the general task of differentiating classes of time-ordered behavior sequences that are sensed by more than one modality. As a demonstration of our capabilities in the audio-to-video context, we focus on the differentiation of speech sequences. Data from two or more modalities are first represented using self-similarity matrices (SSMs) corresponding to time-ordered point clouds in the feature spaces of each of these data sources; we note that these feature spaces can be of entirely different scale and dimensionality. A fused similarity template is then derived from…
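To make the SSM representation in the abstract concrete, here is a minimal sketch in Python with NumPy. It builds one SSM per modality from a time-ordered point cloud and shows that the two matrices have the same shape even when the feature spaces differ in dimensionality and scale. The function names, the synthetic "audio" and "video" features, the Euclidean metric, and the max-distance rescaling are all assumptions made for illustration; this is not the paper's pipeline, and it does not implement the fusion or wavelet-summary steps.

```python
import numpy as np


def self_similarity_matrix(points):
    """Pairwise Euclidean distances between rows of a time-ordered point cloud.

    points has shape (n_frames, n_features); the result is (n_frames, n_frames).
    """
    points = np.asarray(points, dtype=float)
    sq_norms = np.sum(points ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    np.fill_diagonal(sq_dists, 0.0)          # a frame is at distance 0 from itself
    return np.sqrt(sq_dists)


def rescale(ssm):
    """Divide by the largest distance so SSMs from different sensors are comparable.

    This normalization is an illustrative assumption, not taken from the paper.
    """
    peak = ssm.max()
    return ssm / peak if peak > 0 else ssm


# Synthetic stand-ins for two modalities observed over the same 50 time frames.
rng = np.random.default_rng(0)
n_frames = 50
audio_features = rng.normal(size=(n_frames, 20))            # low-dimensional
video_features = 1000.0 * rng.normal(size=(n_frames, 300))  # high-dimensional, larger scale

ssm_audio = rescale(self_similarity_matrix(audio_features))
ssm_video = rescale(self_similarity_matrix(video_features))

print(ssm_audio.shape, ssm_video.shape)
```

Because each SSM is indexed only by time, both matrices are `n_frames × n_frames` regardless of the underlying feature dimension, which is what allows similarity information from different sensors to be combined afterwards.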
