Symbol Grounding Association in Multimodal Sequences with Missing Elements
Federico Raue, Andreas Dengel, Thomas M. Breuel, Marcus Liwicki

TL;DR
This paper introduces an extension to a symbolic association framework that effectively handles missing elements in multimodal sequences, improving alignment and association accuracy in scenarios with incomplete data.
Contribution
The work extends a recent multimodal sequence association approach to cope with missing elements using dual LSTMs, EM-based learning, and DTW alignment, enhancing robustness.
Findings
Outperforms the original model in missing element scenarios
Achieves results comparable to individual modality-specific LSTMs
Demonstrates robustness in multimodal sequence association with incomplete data
Abstract
In this paper, we extend a symbolic association framework to handle missing elements in multimodal sequences. The general scope of the work is the symbolic association of object-word mappings as it happens in language development in infants. In other words, two different representations of the same abstract concepts can be associated in both directions. This scenario has long been of interest in Artificial Intelligence, Psychology, and Neuroscience. In this work, we extend a recent approach for multimodal sequences (visual and audio) to also cope with missing elements in one or both modalities. Our method uses two parallel Long Short-Term Memories (LSTMs) with a learning rule based on the EM algorithm. It aligns both LSTM outputs via Dynamic Time Warping (DTW). We propose to include an extra step for the combination with the max operation for exploiting the common elements…
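The DTW alignment step mentioned in the abstract can be illustrated with a minimal, generic sketch. This is textbook Dynamic Time Warping and not the authors' implementation: the function name `dtw_path`, the Euclidean frame distance, and the toy inputs are all assumptions made for illustration. The two inputs stand in for the per-frame output vectors of the two LSTMs.

```python
import numpy as np

def dtw_path(a, b):
    """Align two sequences with classic DTW (illustrative sketch only).

    a: array of shape (n, k), b: array of shape (m, k), e.g. per-frame
    output vectors of two networks. Returns (total_cost, path), where
    path is a list of (i, j) index pairs matching a[i] to b[j].
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distance between frames (assumed metric).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # Accumulated cost matrix with an extra border row and column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match both frames
                acc[i - 1, j],      # advance only in a
                acc[i, j - 1],      # advance only in b
            )

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return float(acc[n, m]), path
```

For example, aligning a four-frame sequence against a three-frame sequence in which one value is held for two frames maps both repeated frames onto the same frame of the shorter sequence, which is the behaviour that lets two modalities of different lengths be compared frame by frame.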
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
