Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition
Gorjan Radevski

TL;DR
This paper presents new multimodal learning techniques across various domains, including spatial reasoning, medical text mapping, knowledge graph linking, and action recognition, to improve machine understanding of complex inputs.
Contribution
It introduces novel methods and benchmarks for multimodal alignment, translation, fusion, and transference, advancing the state of the art in multimodal machine learning.
Findings
Spatial-Reasoning Bert effectively translates text-based spatial relations into 2D arrangements of clip-arts.
A new loss function that leverages spatial co-occurrences of medical terms improves the mapping of medical texts to 3D locations in an anatomical atlas.
Multimodal knowledge transference enhances egocentric action recognition.
Abstract
This manuscript explores multimodal alignment, translation, fusion, and transference to enhance machine understanding of complex inputs. We organize the work into five chapters, each addressing unique challenges in multimodal machine learning. Chapter 3 introduces Spatial-Reasoning Bert for translating text-based spatial relations into 2D arrangements between clip-arts. This enables effective decoding of spatial language into visual representations, paving the way for automated scene generation aligned with human spatial understanding. Chapter 4 presents a method for translating medical texts into specific 3D locations within an anatomical atlas. We introduce a loss function leveraging spatial co-occurrences of medical terms to create interpretable mappings, significantly enhancing medical text navigability. Chapter 5 tackles translating structured text into canonical facts within…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Biomedical Text Mining and Ontologies
