TL;DR
This paper introduces multimodal knowledge base embeddings that incorporate text, images, and numerical data using neural encoders, enabling improved link prediction and the generation of missing multimodal information, with new benchmarks and state-of-the-art results.
Contribution
The paper proposes a multimodal embedding framework that combines modality-specific neural encoders with existing relational models, and introduces a multimodal imputation model that generates missing data.
Findings
Achieved 5-7% improvement in link prediction accuracy over existing methods.
Created new benchmarks with textual and visual data for knowledge bases.
Demonstrated effective generation of missing multimodal data through user studies.
Abstract
Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on simple link structure between a finite set of entities, ignoring the variety of data types that are often used in knowledge bases, such as text, images, and numerical values. In this paper, we propose multimodal knowledge base embeddings (MKBE) that use different neural encoders for this variety of observed data, and combine them with existing relational models to learn embeddings of the entities and multimodal data. Further, using these learned embeddings and different neural decoders, we introduce a novel multimodal imputation model to generate missing multimodal values, like text and images, from information in the knowledge base. We enrich existing relational datasets to create two novel benchmarks that contain…
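The encoder-plus-relational-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a DistMult-style scorer as one example of an "existing relational model" (the excerpt does not say which scorers are used), and it replaces the neural encoders with toy stand-ins. All names here (`encode_number`, `encode_text`, `score_triple`, the sample entities and relations) are hypothetical.

```python
import math
import random

DIM = 4  # embedding size, chosen arbitrarily for this sketch
random.seed(0)


def random_vector(dim=DIM):
    """Stand-in for a learned parameter vector."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


# Lookup tables for ordinary entities and relations (hypothetical data).
entity_emb = {"alice": random_vector(), "paris": random_vector()}
relation_emb = {name: random_vector() for name in ("lives_in", "age", "bio")}

# Parameters of the toy numerical encoder.
num_weights = random_vector()
num_bias = random_vector()


def encode_entity(name):
    """Entities use a plain embedding lookup."""
    return entity_emb[name]


def encode_number(value):
    """Toy numerical encoder: one affine layer followed by tanh.

    The value is divided by 100 so that tanh does not saturate on
    typical inputs; this scaling is an assumption of the sketch.
    """
    x = float(value) / 100.0
    return [math.tanh(w * x + b) for w, b in zip(num_weights, num_bias)]


def encode_text(text, dim=DIM):
    """Toy text encoder: normalized bag of words over `dim` buckets.

    A real system would use a learned sequence encoder instead.
    """
    tokens = text.lower().split()
    vec = [0.0] * dim
    if not tokens:
        return vec
    for token in tokens:
        bucket = sum(ord(ch) for ch in token) % dim  # deterministic bucket
        vec[bucket] += 1.0
    return [v / len(tokens) for v in vec]


# One encoder per kind of object value.
ENCODERS = {"entity": encode_entity, "number": encode_number, "text": encode_text}


def distmult_score(subject_vec, relation_vec, object_vec):
    """DistMult score: sum over dimensions of s_i * r_i * o_i."""
    return sum(s * r * o for s, r, o in zip(subject_vec, relation_vec, object_vec))


def score_triple(subject, relation, obj, kind):
    """Score (subject, relation, obj), encoding obj according to its kind."""
    object_vec = ENCODERS[kind](obj)
    return distmult_score(entity_emb[subject], relation_emb[relation], object_vec)


if __name__ == "__main__":
    print(score_triple("alice", "lives_in", "paris", kind="entity"))
    print(score_triple("alice", "age", 34, kind="number"))
    print(score_triple("alice", "bio", "software engineer in Paris", kind="text"))
```

The point of the design is that the scorer only ever sees fixed-size vectors, so the same scoring function handles entity, numerical, and textual objects once each has an encoder. In a trained model the lookup tables and encoder parameters would be learned jointly from observed triples rather than drawn at random.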
