Multilingual Fact Linking
Keshav Kolluru, Martin Rezk, Pat Verga, William W. Cohen, Partha Talukdar

TL;DR
This paper introduces Multilingual Fact Linking, a new task for connecting sentences in various languages to facts in a Knowledge Graph, and presents a dataset and a retrieval-generation model that improves linking accuracy.
Contribution
The paper proposes the novel task of Multilingual Fact Linking, creates the IndicLink dataset, and introduces the ReFCoG model combining retrieval and generation for scalable fact linking.
Findings
ReFCoG outperforms standard models by 10.7 points in Precision@1.
The IndicLink dataset contains over 11,000 linked facts and 6,400 sentences spanning English and six Indian languages.
ReFCoG achieves an overall score of 52.1, indicating room for further improvement.
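The headline comparison above is reported in Precision@1. The sketch below shows the usual definition of that metric for a linking task: for each sentence, check whether the single top-ranked predicted fact is one of the gold facts, then average over sentences. It is a minimal sketch, not the paper's evaluation script. Representing a fact as a tuple of IDs, the function name `precision_at_1`, and the placeholder IDs (e.g. `Q_A`, `P19`) are all illustrative assumptions. A sentence with no predictions counts as a miss.

```python
from typing import Hashable, List, Sequence, Set


def precision_at_1(ranked_predictions: Sequence[Sequence[Hashable]],
                   gold_facts: Sequence[Set[Hashable]]) -> float:
    """Fraction of sentences whose top-ranked predicted fact is a gold fact.

    ranked_predictions[i] is the ranked list of facts predicted for sentence i;
    gold_facts[i] is the set of facts actually expressed in sentence i.
    """
    if len(ranked_predictions) != len(gold_facts):
        raise ValueError("need one prediction list and one gold set per sentence")
    if not gold_facts:
        return 0.0
    hits = 0
    for ranked, gold in zip(ranked_predictions, gold_facts):
        # Only the first (highest-ranked) prediction matters for @1;
        # an empty prediction list is treated as a miss.
        if ranked and ranked[0] in gold:
            hits += 1
    return hits / len(gold_facts)


# Hypothetical example: facts as (subject, relation, object) placeholder IDs.
preds: List[List[tuple]] = [
    [("Q_A", "P19", "Q_B"), ("Q_A", "P27", "Q_C")],  # top-1 is correct
    [("Q_D", "P69", "Q_E")],                         # top-1 is wrong
    [],                                              # no prediction
]
gold: List[Set[tuple]] = [
    {("Q_A", "P19", "Q_B")},
    {("Q_D", "P108", "Q_F")},
    {("Q_G", "P31", "Q_H")},
]
print(round(precision_at_1(preds, gold), 3))  # 0.333 (1 hit out of 3)
```

Multiplying the result by 100 expresses it in percentage points, the scale on which a difference such as "10.7 points" is usually stated.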
Abstract
Knowledge-intensive NLP tasks can benefit from linking natural language text with facts from a Knowledge Graph (KG). Although facts themselves are language-agnostic, the fact labels (i.e., language-specific representations of the fact) in the KG are often present in only a few languages. This makes it challenging to link KG facts to sentences in languages outside this limited set. To address this problem, we introduce the task of Multilingual Fact Linking (MFL), where the goal is to link the fact expressed in a sentence to the corresponding fact in the KG, even when the fact label in the KG is not available in the language of the sentence. To facilitate research in this area, we present a new evaluation dataset, IndicLink. This dataset contains 11,293 linked Wikidata facts and 6,429 sentences spanning English and six Indian languages. We propose a Retrieval+Generation model,…
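The abstract's central distinction is that a fact is language-agnostic (a triple of KG identifiers), while its label is language-specific and often missing. A toy in-memory KG makes that gap concrete. This is a minimal illustrative sketch, not the IndicLink data format or the ReFCoG model. The `ToyKG` class, its methods, and the placeholder IDs such as `Q_ADA` are hypothetical, and a "fact label" is assumed here to be the space-joined labels of the triple's three parts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A fact is a language-agnostic triple of IDs: (subject_id, relation_id, object_id).
# The IDs used below are hypothetical placeholders, not real Wikidata identifiers.
Fact = Tuple[str, str, str]


@dataclass
class ToyKG:
    facts: List[Fact]
    # labels[item_id][language_code] -> surface label in that language
    labels: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def fact_label(self, fact: Fact, lang: str) -> Optional[str]:
        """Return the fact label in `lang`, or None if any part lacks a label there."""
        parts = []
        for item_id in fact:
            label = self.labels.get(item_id, {}).get(lang)
            if label is None:
                return None
            parts.append(label)
        return " ".join(parts)

    def unlabeled_facts(self, lang: str) -> List[Fact]:
        """Facts that have no complete label in `lang`, so label matching cannot reach them."""
        return [f for f in self.facts if self.fact_label(f, lang) is None]


# Labels exist only in English, mirroring the sparse label coverage the abstract describes.
kg = ToyKG(
    facts=[("Q_ADA", "P_BIRTHPLACE", "Q_LONDON")],
    labels={
        "Q_ADA": {"en": "Ada Lovelace"},
        "P_BIRTHPLACE": {"en": "place of birth"},
        "Q_LONDON": {"en": "London"},
    },
)
print(kg.fact_label(kg.facts[0], "en"))  # Ada Lovelace place of birth London
print(kg.unlabeled_facts("hi"))          # [('Q_ADA', 'P_BIRTHPLACE', 'Q_LONDON')]
```

In MFL terms, a Hindi sentence expressing this fact should still be linked to the triple, even though `unlabeled_facts("hi")` shows that no Hindi label is available to match against.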
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
