Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages
Bhavyajeet Singh, Pavan Kandru, Anubhav Sharma, Vasudeva Varma

TL;DR
This paper introduces Cross Lingual Fact Extraction (CLFE) from low-resource Indian languages and proposes an end-to-end generative method that extracts English factual triples, reaching an overall F1 score of 77.46, with the aim of enriching knowledge graphs such as Wikidata.
Contribution
It defines the CLFE task and presents an end-to-end generative approach for extracting English fact triples from low-resource Indian-language text.
Findings
Achieved an overall F1 score of 77.46 on the CLFE task.
Demonstrated the effectiveness of generative models for cross-lingual fact extraction.
Addresses a gap in multilingual information extraction for low-resource languages.
Abstract
Massive knowledge graphs (KGs) like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, much of the information present as natural text in low-resource languages is often missed. Cross Lingual Information Extraction aims at extracting factual information in the form of English triples from low-resource Indian-language text. Despite its massive potential, progress on this task lags behind Monolingual Information Extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.
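To make the reported metric concrete, here is a minimal sketch of how an F1 score over extracted triples could be computed. The abstract does not describe the paper's scoring protocol, so the exact-match rule, the lowercasing step, and the names `normalize` and `triple_f1` are all illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object), all in English


def normalize(triple: Triple) -> Triple:
    # Assumption: matching is case-insensitive and ignores surrounding spaces.
    subject, relation, obj = (part.strip().lower() for part in triple)
    return (subject, relation, obj)


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact match on whole triples."""
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    if not pred_set or not gold_set:
        return 0.0, 0.0, 0.0

    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: two gold facts, two predictions, one of them correct.
gold_triples = [
    ("Sachin Tendulkar", "occupation", "cricketer"),
    ("Sachin Tendulkar", "place of birth", "Mumbai"),
]
predicted_triples = [
    ("sachin tendulkar", "occupation", "cricketer"),
    ("Sachin Tendulkar", "place of birth", "Delhi"),
]
precision, recall, f1 = triple_f1(predicted_triples, gold_triples)
```

Under these assumptions one of two predictions matches one of two gold facts, so precision, recall, and F1 all come out to 0.5. A real evaluation might instead credit partial matches or score relations and objects separately.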
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management
