Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages

Bhavyajeet Singh; Pavan Kandru; Anubhav Sharma; Vasudeva Varma

arXiv:2302.04790 · cs.CL · February 10, 2023 · 1 citation

TL;DR

This paper introduces Cross Lingual Fact Extraction (CLFE) from low-resource Indian languages and proposes an end-to-end generative method that extracts English factual triples from Indian-language text, reaching an overall F1 score of 77.46, with the aim of enriching multilingual knowledge graphs.

Contribution

It defines the CLFE task and presents an end-to-end generative approach for extracting English factual triples from text in low-resource Indian languages.
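A generative extractor has to emit facts as text. The following is a minimal sketch of one way to do that, assuming a hypothetical marker format (`<S>`, `<R>`, `<O>`) that is not taken from the paper or from the bhavyajeet/clfe repository: English triples are flattened into one target string for a sequence-to-sequence model, and generated strings are parsed back into triples. The names `linearize` and `parse` are illustrative only.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical marker tokens; the paper's actual target format may differ.
SUBJ, REL, OBJ = "<S>", "<R>", "<O>"


def linearize(triples: List[Triple]) -> str:
    """Flatten English (subject, relation, object) triples into one target string."""
    return " ".join(f"{SUBJ} {s} {REL} {r} {OBJ} {o}" for s, r, o in triples)


def parse(text: str) -> List[Triple]:
    """Recover triples from a generated string, skipping malformed segments."""
    triples: List[Triple] = []
    # Everything before the first subject marker is ignored.
    for chunk in text.split(SUBJ)[1:]:
        subj, has_rel, rest = chunk.partition(REL)
        rel, has_obj, obj = rest.partition(OBJ)
        if not has_rel or not has_obj:
            continue  # segment lacks a relation or object marker
        triples.append((subj.strip(), rel.strip(), obj.strip()))
    return triples


if __name__ == "__main__":
    facts = [
        ("Sachin Tendulkar", "occupation", "cricketer"),
        ("Sachin Tendulkar", "place of birth", "Mumbai"),
    ]
    target = linearize(facts)
    print(target)
    print(parse(target) == facts)
```

Explicit markers keep parsing unambiguous even when entity names contain spaces; a model that produces a truncated or malformed segment simply loses that one triple rather than corrupting the rest.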

Findings

01

Achieved an overall F1 score of 77.46 on the CLFE task.

02

Demonstrated effectiveness of generative models for cross-lingual fact extraction.

03

Addresses a gap in multilingual information extraction for low-resource languages.

Abstract

Massive knowledge graphs like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, a lot of information present in the form of natural text in low-resource languages is often missed. Cross Lingual Information Extraction aims at extracting factual information in the form of English triples from low-resource Indian language text. Despite its massive potential, progress on this task lags behind Monolingual Information Extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.
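Because results are reported as an F1 score over extracted triples, a minimal sketch of micro-averaged precision, recall and F1 with exact set matching may help make the metric concrete. The paper's exact matching criteria are not stated on this page, so the lowercase/whitespace normalization in `normalize` and the helper name `triple_prf` are assumptions for illustration, not the authors' evaluation script.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def normalize(t: Triple) -> Triple:
    """Assumed normalization: lowercase and collapse whitespace in each slot."""
    s, r, o = (" ".join(x.lower().split()) for x in t)
    return (s, r, o)


def triple_prf(
    gold: Sequence[List[Triple]], pred: Sequence[List[Triple]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching triples."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        # Sets remove duplicate predictions so they cannot inflate counts.
        g_set = {normalize(t) for t in g}
        p_set = {normalize(t) for t in p}
        tp += len(g_set & p_set)
        n_pred += len(p_set)
        n_gold += len(g_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [[("Mumbai", "country", "India"), ("Mumbai", "capital of", "Maharashtra")]]
    pred = [[("mumbai", "country", "India"), ("Mumbai", "located in", "Asia")]]
    print(triple_prf(gold, pred))
```

Micro-averaging pools counts across all sentences, so sentences with many facts weigh more than sentences with few; a per-sentence (macro) average or partial-match scoring would give different numbers.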

Code & Models

Repositories

bhavyajeet/clfe · PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management