Universal Cross-Lingual Text Classification

Riya Savant, Anushka Shelke, Sakshi Todmal, Sanskruti Kanphade, Ananya Joshi, Raviraj Joshi

arXiv:2406.11028 · cs.CL · June 18, 2024

TL;DR

This paper introduces a universal cross-lingual text classification model that leverages multilingual data and a strong SBERT base to improve label and language coverage, especially for low-resource languages.

Contribution

It proposes a novel training strategy that blends supervised data from multiple languages and trains on an SBERT base, yielding a single universal classifier that adapts across languages.
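The data-blending step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the function name `blend_datasets`, the (text, label) input format, and the toy English, Hindi, and Marathi examples are hypothetical. It builds one label vocabulary from the union of every language's labels and shuffles the examples so that training batches mix languages.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical input format: each language code maps to a list of (text, label) pairs.
# Different languages may cover different subsets of the label space.
Dataset = List[Tuple[str, str]]


def blend_datasets(datasets: Dict[str, Dataset], seed: int = 0):
    """Merge per-language datasets into one shuffled training set over the union label space."""
    # Build a single label vocabulary shared by all languages.
    labels = sorted({label for data in datasets.values() for _, label in data})
    label2id = {label: i for i, label in enumerate(labels)}

    # Keep the language tag on each example so per-language coverage can be inspected later.
    blended = [
        (text, label2id[label], lang)
        for lang, data in datasets.items()
        for text, label in data
    ]
    # Shuffle so that mini-batches mix languages instead of seeing one language at a time.
    random.Random(seed).shuffle(blended)
    return blended, label2id


if __name__ == "__main__":
    data = {
        "en": [("the match ended in a draw", "sports"), ("stocks fell sharply", "business")],
        "hi": [("नई फिल्म रिलीज़ हुई", "entertainment")],
        "mr": [("संघाने सामना जिंकला", "sports")],
    }
    blended, label2id = blend_datasets(data, seed=42)
    print(label2id)  # {'business': 0, 'entertainment': 1, 'sports': 2}
    print(len(blended))  # 4
```

Using one shared label index means an "entertainment" example from Hindi and a "sports" example from Marathi train the same output layer, which is what lets a single model cover labels that no individual language dataset contains in full.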

Findings

1. Enhanced label coverage across languages
2. Improved classification accuracy in low-resource languages
3. Effective cross-lingual transfer demonstrated

Abstract

Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating labeled datasets for low-resource languages poses a considerable challenge. Unlocking the potential of low-resource languages requires robust datasets with supervised labels. However, such datasets are scarce, and the label space is often limited. To address this gap, we aim to make the most of existing labels and datasets in different languages. This research proposes a novel perspective on Universal Cross-Lingual Text Classification, leveraging a unified model across languages. Our approach blends supervised data from different languages during training to create a universal model. The supervised data for a target classification task might come from different languages covering different labels. The…
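Because every language is mapped into one shared embedding space, a label observed in only one language can still be predicted for text in another. The sketch below illustrates that idea under loud assumptions: the embeddings are hard-coded toy vectors standing in for multilingual SBERT sentence embeddings, and a nearest-centroid rule (a swapped-in technique, not the paper's classification head) stands in for the trained classifier. The helpers `cosine`, `fit_centroids`, and `predict` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(examples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the embeddings of each label, pooling examples from all languages."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Vector) -> str:
    """Return the label whose centroid is most cosine-similar to the input embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


if __name__ == "__main__":
    # Toy 3-d "embeddings" (hypothetical stand-ins for SBERT sentence vectors).
    train = [
        ([1.0, 0.0, 0.0], "sports"),         # English sports headline
        ([0.9, 0.1, 0.0], "sports"),         # Marathi sports headline
        ([0.0, 1.0, 0.0], "entertainment"),  # Hindi entertainment headline
        ([0.0, 0.0, 1.0], "business"),       # English business headline
    ]
    centroids = fit_centroids(train)
    # A Marathi entertainment headline: no Marathi training example had this label.
    print(predict(centroids, [0.1, 0.9, 0.0]))  # entertainment
```

Here the query is routed to "entertainment" only because its toy embedding lies near the Hindi entertainment example; with a real encoder, this kind of transfer depends on how well the encoder aligns the languages.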

Taxonomy

Methods: Sparse Evolutionary Training · Balanced Selection · Focus · Sentence-BERT