Universal Cross-Lingual Text Classification
Riya Savant, Anushka Shelke, Sakshi Todmal, Sanskruti Kanphade, Ananya Joshi, Raviraj Joshi

TL;DR
This paper introduces a universal cross-lingual text classification model that leverages multilingual data and a strong SBERT base to improve label and language coverage, especially for low-resource languages.
Contribution
It proposes a novel training strategy using multilingual data blending with SBERT to create an adaptable, universal classifier across multiple languages.
Findings
Enhanced label coverage across languages
Improved classification accuracy in low-resource languages
Effective cross-lingual transfer demonstrated
Abstract
Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating supervised labeled datasets for low-resource languages poses a considerable challenge. Unlocking the language potential of low-resource languages requires robust datasets with supervised labels. However, such datasets are scarce, and the label space is often limited. In our pursuit to address this gap, we aim to optimize existing labels/datasets in different languages. This research proposes a novel perspective on Universal Cross-Lingual Text Classification, leveraging a unified model across languages. Our approach involves blending supervised data from different languages during training to create a universal model. The supervised data for a target classification task might come from different languages covering different labels. The…
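The blending step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `blend_datasets` and `label_coverage`, the toy sentences, and the language codes are all hypothetical. The sketch covers only the data side, merging per-language supervised sets whose label sets differ into one training pool over the union of labels. Embedding the texts with a multilingual SBERT model and fitting a classifier on top are omitted.

```python
import random


def blend_datasets(per_language, seed=0):
    """Merge per-language (text, label) lists into one shuffled training set.

    per_language: dict mapping a language code to a list of (text, label) pairs.
    Returns (examples, label_to_id), where examples is a list of
    (text, label_id, language) tuples over the union of all labels.
    """
    # The union of labels across languages defines the shared label space.
    labels = sorted({label for pairs in per_language.values() for _, label in pairs})
    label_to_id = {label: i for i, label in enumerate(labels)}

    examples = [
        (text, label_to_id[label], lang)
        for lang, pairs in per_language.items()
        for text, label in pairs
    ]
    # Shuffle so that training batches mix languages (assumed design choice).
    random.Random(seed).shuffle(examples)
    return examples, label_to_id


def label_coverage(examples, label_to_id):
    """For each label id, collect the languages that supply labeled examples."""
    coverage = {i: set() for i in label_to_id.values()}
    for _, label_id, lang in examples:
        coverage[label_id].add(lang)
    return coverage


# Hypothetical toy data: each language covers only part of the label space.
toy = {
    "en": [("The team won the match", "sports"), ("Stocks fell sharply", "business")],
    "hi": [("नई फिल्म रिलीज़ हुई", "entertainment"), ("टीम ने मैच जीता", "sports")],
    "mr": [("नवीन फोन बाजारात आला", "technology")],
}

examples, label_to_id = blend_datasets(toy)
coverage = label_coverage(examples, label_to_id)
```

In this toy setup no single language covers all four labels, but the blended pool does. A classifier trained on language-agnostic sentence embeddings of this pool could then, in principle, predict any of the labels for text in any of the languages, which is the cross-lingual transfer the abstract refers to.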
Taxonomy
Methods: Sparse Evolutionary Training · Balanced Selection · Focus · Sentence-BERT
