TL;DR
This paper introduces a two-stage framework that uses linguistic dissimilarity to improve generalization to unseen low-resource language varieties, combining a source-selection method (TOPPing) with a lightweight two-branch architecture (VACAI-Bowl).
Contribution
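It proposes a two-stage framework that captures variety-specific cues while exploiting the overlap offered by a high-resource source variety. TOPPing selects the source variety, and VACAI-Bowl learns variety-specific and variety-invariant attributes in parallel branches, to improve generalization on low-resource language varieties.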
Findings
54.62% average improvement in dependency parsing accuracy
Effective on 10 low-resource varieties
Outperforms prior approaches in structural prediction tasks
Abstract
Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer, which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue allowing generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VACAI-Bowl architecture that learns variety-specific attributes with one branch while a parallel branch captures variety-invariant…
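The abstract describes one branch for variety-specific attributes and a parallel branch for variety-invariant ones. The sketch below illustrates only that general split and is not the paper's VACAI-Bowl: the class name `TwoBranchSketch`, the plain linear layers, the per-variety weight table, and combining the branches by concatenation are all assumptions made for illustration, since the abstract does not give these details.

```python
import random


def init_weights(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Dense layer without bias: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class TwoBranchSketch:
    """Hypothetical two-branch layer (an assumption, not the paper's design)."""

    def __init__(self, dim, varieties, seed=0):
        rng = random.Random(seed)
        # Invariant branch: a single weight matrix shared by every variety.
        self.invariant = init_weights(dim, dim, rng)
        # Specific branch: a separate weight matrix for each variety.
        self.specific = {v: init_weights(dim, dim, rng) for v in varieties}

    def forward(self, x, variety):
        shared = linear(self.invariant, x)
        own = linear(self.specific[variety], x)
        # Concatenating the two outputs is an assumed way to combine branches.
        return shared + own


layer = TwoBranchSketch(dim=4, varieties=["variety_a", "variety_b"])
x = [1.0, 0.5, -0.5, 2.0]
out_a = layer.forward(x, "variety_a")
out_b = layer.forward(x, "variety_b")
```

In this sketch the first half of each output comes from the shared branch and is the same for both varieties, while the second half depends on which variety is named. How the actual architecture handles varieties unseen during training is not covered here.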