Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages
Sunisth Kumar, Davide Liu, Alexandre Boulenger

TL;DR
This paper introduces an efficient cross-lingual NER framework that leverages knowledge distillation and consistency training to transfer entity recognition capabilities from English to low-resource languages like Arabic using minimal labeled data.
Contribution
It presents a novel combination of knowledge distillation and unsupervised consistency training for cross-lingual NER on semi-structured data, an approach that remains effective with very few labeled target-language samples.
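One way to write a combined training objective for the student is sketched below. This is an illustrative assumption, not the authors' stated formulation. It adds three terms. The first is a supervised cross-entropy term over a small labeled target-language set D_L. The second is a distillation term over a hypothetical set of inputs D_KD, in which the student's per-token label distribution p_S matches the teacher's distribution p_T. The third is a consistency term over unlabeled target-language messages D_U, which penalizes disagreement between the student's predictions on a message u and on a perturbed copy ũ. The weights λ_KD and λ_CT, and the choice of texts in D_KD, are hypothetical.

```latex
\mathcal{L}
  = \sum_{(x,y) \in \mathcal{D}_L} \sum_{i} -\log p_S^{(i)}(y_i \mid x)
  + \lambda_{\mathrm{KD}} \sum_{x \in \mathcal{D}_{\mathrm{KD}}} \sum_{i}
      \mathrm{KL}\!\left(p_T^{(i)}(\cdot \mid x) \,\middle\|\, p_S^{(i)}(\cdot \mid x)\right)
  + \lambda_{\mathrm{CT}} \sum_{u \in \mathcal{D}_U} \sum_{i}
      \mathrm{KL}\!\left(p_S^{(i)}(\cdot \mid u) \,\middle\|\, p_S^{(i)}(\cdot \mid \tilde{u})\right)
```

Here i indexes tokens, and p^{(i)}(· | x) is the predicted distribution over entity labels at token i. In consistency training, the prediction on the clean message u is usually treated as a fixed target, so gradients flow only through the prediction on ũ.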
Findings
The model performs well with only 30 labeled samples in the target language.
It outperforms state-of-the-art approaches such as DistilBERT, as well as supervised models trained on the target language.
It transfers entity recognition from English to Arabic on semi-structured banking data.
Abstract
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data. Our approach relies on both knowledge distillation and consistency training. The framework leverages knowledge from a large language model (XLM-RoBERTa) pre-trained on the source language through a teacher-student relationship (knowledge distillation). The student model incorporates unsupervised consistency training (with a KL divergence loss) on the low-resource target language. We use two independent datasets of SMS messages in English and Arabic, each carrying semi-structured banking transaction information, and focus on demonstrating the transfer of knowledge from English to Arabic. With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic. We show that our modeling approach, while…
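The two KL-based loss terms named in the abstract can be illustrated with a small NumPy sketch. It rests on several assumptions. Tagging is token-level, with a softmax over entity labels. Distillation uses a temperature-softened KL divergence from the teacher's per-token distribution to the student's. Consistency training uses a KL divergence between the student's predictions on an unlabeled target-language message and on a perturbed copy of that message. The function names, the temperature of 2.0, the loss weights, the label set, and the Gaussian logit noise (a stand-in for a real input perturbation) are all hypothetical. The summary does not say how the authors perturb inputs or weight the terms.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Softmax over the label axis (last axis), shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def token_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) per token, averaged over tokens; p and q have shape (tokens, labels)."""
    per_token = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(per_token.mean())


def distillation_loss(teacher_logits: np.ndarray, student_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """Student matches the teacher's temperature-softened per-token label distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 is a common knowledge-distillation convention.
    return temperature ** 2 * token_kl(p_teacher, p_student)


def consistency_loss(clean_logits: np.ndarray, perturbed_logits: np.ndarray) -> float:
    """Student predictions on an unlabeled message and a perturbed copy should agree."""
    # In training, the clean prediction is usually treated as a fixed target (no gradient).
    return token_kl(softmax(clean_logits), softmax(perturbed_logits))


# Hypothetical example: 4 tokens, 5 labels (e.g. O, B-MERCHANT, I-MERCHANT, B-AMOUNT, I-AMOUNT).
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 5))
student_logits = rng.normal(size=(4, 5))
# Gaussian logit noise stands in for a real perturbation of the input message.
perturbed_logits = student_logits + rng.normal(scale=0.1, size=(4, 5))

lambda_kd, lambda_ct = 1.0, 1.0  # hypothetical loss weights
total = (lambda_kd * distillation_loss(teacher_logits, student_logits)
         + lambda_ct * consistency_loss(student_logits, perturbed_logits))
print(f"combined unsupervised loss: {total:.4f}")
```

Setting `lambda_ct` to zero reduces the sketch to plain distillation. In a real setup, the logits would come from the XLM-RoBERTa teacher and the student model rather than from a random generator.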
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Islamic Finance and Banking Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Dropout · WordPiece · Weight Decay · Residual Connection
