Fine-Tuning Small Embeddings for Elevated Performance
Biraj Silwal

TL;DR
This paper demonstrates that fine-tuning small, resource-efficient BERT embeddings significantly improves NLP performance for low-resource languages like Nepali, making advanced language models more accessible.
Contribution
It introduces a method for fine-tuning small BERT embeddings on low-resource languages, showing substantial performance gains over baseline models.
Findings
Fine-tuning small embeddings improves Nepali NLP tasks.
Small embeddings outperform original baselines after fine-tuning.
Results approach those of larger, pretrained models.
Abstract
Contextual embeddings have yielded state-of-the-art results in various natural language processing tasks. However, these embeddings are constrained by models that require large amounts of data and considerable computing power. This is a problem for low-resource languages like Nepali, because the data available on the internet is not always sufficient to train such models. This work takes an incomplete BERT model with six attention heads, pretrained on the Nepali language, and fine-tunes it on previously unseen data. The results of intrinsic and extrinsic evaluations are compared against the original model as the baseline and against a complete BERT model pretrained on the Nepali language as the oracle. The results demonstrate that, although the oracle is better on average, fine-tuning the small embeddings drastically improves results compared to the original baseline.
Taxonomy
Topics: Advancements in Semiconductor Devices and Circuit Design
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Adam · Residual Connection · Weight Decay · Softmax · Multi-Head Attention
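
Of the methods listed, Linear Warmup With Linear Decay is the easiest to illustrate in isolation. The following is a minimal sketch of such a learning-rate schedule, not the paper's implementation: the function name, the step counts, and the peak learning rate are all hypothetical values chosen for illustration, since the abstract does not report the hyperparameters used.

```python
def linear_warmup_linear_decay(step, total_steps, warmup_steps, peak_lr):
    """Return the learning rate at a given 0-indexed training step.

    The rate rises linearly from 0 to `peak_lr` over `warmup_steps`,
    then falls linearly back to 0 at `total_steps`.
    All argument values are illustrative assumptions, not from the paper.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= warmup_steps <= total_steps")
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay phase: scale the peak rate by the fraction of post-warmup steps left.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Hypothetical setup: 100 fine-tuning steps, 10 of them warmup, peak rate 1e-4.
    for s in (0, 5, 10, 55, 100):
        print(s, linear_warmup_linear_decay(s, 100, 10, 1e-4))
```

In a typical fine-tuning loop, a schedule like this is paired with an optimizer such as Adam with weight decay, both of which also appear in the list above; the short warmup keeps early updates small while the pretrained weights adapt to the new data.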
