Empirical Gaussian priors for cross-lingual transfer learning
Anders Søgaard

TL;DR
This paper introduces empirical Gaussian priors derived from source language models to improve cross-lingual POS tagging, reducing overfitting and enhancing transfer learning performance.
Contribution
It proposes using source-language models to estimate the parameters of a Gaussian prior for the target-language tagger, outperforming standard norm-based regularization and model interpolation.
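A minimal sketch of the idea, assuming each source-language tagger is a linear model stored as a feature-to-weight dictionary (the feature names below are hypothetical). Each feature's prior mean and variance are the empirical mean and variance of its weight across the source models. Two further assumptions are ours, not the paper's: a feature missing from a source model counts as weight 0 there, and the variance is floored so that no prior collapses to zero width. The prior is diagonal (independent per feature), and features with no estimate fall back to a plain N(0, default_var).

```python
from statistics import mean, pvariance

def empirical_gaussian_prior(source_weights, min_var=1e-3):
    """Per-feature prior mean and variance estimated from K source-language models.

    source_weights: list of dicts mapping feature -> weight, one per source model.
    A feature missing from a source model is treated as having weight 0 there
    (an assumption made for this sketch).
    """
    features = set().union(*source_weights)
    mu, var = {}, {}
    for f in features:
        vals = [w.get(f, 0.0) for w in source_weights]
        mu[f] = mean(vals)
        # Floor the variance so a feature on which all sources agree does not
        # get an infinitely sharp prior (assumed, not taken from the paper).
        var[f] = max(pvariance(vals), min_var)
    return mu, var

def log_prior(weights, mu, var, default_var=1.0):
    """Gaussian log-prior of `weights`, dropping the normalising constant.

    Features without an estimate fall back to N(0, default_var).
    """
    total = 0.0
    for f in set(weights) | set(mu):
        diff = weights.get(f, 0.0) - mu.get(f, 0.0)
        total -= diff ** 2 / (2.0 * var.get(f, default_var))
    return total

def log_prior_grad(weights, mu, var, default_var=1.0):
    """Gradient of log_prior with respect to each weight: -(w_f - mu_f) / var_f."""
    return {
        f: -(weights.get(f, 0.0) - mu.get(f, 0.0)) / var.get(f, default_var)
        for f in set(weights) | set(mu)
    }

# Training would maximise   log_likelihood(w) + log_prior(w, mu, var)
# instead of                log_likelihood(w) - lam * ||w||^2.
# Hypothetical source-model weights for two feature/tag pairs:
sources = [
    {"suffix=ing|VERB": 2.0, "prev=the|NOUN": 1.0},
    {"suffix=ing|VERB": 1.0, "prev=the|NOUN": 1.5},
]
mu, var = empirical_gaussian_prior(sources)
```

Passing empty `mu` and `var` turns `log_prior` into the ordinary L2 penalty, -||w||² / (2·default_var), that is, λ = 1/(2·default_var). The empirical prior therefore generalizes the zero-mean, constant-variance prior: weights on which the source models agree are pulled strongly toward their shared value, while weights on which they disagree are left comparatively free.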
Findings
Empirical Gaussian priors significantly improve tagging accuracy in multi-source transfer set-ups.
Drop-out with Gaussian noise further enhances model robustness.
Lower Rademacher complexity indicates better generalization.
Abstract
Sequence model learning algorithms typically maximize log-likelihood minus the norm of the model (or minimize Hamming loss plus norm). In cross-lingual part-of-speech (POS) tagging, our target language training data consists of sequences of sentences with word-by-word labels projected, via word alignments, from translations in languages for which we have labeled data. Our training data is therefore very noisy, and if Rademacher complexity is high, learning algorithms are prone to overfit. Norm-based regularization assumes a constant-width, zero-mean prior. We instead propose to use the source language models to estimate the parameters of a Gaussian prior for learning new POS taggers. This leads to significantly better performance in multi-source transfer set-ups. We also present a drop-out version that injects (empirical) Gaussian noise during online learning. Finally, we note…
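The abstract says the drop-out variant injects empirical Gaussian noise during online learning but does not say where the noise enters. The sketch below assumes one reading: during training, each label is scored with weights perturbed by noise drawn from N(0, σ²), where σ² is a per-weight variance (presumably the spread of that weight across source models). It uses a multiclass perceptron over token features as a simplified stand-in for a sequence tagger; this is not the paper's model. Weights without a variance entry are used noise-free, and test-time prediction uses the unperturbed weights. All function and feature names are hypothetical.

```python
import random

def train_noisy_perceptron(data, labels, var, epochs=5, seed=0):
    """Online multiclass perceptron with Gaussian weight noise during training.

    data:   list of (features, gold_label) pairs.
    labels: candidate labels; ties in the argmax go to the earliest label.
    var:    dict (feature, label) -> noise variance; keys not in `var` get no noise.
    """
    rng = random.Random(seed)
    w = {}

    def noisy_score(feats, label):
        # Score a label with each weight perturbed by N(0, var[key]) noise.
        total = 0.0
        for f in feats:
            key = (f, label)
            noise = rng.gauss(0.0, var[key] ** 0.5) if key in var else 0.0
            total += w.get(key, 0.0) + noise
        return total

    for _ in range(epochs):
        for feats, gold in data:
            pred = max(labels, key=lambda y: noisy_score(feats, y))
            if pred != gold:
                # Standard perceptron update: promote gold, demote the prediction.
                for f in feats:
                    w[(f, gold)] = w.get((f, gold), 0.0) + 1.0
                    w[(f, pred)] = w.get((f, pred), 0.0) - 1.0
    return w

def predict(w, feats, labels):
    """Noise-free prediction with the learned weights."""
    return max(labels, key=lambda y: sum(w.get((f, y), 0.0) for f in feats))

# Hypothetical toy data: one feature per token.
data = [(["suffix=ing"], "VERB"), (["prev=the"], "NOUN")]
labels = ["NOUN", "VERB"]
```

With `var={}` this reduces to a plain perceptron. Larger variances make updates on uncertain weights more erratic, which is the regularizing effect drop-out-style noise is meant to have. The random generator is local and seeded, so runs are reproducible.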
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
