Training BERT Models to Carry Over a Coding System Developed on One Corpus to Another
Dalma Galambos, Pál Zsámboki

TL;DR
This study trains BERT models to transfer a literary translation perception coding system from one Hungarian corpus to another, addressing domain shift and annotation consistency through extensive tuning and ensemble methods.
Contribution
It demonstrates effective transfer of a coding system across domains using BERT, with techniques to handle label imbalance and domain shift, including ensemble learning and domain adaptation.
Findings
The models successfully transfer the coding system across domains.
Learning multilabel correlations improves domain shift resistance.
Domain adaptation on OCR text nearly matches performance on original corpus.
Abstract
This paper describes how we train BERT models to carry over a coding system developed on the paragraphs of one Hungarian literary journal to another. The aim of the coding system is to track trends in the perception of literary translation around the political transformation in 1989 in Hungary. We use 10-fold cross-validation to evaluate both task performance and the consistency of the annotation, and to obtain better predictions from an ensemble. Extensive hyperparameter tuning is used to obtain the best possible results and fair comparisons. To handle label imbalance, we use loss functions and metrics that are robust to it. We evaluate the effect of domain shift by sampling a test set from the target domain. We establish the sample size by estimating the bootstrapped confidence interval via simulations. This way, we show that our models can carry over one annotation…
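The idea of choosing a test-set size from a bootstrapped confidence interval can be illustrated with a minimal sketch. The abstract does not give the paper's actual metric or simulation procedure, so everything here is an assumption made for illustration: the binary F1 metric, the simulated classifier with fixed `prevalence` and `accuracy`, the candidate sample sizes, and the helper names `f1_binary`, `bootstrap_ci`, and `simulate_ci_width`.

```python
import random

def f1_binary(y_true, y_pred):
    """F1 score for binary labels given as lists of 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, n_resamples=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (resampling examples with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_binary([y_true[i] for i in idx],
                                [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * (n_resamples - 1))]
    hi = scores[int((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi

def simulate_ci_width(n, prevalence=0.2, accuracy=0.85, seed=0):
    """Simulate a hypothetical imbalanced test set of size n and return the CI width."""
    rng = random.Random(seed)
    y_true = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    # Hypothetical classifier: each label is predicted correctly with
    # probability `accuracy`, otherwise flipped.
    y_pred = [t if rng.random() < accuracy else 1 - t for t in y_true]
    lo, hi = bootstrap_ci(y_true, y_pred, seed=seed)
    return hi - lo

# Compare interval widths across candidate test-set sizes (assumed values).
widths = {n: simulate_ci_width(n) for n in (50, 200, 800)}
for n, w in widths.items():
    print(f"n={n:4d}  95% CI width ~ {w:.3f}")
```

Under these assumptions, one would pick the smallest sample size whose interval is narrow enough for the comparison at hand; larger samples yield tighter intervals at the cost of more annotation effort.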
Taxonomy
Topics: Translation Studies and Practices · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · WordPiece · Multi-Head Attention · Weight Decay
