Rethinking the adaptive relationship between Encoder Layers and Decoder Layers
Yubo Song

TL;DR
This paper investigates the adaptive relationship between encoder and decoder layers in neural translation models and finds that structural modifications are more effective when the model is retrained from scratch than when it is fine-tuned.
Contribution
It introduces a bias-free fully connected layer between encoder and decoder and compares fine-tuning versus retraining, highlighting the importance of training strategy for structural adjustments.
Findings
Structural modifications perform better with retraining.
Fine-tuning yields suboptimal performance after structural changes.
Retraining enhances the effectiveness of encoder-decoder structural adjustments.
Abstract
This article explores the adaptive relationship between Encoder Layers and Decoder Layers using Helsinki-NLP/opus-mt-de-en, a state-of-the-art model that translates German to English. The method inserts a bias-free fully connected layer between the Encoder and the Decoder, initializes the layer's weights in different ways, and compares the outcomes of fine-tuning and retraining. Four experiments were conducted in total. The results suggest that modifying the structure of the pre-trained model and then fine-tuning it yields suboptimal performance. With retraining, however, the same structural adjustment shows significant potential.
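The bias-free fully connected layer described in the abstract can be pictured as a single matrix applied to every encoder output vector before the decoder reads it. The sketch below is a minimal pure-Python illustration, not the paper's implementation: the function names, the toy hidden size of 3, and the two initializations shown (identity and small Gaussian) are assumptions made here for illustration, since the abstract does not say which initializations were used.

```python
import random


def identity_weights(d):
    """d x d identity matrix (assumed initialization, for illustration)."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def random_weights(d, scale=0.02, seed=0):
    """d x d matrix of small Gaussian values (assumed initialization)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]


def bridge(hidden_states, weights):
    """Apply y = W x to each token vector. There is no bias term."""
    return [
        [sum(w * x for w, x in zip(row, token)) for row in weights]
        for token in hidden_states
    ]


# Toy encoder output: 2 tokens, hidden size 3 (hypothetical values).
encoder_out = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]

# Identity weights pass the encoder states through unchanged.
same = bridge(encoder_out, identity_weights(3))

# Random weights mix the hidden dimensions before the decoder sees them.
mixed = bridge(encoder_out, random_weights(3))

# Without a bias, an all-zero input always maps to an all-zero output.
zeros = bridge([[0.0, 0.0, 0.0]], random_weights(3))
```

Under these assumptions, an identity-initialized layer leaves the pre-trained model's behavior unchanged at the start of training, while a randomly initialized one perturbs what the decoder receives. This is one way to see why the choice of initialization could matter when fine-tuning and retraining are compared.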
Taxonomy
Topics: Evolutionary Algorithms and Applications
