Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models
Alejo Lopez-Avila, Víctor Suárez-Paniagua

TL;DR
This paper introduces a three-phase fine-tuning method for Transformer models that combines Denoising Autoencoders, Contrastive Learning, and data augmentation to improve classification performance on NLP tasks.
Contribution
It proposes a novel three-phase approach integrating Denoising Autoencoders and Contrastive Learning with data augmentation for better transfer learning in NLP.
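Phase 1 further trains the model as a Denoising Autoencoder. The encoder receives a corrupted sentence, and the model learns to reconstruct the original. This page does not say which corruption the authors use. The sketch below therefore assumes random token deletion, the noise used by TSDAE, with a hypothetical default ratio of 0.6 and plain whitespace tokenization. `delete_tokens` and `make_dae_pairs` are illustrative names, not the authors' code.

```python
import random

def delete_tokens(tokens, ratio=0.6, rng=None):
    """Corrupt a token list by deleting each token independently with probability `ratio`.

    Assumption: deletion noise (TSDAE-style); the paper's exact noise may differ.
    At least one token is always kept so the encoder never sees an empty input.
    """
    rng = rng or random.Random()
    kept = [tok for tok in tokens if rng.random() >= ratio]  # order is preserved
    if not kept and tokens:
        kept = [rng.choice(tokens)]  # fallback when every token was deleted
    return kept

def make_dae_pairs(sentences, ratio=0.6, seed=0):
    """Build hypothetical (noisy_input, clean_target) pairs for DAE training."""
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()  # whitespace tokenization, for illustration only
        noisy = delete_tokens(tokens, ratio, rng)
        pairs.append((" ".join(noisy), sentence))
    return pairs
```

Each pair would then feed a reconstruction objective. Because deletion keeps word order, the noisy input is always a subsequence of its clean target.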
Findings
Enhanced classification accuracy on multiple datasets.
Effective handling of unbalanced datasets through a new data augmentation approach for Supervised Contrastive Learning.
Improved model adaptation from combining the DAE, Contrastive Learning, and fine-tuning phases.
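The abstract does not detail the new augmentation for unbalanced datasets. As a hypothetical illustration only, the sketch below oversamples every minority class with augmented copies until all classes match the majority count. The augmentation used here, a random swap of two adjacent words, is a stand-in and not the paper's method. One motivation for balancing before Supervised Contrastive Learning is that a minority class with few examples supplies few same-class positives per batch. `augment` and `balance_by_augmentation` are assumed names.

```python
import random
from collections import Counter, defaultdict

def augment(text, rng):
    """Hypothetical augmentation: swap one random pair of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap in a one-word text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def balance_by_augmentation(texts, labels, seed=0):
    """Append augmented copies of minority-class examples until every class
    has as many examples as the largest one. Originals are kept first."""
    if not texts:
        return [], []
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    out_texts, out_labels = list(texts), list(labels)
    for label, examples in by_class.items():
        for _ in range(target - counts[label]):
            out_texts.append(augment(rng.choice(examples), rng))
            out_labels.append(label)
    return out_texts, out_labels
```

For example, `balance_by_augmentation(["good movie", "great film", "loved it a lot", "bad plot"], ["pos", "pos", "pos", "neg"])` keeps the four originals and appends two copies of "plot bad" labeled "neg".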
Abstract
Recently, using large pretrained Transformer models for transfer learning has become one of the flagship trends in the Natural Language Processing (NLP) community, giving rise to various approaches such as prompt-based methods, adapters, or combinations with unsupervised techniques, among many others. This work proposes a three-phase technique to adjust a base model for a classification task. First, we adapt the model's signal to the data distribution by further training it as a Denoising Autoencoder (DAE). Second, we adjust the representation space of the output to the corresponding classes by clustering through a Contrastive Learning (CL) method. In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct for unbalanced datasets. Third, we apply fine-tuning to delimit the predefined categories. These…
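Phase 2 clusters the output representations by class with Supervised Contrastive Learning. The sketch below assumes the standard supervised contrastive loss of Khosla et al. (2020), with L2-normalized embeddings z, a hypothetical temperature τ = 0.1, and plain Python lists standing in for the Transformer's sentence embeddings. The authors' exact formulation may differ. For each anchor i with same-class positives P(i), the per-anchor term is −(1/|P(i)|) Σ_{p∈P(i)} log[exp(z_i·z_p/τ) / Σ_{a≠i} exp(z_i·z_a/τ)], and the loss averages this over anchors that have at least one positive.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (guarding against a zero vector)."""
    norm = max(math.sqrt(sum(x * x for x in v)), 1e-12)
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al., 2020 form), averaged over anchors.

    Assumption: `embeddings` stand in for sentence vectors from the model.
    Anchors with no same-class partner in the batch are skipped.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarity between anchor i and every other sample.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Log of the denominator via log-sum-exp, shifted by the max for stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When same-class embeddings point in similar directions, the loss approaches zero, while mixing classes drives it up. Anchors without a same-class partner contribute nothing, which ties this phase to the class-balancing augmentation.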
Taxonomy
Topics: Neural Networks and Applications · Image and Signal Denoising Methods
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout
