TL;DR
This paper systematically evaluates how dimension reduction techniques and sample size influence the performance of pre-trained transformers in human-level NLP tasks, highlighting the effectiveness of PCA and the potential for reduced embedding dimensions.
Contribution
It introduces a comprehensive analysis of dimension reduction methods and their impact on transformer-based models in low-data human-level NLP tasks, demonstrating practical benefits.
Findings
PCA outperforms other reduction methods for longer texts.
Pre-trained dimension reduction improves fine-tuning with limited data.
Most tasks achieve near-best results with only 1/12 of the original embedding dimensions.
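To make the 1/12 figure concrete: for a 768-dimensional hidden state, 1/12 of the dimensions is 64. The sketch below shows what a PCA reduction to that size could look like, using plain NumPy. It is an illustration only, not the paper's implementation; the function names `fit_pca` and `apply_pca` are hypothetical, and the embeddings are random stand-ins rather than transformer outputs.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, components) for a k-dimensional PCA of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def apply_pca(X, mean, components):
    """Project the rows of X onto previously fitted components."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
hidden_size = 768            # hidden state size cited in the abstract
k = hidden_size // 12        # 1/12 of the original dimensions

# Synthetic stand-in for user-level embeddings (assumption: 1000 users).
embeddings = rng.normal(size=(1000, hidden_size))

mean, components = fit_pca(embeddings, k)
reduced = apply_pca(embeddings, mean, components)
print(k, reduced.shape)
```

The projection is kept separate from the fitting step so that components learned on one set of embeddings can be reused on another.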
Abstract
In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that…
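The abstract's core problem, fewer observations than embedding dimensions, can be illustrated with a small sketch. It assumes that a "pre-trained dimension reduction regime" means learning the reduction on a larger pool of unlabeled embeddings before the small labeled sample is used; the excerpt does not spell out the paper's exact procedure. All sizes, the random data, and the ridge penalty `alpha` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 768, 64                 # original and reduced dimensionality
n_pool, n_task = 2000, 100     # hypothetical sizes; note n_task < d

# Synthetic stand-ins for transformer embeddings and task labels.
pool = rng.normal(size=(n_pool, d))      # unlabeled pool
task_X = rng.normal(size=(n_task, d))    # small labeled sample
task_y = rng.normal(size=n_task)

# With n_task < d, the design matrix cannot have full column rank,
# so an unregularized linear fit on the raw embeddings is underdetermined.
rank = np.linalg.matrix_rank(task_X)

# Step 1: learn the projection from the unlabeled pool only.
pool_mean = pool.mean(axis=0)
_, _, Vt = np.linalg.svd(pool - pool_mean, full_matrices=False)
projection = Vt[:k].T                    # shape (d, k)

# Step 2: apply the pre-learned projection to the small task sample.
Z = (task_X - pool_mean) @ projection    # shape (n_task, k), now n_task > k

# Step 3: closed-form ridge regression on the reduced features.
alpha = 1.0                              # assumed penalty strength
Zc = Z - Z.mean(axis=0)
yc = task_y - task_y.mean()
weights = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(k), Zc.T @ yc)
print(rank, Z.shape, weights.shape)
```

After the reduction, the task sample has more observations than features, which is the situation a downstream linear model is better equipped to handle.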
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Dense Connections · Adam
