Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality

Adithya V Ganesan; Matthew Matero; Aravind Reddy Ravula; Huy Vu; H. Andrew Schwartz

arXiv:2105.03484 · cs.CL · June 5, 2023

TL;DR

This paper systematically evaluates how dimension reduction techniques and sample size influence the performance of pre-trained transformers in human-level NLP tasks, highlighting the effectiveness of PCA and the potential for reduced embedding dimensions.

Contribution

It introduces a comprehensive analysis of dimension reduction methods and their impact on transformer-based models in low-data human-level NLP tasks, demonstrating practical benefits.

Findings

01

PCA outperforms other reduction methods for longer texts.

02

Pre-trained dimension reduction improves fine-tuning with limited data.

03

Most tasks achieve near-best results with only 1/12 of the original embedding dimensions.
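
To make the 1/12 figure concrete: with the standard 768-dimensional hidden state, it corresponds to 64 components. The sketch below is not the authors' pipeline. It is a minimal NumPy illustration that assumes "pre-trained dimension reduction" means fitting PCA on a larger pool of embeddings and then applying the fixed projection to a small task sample. The random arrays stand in for real transformer embeddings, and the names `fit_pca`, `apply_pca`, `pool`, and `task` are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Fit PCA on X (rows = users) and return the mean and top-k components."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def apply_pca(X, mean, components):
    """Project X onto previously fitted components."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
hidden_size = 768            # standard hidden state size cited in the abstract
k = hidden_size // 12        # 1/12 of the original dimensions -> 64

# Synthetic stand-ins (assumption): a larger pool used to fit the reduction,
# and a small task sample with fewer users than embedding dimensions.
pool = rng.normal(size=(2000, hidden_size))
task = rng.normal(size=(100, hidden_size))

mean, components = fit_pca(pool, k)
Z_task = apply_pca(task, mean, components)

print(Z_task.shape)  # (100, 64)
```

After the projection, the 100-user task sample has more observations than features, which is the regime the abstract describes as hard to reach with raw 768-dimensional embeddings.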

Abstract

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty, which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that…

Code & Models

Repositories

adithya8/ContextualEmbeddingDR
PyTorch · Official

Taxonomy

Methods

Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Dense Connections · Adam