Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
Qixiang Fang, Dong Nguyen, Daniel L Oberski

TL;DR
This paper evaluates the construct validity of various text embedding models for social science applications, demonstrating their potential and limitations in representing survey questions and predicting responses.
Contribution
It adapts the construct validity framework to high-dimensional text embeddings and compares multiple embedding methods for survey question analysis.
Findings
BERT and Universal Sentence Encoder show higher validity in representing survey questions.
Embeddings can predict responses to new survey questions.
Construct validity varies across embedding methods.
Abstract
Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social science research. We therefore propose the use of the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some…
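To make the idea of convergent and discriminant validity concrete, here is a minimal sketch. It is not the paper's procedure. It assumes each survey question already has an embedding vector and a label naming the construct it is meant to measure. Questions about the same construct should then be more similar to each other (convergent) than to questions about other constructs (discriminant). The function names, the toy vectors, and the construct labels "trust" and "health" are hypothetical placeholders, not data from the paper.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T


def convergent_discriminant_gap(embeddings, construct_labels):
    """Return (mean within-construct similarity,
    mean between-construct similarity, their difference)."""
    sims = cosine_similarity_matrix(embeddings)
    labels = np.asarray(construct_labels)
    same = labels[:, None] == labels[None, :]
    # Exclude each question's similarity with itself.
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    within = sims[same & off_diagonal].mean()
    between = sims[~same].mean()
    return within, between, within - between


# Toy, made-up vectors standing in for real question embeddings.
toy_embeddings = np.array([
    [1.0, 0.1, 0.0],   # hypothetical "trust" question 1
    [0.9, 0.2, 0.0],   # hypothetical "trust" question 2
    [0.0, 0.1, 1.0],   # hypothetical "health" question 1
    [0.1, 0.0, 0.9],   # hypothetical "health" question 2
])
toy_labels = ["trust", "trust", "health", "health"]

within, between, gap = convergent_discriminant_gap(toy_embeddings, toy_labels)
print(f"within={within:.3f} between={between:.3f} gap={gap:.3f}")
```

A clearly positive gap would be consistent with convergent and discriminant validity under these assumptions. In practice the vectors would come from one of the embedding models named in the abstract rather than being written by hand.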
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Adam · Dropout · Linear Warmup With Linear Decay · Softmax · WordPiece
