TL;DR
This paper introduces SCDV+BERT(ctxd), an unsupervised document representation method that combines contextualized BERT embeddings with soft clustering, outperforming previous models on classification, concept matching, and sentence similarity tasks, especially with limited data.
Contribution
The paper proposes SCDV+BERT(ctxd), a novel unsupervised document embedding technique that integrates contextualized BERT embeddings with soft clustering to better handle polysemy and context.
Findings
Outperforms original SCDV and pre-trained BERT on multiple classification datasets.
Effective in concept matching and sentence similarity tasks.
Excels in low-data and few-shot learning scenarios.
Abstract
Several NLP tasks need effective representations of text documents. Arora et al. (2017) demonstrate that simple weighted averaging of word vectors frequently outperforms neural models. SCDV (Mekala et al., 2017) extends this from sentences to documents by employing soft and sparse clustering over pre-computed word vectors. However, both techniques ignore the polysemy and contextual character of words. In this paper, we address this issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised representation that combines contextualized BERT-based (Devlin et al., 2019) word embeddings for word sense disambiguation with the SCDV soft clustering approach. We show that our embeddings outperform the original SCDV, pre-trained BERT, and several other baselines on many classification datasets. We also demonstrate our embeddings' effectiveness on other tasks, such as concept matching…
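The soft-clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `temperature` and `sparsity` parameters, and the random inputs are all hypothetical. It assumes per-token contextual embeddings and cluster centroids have already been computed, and it replaces the Gaussian mixture model used by SCDV with a simpler softmax over negative squared distances. The sparsification threshold is likewise a simplification.

```python
import numpy as np

def soft_assign(vectors, centroids, temperature=1.0):
    """Soft cluster membership for each token.

    Stand-in for GMM posteriors: a softmax over negative squared
    distances to the centroids. Returns shape (n_tokens, n_clusters).
    """
    d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def scdv_document_vector(token_vectors, idf, centroids, sparsity=0.04):
    """Build one sparse document vector of size n_clusters * dim.

    token_vectors: (n_tokens, dim) embeddings, assumed precomputed
    idf:           (n_tokens,) inverse document frequency per token
    centroids:     (n_clusters, dim) cluster centres, assumed precomputed
    sparsity:      fraction of the largest magnitude used as zeroing threshold
    """
    probs = soft_assign(token_vectors, centroids)
    # Word-cluster vectors: each embedding scaled by its membership in
    # every cluster, giving shape (n_tokens, n_clusters, dim).
    wcv = probs[:, :, None] * token_vectors[:, None, :]
    # Weight by idf, then concatenate the cluster blocks per token.
    weighted = (idf[:, None, None] * wcv).reshape(len(token_vectors), -1)
    doc = weighted.mean(axis=0)
    # Sparsify: zero out entries with small magnitude.
    threshold = sparsity * np.abs(doc).max()
    doc[np.abs(doc) < threshold] = 0.0
    return doc

# Toy inputs: 6 tokens, 4-dimensional embeddings, 3 clusters.
rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(6, 4))
centroids = rng.normal(size=(3, 4))
idf = np.ones(6)

doc_vector = scdv_document_vector(token_vectors, idf, centroids)
print(doc_vector.shape)  # (12,)
```

Because each token contributes to every cluster block in proportion to its membership, a token whose contextual embedding shifts with its sense ends up weighting different blocks in different contexts, which is the intuition behind pairing contextual embeddings with soft clustering.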
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · WordPiece · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Dropout · Attention Dropout
