Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Kan Xu; Xuanyi Zhao; Hamsa Bastani; Osbert Bastani

arXiv:2104.08928 · stat.ML · February 20, 2024 · 1 citation

TL;DR

This paper introduces a group-sparse matrix factorization method for transfer learning of word embeddings, enabling efficient adaptation to new domains with limited data by exploiting the assumption that only a small subset of embeddings change.

Contribution

It proposes a novel two-stage estimator with theoretical error bounds and computational guarantees for transfer learning of word embeddings using group sparsity.
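The two stages can be written down as follows. This is one plausible formalization, offered as an assumption rather than the paper's exact objective: $X^{\mathrm{src}}$ and $X^{\mathrm{tgt}}$ denote word co-occurrence-based matrices (for instance, PMI matrices) built from a large source corpus and a small target corpus, each row of $U$ is one word's $d$-dimensional embedding, $\Delta_i$ is the $i$-th row of the correction $\Delta$, and $\lambda > 0$ is a regularization weight. All of these symbols are illustrative.

```latex
\begin{aligned}
\text{Stage 1:}\quad & \hat U^{\mathrm{src}} \in \arg\min_{U \in \mathbb{R}^{n \times d}}
  \tfrac{1}{4} \bigl\lVert X^{\mathrm{src}} - U U^{\top} \bigr\rVert_F^2, \\
\text{Stage 2:}\quad & \hat\Delta \in \arg\min_{\Delta \in \mathbb{R}^{n \times d}}
  \tfrac{1}{4} \bigl\lVert X^{\mathrm{tgt}} - (\hat U^{\mathrm{src}} + \Delta)(\hat U^{\mathrm{src}} + \Delta)^{\top} \bigr\rVert_F^2
  + \lambda \sum_{i=1}^{n} \lVert \Delta_i \rVert_2, \\
& \hat U^{\mathrm{tgt}} = \hat U^{\mathrm{src}} + \hat\Delta .
\end{aligned}
```

Because the penalty sums unsquared row norms, it acts as a group lasso with one group per word: it tends to set entire rows of $\Delta$ to zero, so most words keep their source embeddings and only a few are re-estimated from target-domain data.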

Findings

01 Achieves high accuracy with less domain-specific data

02 All local minima are statistically equivalent to the global minimum

03 Outperforms state-of-the-art fine-tuning methods

Abstract

Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings (vectors that encode the semantic relationships between words) through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings for new domains with limited training data can be challenging, because the meaning or usage of a word may differ in the new domain. For example, the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes, since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse…
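The two-stage idea (reuse embeddings learned from a large corpus, then let only a few words move) can be illustrated with a minimal NumPy sketch on synthetic data. It is not the authors' implementation: the symmetric factorization X ≈ UU^T, the row-wise group-lasso penalty, plain proximal gradient descent, the helper names (`group_soft_threshold`, `stage1_source_embeddings`, `stage2_group_sparse`, `rel_err`), lam = 5, the step size, and all data sizes are hypothetical choices. Stage 1 factorizes a source matrix by eigendecomposition; stage 2 fits a row-sparse correction D to a small, noisy target matrix.

```python
import numpy as np

# Hypothetical sketch of a group-sparse two-stage estimator; not the paper's code.


def group_soft_threshold(D, tau):
    """Row-wise proximal operator of tau * sum_i ||D[i]||_2 (group lasso).

    Rows with norm <= tau become exactly zero; other rows shrink by tau.
    """
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return D * scale


def stage1_source_embeddings(M_src, d):
    """Stage 1: rank-d symmetric factorization M_src ~ U0 @ U0.T via eigh."""
    w, V = np.linalg.eigh(M_src)  # eigenvalues come back in ascending order
    w_top, V_top = w[-d:], V[:, -d:]
    return V_top * np.sqrt(np.maximum(w_top, 0.0))


def stage2_group_sparse(M_tgt, U0, lam, n_iter=3000):
    """Stage 2: proximal gradient descent on
    (1/4) * ||M_tgt - (U0 + D)(U0 + D)^T||_F^2 + lam * sum_i ||D[i]||_2.
    """
    m = max(np.linalg.norm(M_tgt, 2), np.linalg.norm(U0 @ U0.T, 2))
    step = 1.0 / (6.0 * m)  # conservative fixed step (hypothetical choice)
    D = np.zeros_like(U0)
    for _ in range(n_iter):
        U = U0 + D
        grad = (U @ U.T - M_tgt) @ U  # gradient of the smooth term (M_tgt symmetric)
        D = group_soft_threshold(D - step * grad, step * lam)
    return D


# Synthetic data: n words, d dimensions, only k words change meaning.
rng = np.random.default_rng(0)
n, d, k = 40, 3, 4
U_src = rng.normal(size=(n, d))
changed = np.sort(rng.choice(n, size=k, replace=False))
U_tgt = U_src.copy()
U_tgt[changed] += 3.0 * rng.normal(size=(k, d))

M_src = U_src @ U_src.T  # stands in for a large-corpus matrix
noise = 0.05 * rng.normal(size=(n, n))
M_tgt = U_tgt @ U_tgt.T + (noise + noise.T) / 2  # small, noisy target matrix

U0 = stage1_source_embeddings(M_src, d)
D = stage2_group_sparse(M_tgt, U0, lam=5.0)


def rel_err(U):
    """Relative Frobenius error of U @ U.T against the target matrix."""
    return np.linalg.norm(M_tgt - U @ U.T) / np.linalg.norm(M_tgt)


row_norms = np.linalg.norm(D, axis=1)
print("true changed words:", changed)
print("largest updates:   ", np.sort(np.argsort(row_norms)[-k:]))
print("nonzero rows of D: ", int(np.count_nonzero(row_norms)))
print("rel. error before / after:", rel_err(U0), rel_err(U0 + D))
```

Running it prints the indices of the shifted words next to the rows with the largest updates, the number of nonzero rows of D, and the relative reconstruction error before and after stage 2. The group soft-thresholding step is what sets entire rows of D exactly to zero, so words whose usage does not change keep their source embeddings; a squared-norm (ridge) penalty on D would shrink every row without zeroing any.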

Videos

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Computational and Text Analysis Methods