# Learning Word Embeddings with Domain Awareness

**Authors:** Guoyin Wang, Yan Song, Yue Zhang, Dong Yu

arXiv: 1906.03249 · 2019-06-24

## TL;DR

This paper introduces two domain-aware mechanisms for training word embeddings, domain indicator and domain attention, which integrate domain knowledge into the skip-gram (SG) and CBOW models to improve embeddings trained on corpora from heterogeneous domains.

## Contribution

The paper proposes a domain indicator mechanism for the SG model and a domain attention mechanism for the CBOW model, each of which incorporates domain-specific information into embedding training.

## Key findings

- Improved embedding quality in heterogeneous domains
- Enhanced performance in near-cold-start scenarios
- Validated effectiveness through qualitative and quantitative evaluation

## Abstract

Word embeddings are traditionally trained on a large corpus in an unsupervised setting, with no specific design for incorporating domain knowledge. This can lead to unsatisfactory performance when training data originate from heterogeneous domains. In this paper, we propose two novel mechanisms for domain-aware word embedding training, namely domain indicator and domain attention, which integrate domain-specific knowledge into the widely used SG and CBOW models, respectively. The two methods are based on a joint learning paradigm and ensure that words in a target domain receive intensive focus when training on a source-domain corpus. Qualitative and quantitative evaluations confirm the validity and effectiveness of our models. Compared to baseline methods, our method is particularly effective in near-cold-start scenarios.
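
To make the idea of focusing on target-domain words concrete, the following is a minimal sketch of a CBOW-style context vector in which in-domain words receive more weight. It is an illustration under stated assumptions, not the paper's formulation: the function names, the `domain_vocab` set, and the fixed `bonus` score are all hypothetical, and the full paper should be consulted for how the attention is actually defined and trained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def domain_weighted_context(context, embeddings, domain_vocab, bonus=1.0):
    """Combine context word vectors into one CBOW-style context vector.

    ASSUMPTION (not from the paper): each context word gets a score of
    `bonus` if it is in `domain_vocab` and 0.0 otherwise, and the scores
    are normalized with a softmax. `context` must be non-empty.
    """
    scores = [bonus if word in domain_vocab else 0.0 for word in context]
    weights = softmax(scores)

    dim = len(embeddings[context[0]])
    vec = [0.0] * dim
    for word, weight in zip(context, weights):
        for i, value in enumerate(embeddings[word]):
            vec[i] += weight * value
    return vec, weights


# Toy usage with made-up 2-dimensional embeddings.
toy_embeddings = {"the": [1.0, 0.0], "tumor": [0.0, 1.0], "was": [1.0, 1.0]}
vec, weights = domain_weighted_context(
    ["the", "tumor", "was"], toy_embeddings, domain_vocab={"tumor"}, bonus=2.0
)
```

With `bonus=0.0` every weight is equal and the function reduces to the plain averaging used by standard CBOW, so the sketch isolates only the reweighting step.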

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.03249/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1906.03249/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1906.03249/full.md

---
Source: https://tomesphere.com/paper/1906.03249