Domain-Specific Word Embeddings with Structure Prediction
Stephanie Brandl, David Lassner, Anne Baillot, Shinichi Nakajima

TL;DR
This paper introduces W2VPred, a novel word embedding method that simultaneously captures general, domain-specific, and structural information, enabling dynamic and aligned embeddings across different corpora and domains.
Contribution
The paper presents a new embedding approach that models structure between sub-corpora and domains, outperforming baselines in analogy and structure prediction tasks.
Findings
W2VPred outperforms baselines in analogy tests.
It effectively predicts structure without prior information.
Its usefulness is demonstrated in Digital Humanities research.
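For readers unfamiliar with analogy tests, the following is a minimal sketch of how such a test is commonly scored with vector arithmetic ("a is to b as c is to ?"). It is not taken from the paper: the function names `cosine` and `solve_analogy` and the toy three-dimensional vectors are hypothetical, and the paper's actual evaluation protocol may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """Return the word whose vector is closest to emb[b] - emb[a] + emb[c].

    The three query words are excluded from the candidates, as is usual
    for this kind of test. `emb` maps words to lists of floats.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Toy embeddings, invented purely for illustration (not learned from data).
toy = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, -1.0],
}

prediction = solve_analogy(toy, "man", "woman", "king")
print(prediction)
```

Accuracy on an analogy benchmark is then the fraction of questions for which the predicted word matches the expected one. A domain-specific variant would run the same procedure on the embedding of one sub-corpus.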
Abstract
Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, e.g., across time or domain. Current methods do not offer a way to use or predict information on structure between sub-corpora, time or domain and dynamic embeddings can only be compared after post-alignment. We propose novel word embedding methods that provide general word representations for the whole corpus, domain-specific representations for each sub-corpus, sub-corpus structure, and embedding alignment simultaneously. We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy. Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines in terms of the general analogy tests, domain-specific analogy tests, and multiple…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration
