Domain-adaptation of spherical embeddings
Mihalis Gongolidis, Jeremy Minton, Ronin Wu, Valentin Stauber, Jason Hoelscher-Obermaier, Viktor Botev

TL;DR
This paper addresses the challenge of domain adaptation in spherical embedding models, specifically improving the JoSE model for chemistry texts by countering global rotations and proposing effective update strategies, achieving performance comparable to Word2Vec.
Contribution
The authors develop methods to counter global rotations in spherical embeddings and introduce strategies for effective domain-specific updates, enhancing JoSE's adaptation to chemistry texts.
Findings
Proposed methods successfully counter global rotations during training.
Update strategies reduce domain adaptation performance loss to levels similar to Word2Vec.
New chemistry datasets enable effective evaluation of domain adaptation techniques.
Abstract
Domain adaptation of embedding models, updating a generic embedding to the language of a specific domain, is a proven technique for domains that have insufficient data to train an effective model from scratch. Chemistry publications are one such domain, where scientific jargon and overloaded terminology inhibit the performance of a general language model. The recent spherical embedding model (JoSE) proposed in arXiv:1911.01196 jointly learns word and document embeddings during training on the multi-dimensional unit sphere and performs well on document classification and word correlation tasks. However, we show that a non-convergence caused by global rotations during its training prevents it from domain adaptation. In this work, we develop methods to counter the global rotation of the embedding space and propose strategies to update words and documents during domain-specific training. Two new…
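The abstract does not say how the global rotation is countered, so the following is only a toy illustration of the underlying issue, not the authors' method. It assumes 2-D unit vectors and uses the closed-form 2-D case of orthogonal Procrustes alignment, a standard technique swapped in here for illustration. It shows that a global rotation leaves every pairwise cosine similarity unchanged (so a similarity-based objective cannot detect it) and that the rotation can be estimated and undone from paired vectors. All names (`rotate`, `best_rotation_angle`, `generic`, `adapted`, `drift`) are hypothetical.

```python
import math


def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def cosine(u, v):
    """Cosine similarity of two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


def best_rotation_angle(source, target):
    """Angle of the rotation that best maps source onto target.

    Maximises sum_i target_i . R(theta) source_i, which in 2-D has the
    closed form atan2(sum of cross products, sum of dot products).
    """
    dot_sum = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross_sum = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross_sum, dot_sum)


# Toy "generic" embedding: three unit vectors on the circle (made-up data).
generic = [(1.0, 0.0), (0.0, 1.0), (math.sqrt(0.5), math.sqrt(0.5))]

# Assumption: further training applied nothing but a global rotation.
drift = 0.7
adapted = [rotate(v, drift) for v in generic]

# Pairwise similarities are identical before and after the rotation.
sims_before = [cosine(generic[i], generic[j]) for i in range(3) for j in range(3)]
sims_after = [cosine(adapted[i], adapted[j]) for i in range(3) for j in range(3)]

# Estimate the rotation from the paired vectors and undo it.
theta = best_rotation_angle(adapted, generic)
realigned = [rotate(v, theta) for v in adapted]
```

In higher dimensions the same idea is usually solved with a singular value decomposition rather than a single angle; whether the paper takes that route is not stated in the text above.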
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques
