Fusing Vector Space Models for Domain-Specific Applications

Laura Rettig, Julien Audiffren, Philippe Cudré-Mauroux

arXiv:1909.02307 · cs.CL · September 6, 2019

TL;DR

This paper introduces a method to automatically combine multiple domain-specific word embeddings, enhancing their expressiveness and improving machine learning performance in domain-specific applications.

Contribution

A novel approach that automatically selects and fuses multiple domain-specific embeddings using a ranking function and dimensionality reduction.

Findings

1. Consistently improved performance on multiple domain-specific tasks, compared with generic embeddings trained on large text corpora.
2. Effective selection of the embeddings most relevant to a given domain.
3. More compact fused embeddings that preserve their expressiveness.

Abstract

We address the problem of tuning word embeddings for specific use cases and domains. We propose a new method that automatically combines multiple domain-specific embeddings, selected from a wide range of pre-trained domain-specific embeddings, to improve their combined expressive power. Our approach relies on two key components: 1) a ranking function, based on a new embedding similarity measure, that selects the most relevant embeddings to use given a domain and 2) a dimensionality reduction method that combines the selected embeddings to produce a more compact and efficient encoding that preserves their expressiveness. We empirically show that our method produces effective domain-specific embeddings that consistently improve the performance of state-of-the-art machine learning algorithms on multiple tasks, compared to generic embeddings trained on large text corpora.
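The abstract describes the two components only at a high level, and the paper's own similarity measure and reduction method are not reproduced here. What follows is a minimal Python sketch of the overall select-then-fuse pipeline under stated assumptions: relevance is scored against a small in-domain reference embedding (an assumption), a k-nearest-neighbour overlap score stands in for the paper's embedding similarity measure, and fusion is plain concatenation followed by PCA (a swapped-in reduction technique). All function names (`knn_sets`, `neighbourhood_similarity`, `rank_embeddings`, `fuse`) are hypothetical, and embeddings are modelled as dicts mapping words to NumPy vectors.

```python
import numpy as np


def knn_sets(emb, vocab, k):
    """For each word in vocab, the set of its k nearest neighbours by cosine similarity."""
    X = np.stack([emb[w] for w in vocab]).astype(float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # a word is not its own neighbour
    nearest = np.argsort(-sims, axis=1)[:, :k]
    return [frozenset(vocab[j] for j in row) for row in nearest]


def neighbourhood_similarity(emb_a, emb_b, k=5):
    """Stand-in similarity (assumption, not the paper's measure):
    mean Jaccard overlap of k-NN sets over the shared vocabulary."""
    vocab = sorted(set(emb_a) & set(emb_b))
    if len(vocab) <= k:
        raise ValueError("shared vocabulary must contain more than k words")
    pairs = zip(knn_sets(emb_a, vocab, k), knn_sets(emb_b, vocab, k))
    return float(np.mean([len(a & b) / len(a | b) for a, b in pairs]))


def rank_embeddings(reference, candidates, k=5):
    """Rank candidate embeddings by similarity to an in-domain reference (hypothetical input)."""
    scores = {name: neighbourhood_similarity(reference, emb, k)
              for name, emb in candidates.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


def fuse(selected, dim):
    """Concatenate the selected embeddings on their shared vocabulary, then reduce with PCA."""
    vocab = sorted(set.intersection(*(set(e) for e in selected)))
    blocks = []
    for emb in selected:
        B = np.stack([emb[w] for w in vocab]).astype(float)
        blocks.append(B / np.linalg.norm(B, axis=1, keepdims=True))  # comparable scale per source
    Z = np.hstack(blocks)
    Z -= Z.mean(axis=0)  # PCA operates on centred data
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    reduced = Z @ Vt[:dim].T  # project onto the top-`dim` principal directions
    return {w: reduced[i] for i, w in enumerate(vocab)}


# Toy usage: one candidate is a rotated copy of the reference, the other is unrelated noise.
rng = np.random.default_rng(0)
words = [f"w{i}" for i in range(50)]
reference = {w: rng.normal(size=20) for w in words}
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))  # random orthogonal matrix
candidates = {
    "rotated": {w: v @ Q for w, v in reference.items()},  # same geometry as the reference
    "random": {w: rng.normal(size=20) for w in words},     # unrelated geometry
}
ranking, scores = rank_embeddings(reference, candidates)
fused = fuse([candidates[name] for name in ranking], dim=10)
```

Neighbourhood overlap is used here because it is unchanged by rotating or rescaling a vector space, so independently trained embeddings can be compared without first aligning them; the toy usage relies on exactly that property, which is why the rotated copy scores highest.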