SEMIE: SEMantically Infused Embeddings with Enhanced Interpretability for Domain-specific Small Corpus

Rishabh Gupta; Rajesh N Rao

arXiv:2103.11431·cs.CL·March 23, 2021

Open Access

TL;DR

This paper introduces SEMIE, a method for creating highly interpretable and efficient word embeddings tailored for small, domain-specific corpora, addressing limitations of generic embeddings in specialized fields.

Contribution

The paper proposes a novel approach to generate interpretable embeddings specifically designed for small, domain-specific datasets, enhancing their applicability in specialized NLP tasks.

Findings

01. Embeddings demonstrate improved interpretability in domain contexts

02. Enhanced efficiency of embeddings for small corpora

03. Evaluation shows competitive performance with existing methods

Abstract

Word embeddings are a basic building block of modern NLP pipelines. Efforts have been made to learn rich, efficient, and interpretable embeddings for large generic datasets available in the public domain. However, these embeddings have limited applicability for small corpora from specific domains such as automotive, manufacturing, maintenance and support, etc. In this work, we present a comprehensive notion of interpretability for word embeddings and propose a novel method to generate highly interpretable and efficient embeddings for a domain-specific small corpus. We report the evaluation results of our resulting word embeddings and demonstrate their novel features for enhanced interpretability.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications