Learning to Distill: The Essence Vector Modeling Framework

Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, and Hsin-Min Wang

arXiv:1611.07206 · cs.CL · November 23, 2016 · 2 cites

Open Access

TL;DR

This paper introduces the essence vector (EV) model for unsupervised paragraph embedding that distills core information and excludes background noise, with an extension (D-EV) for robust spoken content representation.

Contribution

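The paper proposes the EV model, which produces paragraph embeddings that keep core information and exclude background noise, and extends it to D-EV for spoken content affected by speech recognition errors. Both target a limitation of classic methods, which embed a paragraph from all of its words, including frequent stop and function words.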

Findings

01 EV produces more informative paragraph vectors

02 D-EV enhances robustness against speech recognition errors

03 Outperforms existing embedding methods in experiments

Abstract

In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this line of research. However, paragraph (or sentence and document) embedding learning is more suitable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively little work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, stop or function words that occur frequently may mislead the embedding learning process into producing a vague paragraph representation. Motivated by these observations,…

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies