Learning to Distill: The Essence Vector Modeling Framework
Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, and Hsin-Min Wang

TL;DR
This paper introduces the essence vector (EV) model for unsupervised paragraph embedding that distills core information and excludes background noise, with an extension (D-EV) for robust spoken content representation.
Contribution
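The paper proposes the EV model for more informative paragraph embeddings and extends it to D-EV for noisy spoken content. It targets a weakness of classic methods: because they consider every word in a paragraph, frequent stop or function words can blur the resulting representation.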
Findings
EV produces more informative paragraph vectors
D-EV enhances robustness against speech recognition errors
The proposed approach outperforms existing embedding methods in the paper's experiments
Abstract
In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations,…
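The abstract's concern about frequent stop or function words can be made concrete with a toy sketch. This is not the EV model; it only illustrates the problem the abstract describes, using plain averaging as a stand-in for a method that considers every word. The 2-D word vectors, the stop-word list, and the token sequence are all invented for illustration.

```python
import math

# Hypothetical 2-D word vectors, invented for illustration only.
# Axis 0 stands for a "background" direction, axis 1 for a "content" direction.
TOY_VECTORS = {
    "the": (1.0, 0.0),
    "of": (0.9, 0.1),
    "a": (1.0, 0.1),
    "film": (0.1, 1.0),
    "great": (0.0, 0.9),
}
STOP_WORDS = {"the", "of", "a"}  # assumed stop-word list


def mean_vector(words):
    """Average the toy vectors of the given words (a stand-in for a
    paragraph embedding that uses every word it is given)."""
    vecs = [TOY_VECTORS[w] for w in words]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# A toy token sequence in which function words outnumber content words.
paragraph = ["the", "film", "of", "the", "a", "great", "the", "of"]
content_axis = (0.0, 1.0)

all_words = mean_vector(paragraph)
content_only = mean_vector([w for w in paragraph if w not in STOP_WORDS])

sim_all = cosine(all_words, content_axis)
sim_content = cosine(content_only, content_axis)

print(f"all words:    {sim_all:.3f}")
print(f"content only: {sim_content:.3f}")
```

In this toy setup the average over all words points mostly along the background axis, while the average over content words stays close to the content axis. Hand-filtering stop words is used here only to expose the effect; the paper's EV model is described as distilling core information in an unsupervised way rather than relying on such a list.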
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies
