Recurrent Neural Network Language Model Adaptation Derived Document Vector

Wei Li; Brian Kan Wing Mak

arXiv:1611.00196 · cs.CL · December 15, 2016

TL;DR

This paper introduces a novel document vector representation, derived by adapting RNN language models to each document, that captures sequential information and improves genre classification over the TF-IDF and PV-DM baselines.

Contribution

It proposes a new document vector method based on adapting RNN and LSTM language models, capturing the sequential (word-order) information that bag-of-words representations such as TF-IDF ignore.
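
To make "sequential information ignored" concrete, the sketch below computes TF-IDF vectors, the bag-of-words baseline named in the abstract, for two documents that contain the same words in a different order. The helper tfidf_vectors is hypothetical and uses raw counts with idf = log(N / df), one of several common TF-IDF variants; it is an illustration, not the paper's baseline code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = raw count, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # word order is discarded here
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


docs = [
    "the dog bites the man".split(),
    "the man bites the dog".split(),
    "the cat sleeps".split(),
]
vecs = tfidf_vectors(docs)
print(vecs[0] == vecs[1])  # True: reordering the words leaves the vector unchanged
```

Because Counter keeps only counts, "the dog bites the man" and "the man bites the dog" map to the same vector even though their meanings differ; this is the gap the proposed language-model-derived vectors aim to close.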

Findings

1. DV-LSTM outperforms TF-IDF and PV-DM in genre classification.
2. Combining the proposed vectors with existing representations further improves accuracy.
3. The document vectors effectively encode high-level sequential information.
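
This page does not say how the proposed vectors are combined with existing ones. One simple option, assumed here purely for illustration, is to L2-normalize each representation and concatenate the results before training a classifier, so that no single feature set dominates by scale. The helpers l2_normalize and combine and the input values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a dense vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(*feature_vectors):
    """Concatenate several per-document feature vectors after normalizing each one."""
    combined = []
    for vec in feature_vectors:
        combined.extend(l2_normalize(vec))
    return combined


dv = [3.0, 4.0]          # hypothetical DV-LSTM vector for one document
tfidf = [0.0, 2.0, 0.0]  # hypothetical dense TF-IDF vector for the same document
print(combine(dv, tfidf))  # [0.6, 0.8, 0.0, 1.0, 0.0]
```

Score-level fusion (averaging the outputs of separate classifiers) would be an equally plausible alternative; the choice above is only a sketch.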

Abstract

In many natural language processing (NLP) tasks, a document is commonly modeled as a bag of words using the term frequency-inverse document frequency (TF-IDF) vector. One major shortcoming of the frequency-based TF-IDF feature vector is that it ignores word order, which carries syntactic and semantic relationships among the words in a document and can be important in some NLP tasks such as genre classification. This paper proposes a novel distributed vector representation of a document: a simple recurrent-neural-network language model (RNN-LM) or a long short-term memory RNN language model (LSTM-LM) is first trained on all documents in a task; some of the LM parameters are then adapted to each document, and the adapted parameters are vectorized to represent the document. The resulting document vectors are called DV-RNN and DV-LSTM, respectively. We believe that our new document…
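
The adaptation pipeline described in the abstract can be sketched in pure Python as follows. This is not the authors' implementation. The class TinyRNNLM is hypothetical, and three simplifying assumptions keep it short: the input and recurrent weights are random and frozen, only the output-layer bias is trained, and that same bias is the parameter that gets adapted (the abstract says only that "some of the LM parameters" are adapted). A task-level bias is first fit on all documents, then a copy is re-fit on a single document, and the adapted bias serves as that document's vector.

```python
import math
import random


class TinyRNNLM:
    """Toy Elman RNN language model over integer word ids (illustrative only).

    Assumption: input and recurrent weights are random and frozen; only the
    output-layer bias b_out is ever trained or adapted.
    """

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.V, self.H = vocab_size, hidden_size
        self.W_in = [[rng.gauss(0, 0.5) for _ in range(vocab_size)] for _ in range(hidden_size)]
        self.W_rec = [[rng.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.W_out = [[rng.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(vocab_size)]
        self.b_out = [0.0] * vocab_size

    def hidden_states(self, words):
        """h_t = tanh(W_in[:, w_t] + W_rec @ h_{t-1}), starting from h_0 = 0."""
        h, states = [0.0] * self.H, []
        for w in words:
            h = [math.tanh(self.W_in[i][w] + sum(self.W_rec[i][j] * h[j] for j in range(self.H)))
                 for i in range(self.H)]
            states.append(h)
        return states

    def probs(self, h, bias):
        """Softmax over the vocabulary for the next word given hidden state h."""
        logits = [sum(self.W_out[k][j] * h[j] for j in range(self.H)) + bias[k]
                  for k in range(self.V)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def nll(self, docs, bias):
        """Mean next-word negative log-likelihood over all positions in docs."""
        loss, count = 0.0, 0
        for words in docs:
            for h, target in zip(self.hidden_states(words[:-1]), words[1:]):
                loss -= math.log(self.probs(h, bias)[target])
                count += 1
        return loss / count

    def fit_bias(self, docs, bias, lr=0.5, steps=50):
        """Gradient descent on the output bias only; returns a new list.

        dNLL/db_k = mean over positions of (p_k - 1[k == target]).
        """
        bias = list(bias)
        # Hidden states do not depend on the bias, so compute them once.
        cached = [(self.hidden_states(w[:-1]), w[1:]) for w in docs]
        n = sum(len(targets) for _, targets in cached)
        for _ in range(steps):
            grad = [0.0] * self.V
            for states, targets in cached:
                for h, target in zip(states, targets):
                    p = self.probs(h, bias)
                    for k in range(self.V):
                        grad[k] += p[k] - (1.0 if k == target else 0.0)
            bias = [b - lr * g / n for b, g in zip(bias, grad)]
        return bias


# Hypothetical corpus of three documents over a 4-word vocabulary.
corpus = [[0, 1, 2, 3, 0, 1, 2, 3], [3, 2, 1, 0, 3, 2, 1, 0], [0, 0, 1, 1, 2, 2]]
lm = TinyRNNLM(vocab_size=4, hidden_size=3)
lm.b_out = lm.fit_bias(corpus, lm.b_out)     # task-level LM fit on all documents
doc = corpus[1]
dv = lm.fit_bias([doc], lm.b_out, steps=20)  # adapt a copy to one document
print(len(dv))  # 4: one entry per vocabulary word
```

Because the hidden states depend on word order, the gradient that drives adaptation, and hence the adapted bias, can differ for two documents with identical word counts, which a TF-IDF vector cannot distinguish. A real DV-RNN or DV-LSTM would train the full LM with backpropagation through time and adapt whichever parameters the paper specifies.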

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques