Explaining Away Syntactic Structure in Semantic Document Representations

Erik Holmer; Andreas Marfurt

arXiv:1806.01620 · cs.CL · June 6, 2018 · 1 citation

TL;DR

This paper introduces a sequence-aware variational autoencoder that explicitly models syntactic structure in documents, improving semantic representation quality and robustness to syntactic noise compared to traditional bag-of-words models.

Contribution

The paper extends the Neural Variational Document Model (Miao et al., 2016) to operate on the original word order, separating local (syntactic) context from the global (semantic) document representation within a variational autoencoder framework.

Findings

01. Stronger topicality of learned representations
02. Increased robustness to syntactic noise
03. Improved semantic content capture

Abstract

Most generative document models act on bag-of-words input in an attempt to focus on the semantic content and thereby partially forego syntactic information. We argue that it is preferable to keep the original word order intact and explicitly account for the syntactic structure instead. We propose an extension to the Neural Variational Document Model (Miao et al., 2016) that does exactly that to separate local (syntactic) context from the global (semantic) representation of the document. Our model builds on the variational autoencoder framework to define a generative document model based on next-word prediction. We name our approach Sequence-Aware Variational Autoencoder since in contrast to its predecessor, it operates on the true input sequence. In a series of experiments we observe stronger topicality of the learned representations as well as increased robustness to syntactic noise in…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
