Explaining Away Syntactic Structure in Semantic Document Representations
Erik Holmer, Andreas Marfurt

TL;DR
This paper introduces a sequence-aware variational autoencoder that explicitly models syntactic structure in documents, improving semantic representation quality and robustness to syntactic noise compared to traditional bag-of-words models.
Contribution
It extends the Neural Variational Document Model to incorporate word order, separating syntactic from semantic information within a variational autoencoder framework.
Findings
Stronger topicality of learned representations
Increased robustness to syntactic noise
Improved semantic content capture
Abstract
Most generative document models act on bag-of-words input in an attempt to focus on the semantic content and thereby partially forego syntactic information. We argue that it is preferable to keep the original word order intact and explicitly account for the syntactic structure instead. We propose an extension to the Neural Variational Document Model (Miao et al., 2016) that does exactly that to separate local (syntactic) context from the global (semantic) representation of the document. Our model builds on the variational autoencoder framework to define a generative document model based on next-word prediction. We name our approach Sequence-Aware Variational Autoencoder since in contrast to its predecessor, it operates on the true input sequence. In a series of experiments we observe stronger topicality of the learned representations as well as increased robustness to syntactic noise in…
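For orientation, the contrast the abstract draws can be written as two evidence lower bounds. This is a sketch in generic variational autoencoder notation, not the paper's own formulation: the symbols x (the document's words), z (the latent document representation), q_\phi (encoder), p_\theta (decoder), and the standard Gaussian prior p(z) are assumptions introduced here for illustration.

```latex
% Bag-of-words model (NVDM-style): words are conditionally independent given z
\log p(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\Big[\sum_{i=1}^{N} \log p_\theta(x_i \mid z)\Big]
  - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

% Sequence-aware variant: next-word prediction, each word also conditioned on its predecessors
\log p(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\Big[\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}, z)\Big]
  - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

In the second bound the decoder can explain local word order from the preceding words x_{<t}, which is one way to read the claim that the local (syntactic) context is separated from the global (semantic) representation z.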
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
