# Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

**Authors:** Tu Vu, Mohit Iyyer

arXiv: 1906.03656 · 2019-06-11

## TL;DR

This paper enhances paragraph embeddings by encouraging them to remember sentence identities, leading to improved classification accuracy, faster training, and better generalization.

## Contribution

It replaces the paragraph reconstruction objective of Zhang et al. (2017) with a sentence content objective, which asks whether a given sentence occurs in the input paragraph, and applies it in a semi-supervised setting.

## Key findings

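- The paragraph embedding method of Zhang et al. (2017) cannot reliably tell whether a given sentence occurs in the input paragraph, while a much simpler bag-of-words method solves this task easily
- Higher downstream classification accuracy on benchmark datasets than the paragraph reconstruction objective
- Faster training
- Better generalization ability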

## Abstract

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.
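
As a rough illustration of the sentence content task described in the abstract, the sketch below builds (paragraph, sentence, label) probe examples and checks them with a simple bag-of-words containment test. It is not the authors' implementation: the period-based sentence splitter, the negative sampling scheme, and the helper names `split_sentences`, `make_probe_examples`, and `bow_contains` are all assumptions made for this example, and the paper's actual probe classifier and data construction may differ.

```python
import random
from collections import Counter


def split_sentences(paragraph):
    """Naive period-based splitter (assumption; a real tokenizer would be better)."""
    return [s.strip() for s in paragraph.split(".") if s.strip()]


def tokenize(text):
    """Lowercase whitespace tokenizer that ignores periods (assumption)."""
    return text.lower().replace(".", " ").split()


def make_probe_examples(paragraphs, seed=0):
    """Build (paragraph_index, sentence, label) triples.

    label 1: the sentence occurs in the paragraph.
    label 0: the sentence comes from a different paragraph.
    """
    rng = random.Random(seed)
    sentences = [split_sentences(p) for p in paragraphs]
    examples = []
    for i, own in enumerate(sentences):
        if not own:
            continue  # skip paragraphs with no usable sentences
        # Positive example: a sentence sampled from this paragraph.
        examples.append((i, rng.choice(own), 1))
        # Negative example: a sentence from another paragraph that does
        # not also appear in this one.
        others = [
            s
            for j, sents in enumerate(sentences)
            if j != i
            for s in sents
            if s not in own
        ]
        if others:
            examples.append((i, rng.choice(others), 0))
    return examples


def bow_contains(paragraph, sentence):
    """Illustrative heuristic: True if every word of the sentence is covered
    by the paragraph's word counts. A hypothetical stand-in for a trained
    bag-of-words probe, not the classifier used in the paper."""
    missing = Counter(tokenize(sentence)) - Counter(tokenize(paragraph))
    return not missing


paragraphs = [
    "The cat sat. It purred loudly.",
    "Stocks fell today. Traders were worried.",
]
examples = make_probe_examples(paragraphs)
for idx, sentence, label in examples:
    print(idx, repr(sentence), label, bow_contains(paragraphs[idx], sentence))
```

The containment check hints at why a bag-of-words representation has little trouble here: it keeps per-word counts, and those counts are usually enough to rule out sentences drawn from unrelated paragraphs.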

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.03656/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1906.03656/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1906.03656/full.md

---
Source: https://tomesphere.com/paper/1906.03656