# Document Similarity for Texts of Varying Lengths via Hidden Topics

**Authors:** Hongyu Gong, Tarek Sakakini, Suma Bhat, Jinjun Xiong

arXiv: 1903.10675 · 2019-03-27

## TL;DR

This paper introduces a method for measuring similarity between texts of very different lengths, such as a long document and its summary, by comparing them in a common space of hidden topics. The approach bridges the lexical, contextual, and abstraction gaps between the two texts and outperforms strong baselines on two matching tasks.

## Contribution

The paper proposes a document matching approach that compares texts of non-comparable lengths in a shared space of hidden topics, consistently and widely outperforming strong baselines.

## Key findings

- Consistently and widely outperforms strong baselines on two matching tasks
- Bridges the lexical, contextual, and abstraction gaps between a long, detailed document and its concise summary
- Incorporating domain knowledge benefits text matching

## Abstract

Measuring similarity between texts is an important task for several applications. Available approaches to measure document similarity are inadequate for document pairs that have non-comparable lengths, such as a long document and its summary. This is because of the lexical, contextual and the abstraction gaps between a long document of rich details and its concise summary of abstract information. In this paper, we present a document matching approach to bridge this gap, by comparing the texts in a common space of hidden topics. We evaluate the matching algorithm on two matching tasks and find that it consistently and widely outperforms strong baselines. We also highlight the benefits of incorporating domain knowledge to text matching.
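To make "comparing the texts in a common space of hidden topics" concrete, here is a minimal sketch under stated assumptions. It assumes the hidden topics are the top principal directions of the long document's word-embedding vectors, and that the short text is scored by how much of each of its word vectors falls inside that topic subspace. This is an illustrative reading, not necessarily the authors' exact formulation. The function names `hidden_topics` and `topic_match_score` and the toy 4-dimensional "embeddings" are hypothetical.

```python
import math
import random
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def hidden_topics(word_vecs: List[Vec], k: int, iters: int = 100, seed: int = 0) -> List[Vec]:
    """Hypothetical helper: top-k (uncentered) principal directions of the long
    document's word vectors, via power iteration on sum_w w w^T with deflation."""
    d = len(word_vecs[0])
    scatter = [[sum(v[i] * v[j] for v in word_vecs) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    topics: List[Vec] = []
    for _ in range(k):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            y = [dot(row, x) for row in scatter]   # multiply by the scatter matrix
            for t in topics:                       # remove directions already found
                c = dot(y, t)
                y = [yi - c * ti for yi, ti in zip(y, t)]
            n = norm(y)
            if n < 1e-12:                          # no variance left: stop early
                return topics
            x = [yi / n for yi in y]               # renormalize to unit length
        topics.append(x)
    return topics


def topic_match_score(topics: List[Vec], short_vecs: List[Vec]) -> float:
    """Hypothetical score: average fraction of each short-text word vector's
    length lying in the orthonormal topic subspace (1.0 covered, 0.0 orthogonal)."""
    scores = []
    for s in short_vecs:
        n = norm(s)
        if n == 0.0:
            continue
        captured = math.sqrt(sum(dot(s, t) ** 2 for t in topics))
        scores.append(captured / n)
    return sum(scores) / len(scores) if scores else 0.0


# Toy 4-d "embeddings" (assumed): the long document only uses dimensions 0 and 1.
long_doc = [[1.0, 0.2, 0.0, 0.0], [0.8, 0.1, 0.0, 0.0],
            [0.1, 1.0, 0.0, 0.0], [0.3, 0.9, 0.0, 0.0]]
topics = hidden_topics(long_doc, k=2)

relevant = [[0.9, 0.3, 0.0, 0.0], [0.2, 0.7, 0.0, 0.0]]   # same topical directions
unrelated = [[0.0, 0.0, 1.0, 0.5], [0.0, 0.0, 0.4, 1.0]]  # orthogonal directions
print(round(topic_match_score(topics, relevant), 3))   # 1.0
print(round(topic_match_score(topics, unrelated), 3))  # 0.0
```

Scoring the short text against topics extracted from the long text, rather than counting shared words, means a summary whose word vectors point in the same directions as the document's topics can score highly even with little exact lexical overlap. The paper's actual topic extraction and weighting may differ from this sketch.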

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.10675/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1903.10675/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1903.10675/full.md

---
Source: https://tomesphere.com/paper/1903.10675