A Topological Method for Comparing Document Semantics
Yuqi Kong, Fanchao Meng, Benjamin Carterette

TL;DR
This paper introduces an algorithm based on topological persistence for comparing document semantics. In experimental evaluations, its results are highly consistent with human judgments, and it outperforms most of the existing methods it is compared against.
Contribution
It presents a novel topological approach to document similarity, filling a gap in NLP and IR methods that mostly rely on statistical or vector space models.
Findings
The proposed method achieves high alignment with human judgments.
It outperforms most state-of-the-art methods in experiments.
It ties with NLTK in performance.
Abstract
Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. Tools for this task remain rare, and most relevant methods are devised from a statistical or vector space model perspective, with almost none taking a topological perspective. In this paper, we take a different approach: we propose a novel algorithm based on topological persistence for comparing the semantic similarity of two documents. Our experiments are conducted on a document dataset annotated with human judgments, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and beats most of the state-of-the-art methods, though it ties with NLTK.
