# Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations

**Authors:** Paul Michel, Abhilasha Ravichander, Shruti Rijhwani

arXiv: 1705.10900 · 2017-06-01

## TL;DR

This paper asks whether methods from algebraic topology, specifically persistent homology, improve document classification by capturing the geometry of a document's word embeddings. It finds that the resulting representations perform worse than simple baselines such as tf-idf.

## Contribution

The study applies persistent-homology-based document representations to standard NLP tasks (document clustering and sentiment classification) and evaluates them against simple baselines such as tf-idf.

## Key findings

- Topology-based embeddings do not improve classification performance.
- Performance is worse than tf-idf on the evaluated tasks.
- On the chosen datasets, document geometry does not provide enough variability to classify documents by topic or sentiment.
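For reference, the tf-idf baseline named in these findings weights each term by its count in a document times the log of the inverse fraction of documents that contain it. This is a minimal sketch under stated assumptions: it uses one common variant (raw counts, unsmoothed idf) with naive whitespace tokenization, and the function name `tfidf_vectors` and the toy corpus are hypothetical. The paper's exact preprocessing and weighting are not given in this summary.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with weight = raw term count * log(N / df).

    This is one common tf-idf variant; libraries often add smoothing or
    normalization on top of it.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenization
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical toy corpus.
docs = ["good movie good plot", "bad movie", "good acting"]
vocab, vectors = tfidf_vectors(docs)
print(vocab)  # ['acting', 'bad', 'good', 'movie', 'plot']
```

A term that appears in every document gets idf log(1) = 0 and therefore no weight, which is how tf-idf emphasizes topic-specific words.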

## Abstract

We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically-principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like $\textit{tf-idf}$, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets.
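The abstract's idea of an isometry-invariant mapping from a set of word vectors to a fixed-length document embedding can be illustrated with a minimal sketch. It assumes only 0-dimensional persistent homology of a Vietoris-Rips filtration under Euclidean distance, where the death times of connected components equal the edge lengths of a minimum spanning tree, followed by a simple histogram vectorization. The function names, toy 2-D vectors, and bin settings are hypothetical, and the paper's own filtration, homology dimensions, and vectorization may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def h0_death_times(points):
    """Finite death times of 0-dimensional persistence for a Vietoris-Rips filtration.

    All components are born at scale 0 and merge along the edges of a
    minimum spanning tree, so the death times are the MST edge lengths,
    computed here with Prim's algorithm.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from each vertex to the tree
    best[0] = 0.0
    deaths = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if u != 0:
            deaths.append(best[u])  # a component dies when it merges at this scale
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], euclidean(points[u], points[v]))
    return sorted(deaths)


def persistence_histogram(deaths, bins, max_scale):
    """Fixed-length vector counting death times per equal-width bin of [0, max_scale]."""
    width = max_scale / bins
    counts = [0] * bins
    for d in deaths:
        counts[min(int(d / width), bins - 1)] += 1  # clamp large scales into the last bin
    return counts


# Hypothetical toy "document": four 2-D word vectors.
doc = [(0, 0), (1, 0), (1, 2), (4, 2)]
# The same points after a 90-degree rotation and a translation (an isometry).
moved = [(-y + 5, x - 1) for x, y in doc]

print(h0_death_times(doc))  # [1.0, 2.0, 3.0]
print(persistence_histogram(h0_death_times(moved), bins=4, max_scale=4.0))  # [0, 1, 1, 1]
```

Because the pipeline depends only on pairwise distances, rotating and translating the word vectors leaves the embedding unchanged, which is the isometry invariance the abstract refers to.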

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.10900/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1705.10900/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1705.10900/full.md

---
Source: https://tomesphere.com/paper/1705.10900