# Hamming Sentence Embeddings for Information Retrieval

**Authors:** Felix Hamann, Nadja Kurz, Adrian Ulges

arXiv: 1908.05541 · 2019-08-16

## TL;DR

This paper introduces a neural encoder-decoder method that compresses sentence embeddings into Hamming space, enabling memory- and time-efficient retrieval with performance comparable to the uncompressed embeddings at compression ratios of up to 256:1.

## Contribution

The paper presents a neural compression technique, trained by minimizing reconstruction error, whose encoder produces binary Hamming embeddings for sentences. These latent codes replace the original real-valued embeddings in similarity calculations.

## Key findings

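- Performance comparable to uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW) at compression ratios of up to 256:1
- The model strongly decorrelates input features, and the compressor generalizes well when pre-trained on Wikipedia sentences
- Source code is published on GitHub, and all experimental results are released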
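
To make the 256:1 figure concrete, the arithmetic can be sketched as follows. The input size used here (4096 dimensions stored as 32-bit floats) and the 512-bit code length are illustrative assumptions, not configurations taken from this summary, and `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(dims: int, bits_per_value: int, code_bits: int) -> float:
    """Ratio of the original embedding size to the binary code size, both in bits."""
    return (dims * bits_per_value) / code_bits


# Assumed example: a 4096-dimensional float32 embedding and a 512-bit binary code.
original_bits = 4096 * 32  # 131072 bits, i.e. 16 KiB per sentence
code_bits = 512            # 64 bytes per sentence
ratio = compression_ratio(4096, 32, code_bits)
print(ratio)  # 256.0
```

Under these assumptions, one million sentences would need about 64 MB of binary codes instead of roughly 16 GB of float32 vectors.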

## Abstract

In retrieval applications, binary hashes are known to offer significant improvements in terms of both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture, which is trained by minimizing reconstruction error. Instead of employing the original real-valued embeddings, we use latent representations in Hamming space produced by the encoder for similarity calculations. In quantitative experiments on several benchmarks for semantic similarity tasks, we show that our compressed Hamming embeddings yield a comparable performance to uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW), at compression ratios of up to 256:1. We further demonstrate that our model strongly decorrelates input features, and that the compressor generalizes well when pre-trained on Wikipedia sentences. We publish the source code on GitHub, along with all experimental results.
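
As a rough illustration of why similarity search in Hamming space is cheap, the sketch below ranks binary codes by Hamming distance using XOR and a bit count. It is not the paper's implementation: the `binarize` function thresholds at zero as a stand-in for the trained encoder, and all function names are hypothetical.

```python
from typing import List, Sequence


def binarize(vector: Sequence[float]) -> int:
    """Pack one bit per dimension into a Python int (first dimension = highest bit).

    Assumption: thresholding at zero stands in for the learned encoder.
    """
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits: XOR the codes, then count the set bits."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, corpus: List[int]) -> List[int]:
    """Corpus indices ordered from nearest to farthest in Hamming distance."""
    return sorted(range(len(corpus)), key=lambda i: hamming_distance(query, corpus[i]))


# Toy 4-bit example; real codes would be hundreds of bits long.
query = binarize([0.5, -1.0, 2.0, -0.1])  # bits 1010
corpus = [0b0101, 0b1010, 0b1000]
print(rank_by_hamming(query, corpus))  # [1, 2, 0]
```

Each comparison touches only a few machine words, whereas a cosine similarity over real-valued embeddings requires one multiply-add per dimension.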

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.05541/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1908.05541/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1908.05541/full.md

---
Source: https://tomesphere.com/paper/1908.05541