Representation Learning for Efficient and Effective Similarity Search and Recommendation

Casper Hansen

arXiv:2109.01815·cs.IR·September 7, 2021

Open Access

TL;DR

This paper explores advanced representation learning techniques to generate compact, semantically rich hash codes that enhance the effectiveness and efficiency of similarity search and recommendation systems.

Contribution

It introduces novel methods that improve hash code expressiveness and optimize their structure for better search performance, surpassing current autoencoder-based approaches.

Findings

01 Enhanced hash codes improve search accuracy.

02 New similarity measures outperform Hamming distance.

03 Empirical validation on multiple tasks confirms effectiveness.

Abstract

How data is represented and operationalized is critical for building computational solutions that are both effective and efficient. A common approach is to represent data objects as binary vectors, denoted hash codes, which require little storage and enable efficient similarity search through direct indexing into a hash table or through similarity computations in an appropriate space. Because hash codes are less expressive than real-valued representations, a core open challenge is how to generate hash codes that capture semantic content or latent properties well using a small number of bits, while ensuring that the hash codes are distributed in a way that does not reduce their search efficiency. State-of-the-art methods use representation learning for generating such hash codes, focusing on neural autoencoder architectures where semantics are encoded into the…
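The two search modes the abstract mentions, direct indexing into a hash table and similarity computation between codes, can be illustrated with a minimal sketch. This is not the paper's method: the codes below are hand-written toy values rather than learned ones, the item ids and all function names (`pack_bits`, `hamming_distance`, `build_table`, `lookup_exact`, `rank_by_hamming`) are hypothetical, and Hamming distance is used only because it is the standard baseline measure for binary codes.

```python
from collections import defaultdict


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int (first bit is most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming_distance(a, b):
    """Count the bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def build_table(codes):
    """Map each hash code to the ids of all items that share it."""
    table = defaultdict(list)
    for item_id, code in codes.items():
        table[code].append(item_id)
    return table


def lookup_exact(table, query):
    """Direct indexing: return the items whose code equals the query code."""
    return list(table.get(query, []))


def rank_by_hamming(codes, query, k):
    """Linear scan: return the k item ids closest to the query in Hamming distance."""
    ranked = sorted(
        codes.items(),
        key=lambda kv: (hamming_distance(kv[1], query), kv[0]),
    )
    return [item_id for item_id, _ in ranked[:k]]


# Toy 8-bit codes, invented for illustration only.
codes = {
    "doc_a": pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    "doc_b": pack_bits([1, 0, 1, 1, 0, 0, 1, 1]),
    "doc_c": pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),
    "doc_d": pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
}

table = build_table(codes)
query = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])

exact = lookup_exact(table, query)            # items in the same hash bucket
nearest = rank_by_hamming(codes, query, k=3)  # items ordered by bit differences
```

In this toy setup the exact lookup only finds items whose code matches the query bit for bit, while the ranked scan also surfaces near matches such as `doc_b`, which differs in a single bit. This is one way to see why the distribution of codes matters for search efficiency: if many items collapse into a few buckets, direct indexing returns large candidate sets.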

Taxonomy

Topics: Topic Modeling · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications