# Cleaned Similarity for Better Memory-Based Recommenders

**Authors:** Farhan Khawar, Nevin L. Zhang

arXiv: 1905.07370 · 2019-05-20

## TL;DR

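This paper analyzes the spectral properties of the Pearson and cosine similarity estimators used in memory-based collaborative filtering. It uses random matrix theory to show that both estimators suffer from noise and eigenvalue spreading, and it proposes a re-scaling and noise-cleaning scheme that improves recommendation accuracy.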
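
In its simplest item-kNN form, a memory-based recommender is just a similarity matrix plus a neighbourhood rule, so the quality of the similarity estimate drives the recommendations. Below is a minimal sketch that assumes a small dense users × items matrix where 0 means "unrated", uses cosine item-item similarity, and predicts as a similarity-weighted average over the k most similar items the user has rated. The function names and toy data are hypothetical and do not come from the paper.

```python
import math

def cosine_item_similarity(ratings):
    """Item-item cosine similarity from a users x items matrix (0 = unrated)."""
    n_items = len(ratings[0])
    cols = [[row[j] for row in ratings] for j in range(n_items)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def item_knn_predict(ratings, user, k=2):
    """Predict the user's unrated items as a similarity-weighted average
    over the k most similar items that the user has already rated."""
    sim = cosine_item_similarity(ratings)
    row = ratings[user]
    rated = [j for j, r in enumerate(row) if r != 0]
    preds = {}
    for i, r in enumerate(row):
        if r != 0:
            continue
        # neighbourhood: the k rated items most similar to candidate item i
        neighbours = sorted(rated, key=lambda j: sim[i][j], reverse=True)[:k]
        weight = sum(abs(sim[i][j]) for j in neighbours)
        if weight > 0:
            preds[i] = sum(sim[i][j] * row[j] for j in neighbours) / weight
    return preds

ratings = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
]
preds = item_knn_predict(ratings, user=1)
print(preds)  # item 1 ≈ 2.72, item 2 = 1.0
```

Any noise in `sim` feeds straight into which neighbours are chosen and how they are weighted, which is why the paper focuses on cleaning the similarity estimate rather than changing the kNN rule.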

## Contribution

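The paper quantifies how much the cosine similarity overestimates the largest eigenvalues and introduces a simple re-scaling and noise-cleaning scheme for similarity estimators, which makes memory-based recommenders perform better than their vanilla counterparts.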
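
The summary does not spell out the cleaning step. The sketch below therefore uses the standard random-matrix "eigenvalue clipping" recipe as a stand-in, which is an assumption and not necessarily the authors' exact scheme. For an N × N similarity matrix estimated from T observations, eigenvalues at or below the upper Marchenko-Pastur edge λ+ = (1 + √(N/T))² are treated as noise and replaced by their average, and those above it are kept. A small Jacobi eigensolver keeps the example dependency-free. `jacobi_eigh`, `clip_noise_eigenvalues` and the toy matrix are hypothetical.

```python
import math

def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V), where column k of V is the eigenvector of eigenvalue k."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # rotation angle that zeroes the off-diagonal entry a[p][r]
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, r)
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # A <- J^T A (rows p, r)
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkr = v[k][p], v[k][r]
                    v[k][p], v[k][r] = c * vkp - s * vkr, s * vkp + c * vkr
    return [a[i][i] for i in range(n)], v

def clip_noise_eigenvalues(sim, n_samples):
    """Replace eigenvalues inside the Marchenko-Pastur noise bulk by their average.
    sim: N x N correlation-like similarity matrix estimated from n_samples observations."""
    n = len(sim)
    lam_plus = (1.0 + math.sqrt(n / n_samples)) ** 2  # upper MP edge, unit variance
    vals, vecs = jacobi_eigh(sim)
    noise = [lam for lam in vals if lam <= lam_plus]
    fill = sum(noise) / len(noise) if noise else 0.0
    cleaned = [lam if lam > lam_plus else fill for lam in vals]
    # rebuild V diag(cleaned) V^T; averaging the noise eigenvalues keeps the trace unchanged
    return [[sum(vecs[i][k] * cleaned[k] * vecs[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]

S = [[1.0, 0.8, 0.7, 0.1],
     [0.8, 1.0, 0.75, 0.05],
     [0.7, 0.75, 1.0, 0.12],
     [0.1, 0.05, 0.12, 1.0]]
cleaned = clip_noise_eigenvalues(S, n_samples=100)
```

Averaging the noise eigenvalues instead of zeroing them preserves the trace, which is the total self-similarity. In practice a library eigensolver such as NumPy's `numpy.linalg.eigh` would replace the hand-written Jacobi loop.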

## Key findings

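- Random matrix theory shows that the Pearson and cosine similarity estimators suffer from noise and eigenvalue spreading.
- Unlike the Pearson correlation, the cosine similarity naturally shrinks large eigenvalues, but its zero-mean assumption makes it overestimate the largest ones.
- Re-scaling and noise cleaning improve the accuracy of memory-based collaborative filtering over the vanilla methods.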
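
"Eigenvalue spreading" can be checked without an eigensolver. If items were truly independent, the population correlation matrix would be the identity and every eigenvalue would equal 1. The sample correlation estimated from T users instead has eigenvalues with mean 1 and variance close to q = N/T, and for large N and T they fill the Marchenko-Pastur interval [(1 − √q)², (1 + √q)²]. The first two spectral moments follow from traces: mean = tr(C)/N and mean of squares = tr(C²)/N. This is a sketch of that standard result on pure Gaussian noise; N, T and the seed are arbitrary choices.

```python
import math
import random

def pearson_matrix(data):
    """Sample Pearson correlation between the N columns of a T x N data matrix."""
    t, n = len(data), len(data[0])
    z = []
    for j in range(n):
        col = [row[j] for row in data]
        mu = sum(col) / t
        centred = [x - mu for x in col]
        norm = math.sqrt(sum(x * x for x in centred))
        z.append([x / norm for x in centred])
    return [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)] for i in range(n)]

def marchenko_pastur_edges(q):
    """Support [lambda_minus, lambda_plus] of the MP law for unit-variance noise, q = N/T <= 1."""
    return (1 - math.sqrt(q)) ** 2, (1 + math.sqrt(q)) ** 2

def spectrum_mean_var(c):
    """Mean and variance of the eigenvalues of a symmetric matrix, via tr(C)/N and tr(C^2)/N."""
    n = len(c)
    m1 = sum(c[i][i] for i in range(n)) / n
    m2 = sum(c[i][j] ** 2 for i in range(n) for j in range(n)) / n
    return m1, m2 - m1 ** 2

rng = random.Random(0)
T, N = 200, 50  # q = N / T = 0.25
noise = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(T)]
mean, var = spectrum_mean_var(pearson_matrix(noise))
print(mean, var)                      # mean is 1 (unit diagonal); var comes out near q
print(marchenko_pastur_edges(N / T))  # (0.25, 2.25)
```

The true spectrum has zero variance, so any nonzero `var` here is pure estimation noise. This spread is what a cleaning step tries to undo.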

## Abstract

Memory-based collaborative filtering methods like user or item k-nearest neighbors (kNN) are a simple yet effective solution to the recommendation problem. The backbone of these methods is the estimation of the empirical similarity between users/items. In this paper, we analyze the spectral properties of the Pearson and the cosine similarity estimators, and we use tools from random matrix theory to argue that they suffer from noise and eigenvalue spreading. We argue that, unlike the Pearson correlation, the cosine similarity naturally possesses the desirable property of eigenvalue shrinkage for large eigenvalues. However, due to its zero-mean assumption, it overestimates the largest eigenvalues. We quantify this overestimation and present a simple re-scaling and noise cleaning scheme. This results in better performance of the memory-based methods compared to their vanilla counterparts.
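
The zero-mean point can be illustrated with a toy experiment. This is our construction, not the paper's quantification or its re-scaling formula. Items are generated independently, so every true correlation is zero, but all ratings share a positive offset, as star ratings do. Pearson centres each item, so its largest eigenvalue stays near the noise edge. Cosine implicitly assumes zero mean and treats the shared offset as signal, which inflates its largest eigenvalue. The 30 items, 200 users, offset of 3 and seed are arbitrary, and `column_similarity` and `top_eigenvalue` are hypothetical helpers.

```python
import math
import random

def column_similarity(data, centre):
    """Similarity between the N columns of a T x N matrix:
    Pearson correlation if centre is True, cosine similarity otherwise."""
    t, n = len(data), len(data[0])
    cols = []
    for j in range(n):
        col = [row[j] for row in data]
        if centre:
            mu = sum(col) / t
            col = [x - mu for x in col]
        norm = math.sqrt(sum(x * x for x in col))
        cols.append([x / norm for x in col])
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]

def top_eigenvalue(c, iters=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix by power iteration."""
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam

rng = random.Random(1)
T, N = 200, 30
# independent items (true correlation 0) whose ratings share a positive offset of 3
ratings = [[3.0 + rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(T)]
lam_cos = top_eigenvalue(column_similarity(ratings, centre=False))
lam_pearson = top_eigenvalue(column_similarity(ratings, centre=True))
print(lam_cos, lam_pearson)  # cosine's top eigenvalue is an order of magnitude larger
```

With mean 3 and unit variance, the cosine between two independent items is roughly 9 / (9 + 1) = 0.9, so the cosine matrix behaves like a rank-one block and its top eigenvalue lands near 1 + 29 × 0.9. The Pearson top eigenvalue stays close to the Marchenko-Pastur edge (1 + √(30/200))².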

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.07370/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1905.07370/full.md

## References

7 references — full list in the complete paper: https://tomesphere.com/paper/1905.07370/full.md

---
Source: https://tomesphere.com/paper/1905.07370