A Channel Coding Perspective of Collaborative Filtering

S. T. Aditya; Onkar Dabeer; Bikash Kumar Dey

arXiv:0908.2494·cs.IT·November 17, 2016

TL;DR

This paper models collaborative filtering as a channel coding problem and establishes a sharp threshold, in terms of cluster size, for when the underlying rating matrix can be reliably recovered from noisy, incomplete observations.

Contribution

It introduces a novel channel coding perspective for collaborative filtering and derives precise thresholds for matrix recoverability depending on cluster sizes.

Findings

01

Recovery is impossible with any estimator if the largest cluster size is smaller than $C_{1} \log(mn)$

02

A polynomial-time estimator achieves diminishing probability of error if the smallest cluster size is larger than $C_{2} \log(mn)$

03

Exact threshold constants identified for uniform cluster sizes

Abstract

We consider the problem of collaborative filtering from a channel coding perspective. We model the underlying rating matrix as a finite alphabet matrix with block constant structure. The observations are obtained from this underlying matrix through a discrete memoryless channel with a noisy part representing noisy user behavior and an erasure part representing missing data. Moreover, the clusters over which the underlying matrix is constant are unknown. We establish a sharp threshold result for this model: if the largest cluster size is smaller than $C_{1} \log(mn)$ (where the rating matrix is of size $m \times n$), then the underlying matrix cannot be recovered with any estimator, but if the smallest cluster size is larger than $C_{2} \log(mn)$, then we show a polynomial time estimator with diminishing probability of error. In the case of uniform cluster size, not only the order of…
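The observation model in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the function names, the symmetric noise (a corrupted entry is replaced by a uniformly chosen different symbol), and the contiguous cluster layout are all assumptions made for readability. The abstract specifies only a general discrete memoryless channel with a noisy part and an erasure part, and clusters that are unknown to the estimator.

```python
import random

ERASED = None  # assumed marker for a missing rating (erasure)


def block_constant_matrix(row_sizes, col_sizes, alphabet, rng):
    """Build a matrix that is constant on every (row cluster x column cluster) block.

    Clusters are laid out contiguously here for readability; in the paper's
    model the cluster memberships are unknown to the estimator.
    """
    block_values = [[rng.choice(alphabet) for _ in col_sizes] for _ in row_sizes]
    matrix = []
    for i, n_rows in enumerate(row_sizes):
        row = []
        for j, n_cols in enumerate(col_sizes):
            row.extend([block_values[i][j]] * n_cols)
        for _ in range(n_rows):
            matrix.append(list(row))
    return matrix


def observe(matrix, alphabet, flip_prob, erasure_prob, rng):
    """Pass each entry independently through an erasure stage and a noise stage.

    Assumed channel: erase with probability erasure_prob; otherwise replace the
    entry by a different symbol with probability flip_prob.
    """
    observed = []
    for row in matrix:
        out = []
        for x in row:
            if rng.random() < erasure_prob:
                out.append(ERASED)
            elif rng.random() < flip_prob:
                out.append(rng.choice([a for a in alphabet if a != x]))
            else:
                out.append(x)
        observed.append(out)
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    X = block_constant_matrix([2, 3], [2, 2], [0, 1], rng)
    Y = observe(X, [0, 1], flip_prob=0.1, erasure_prob=0.5, rng=rng)
    for row in Y:
        print(row)
```

Here `X` plays the role of the underlying $m \times n$ rating matrix and `Y` the observed ratings. To mimic unknown clusters, one could additionally shuffle the rows and columns of `X` before calling `observe`.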
