Doubly robust nearest neighbors in factor models

Raaz Dwivedi; Katherine Tian; Sabina Tomkins; Predrag Klasnja; Susan Murphy; Devavrat Shah

arXiv:2211.14297·stat.ML·January 31, 2024

TL;DR

This paper proposes a doubly robust nearest neighbors method for matrix completion in latent factor models. It improves estimation accuracy, especially when similar rows or similar columns are scarce, and yields narrower confidence intervals.

Contribution

The paper introduces a doubly robust NN estimator that stays consistent as long as either good row neighbors or good column neighbors exist, improving on unit-unit and time-time NN in missing-data settings.

Findings

01. Provides consistent estimates when either good row neighbors or good column neighbors exist.

02. Achieves near-quadratic error reduction when both row and column neighbors are available.

03. Offers significantly narrower confidence intervals than existing NN strategies.

Abstract

We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_{i}, v_{t})$ plus mean-zero noise for an unknown function $f$ and latent factors $u_{i}$ and $v_{t}$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_{i}, v_{t})$ rely on the existence of other rows $j$ with $u_{j} \approx u_{i}$. Similarly, the time-time NN strategy relies on the existence of columns $t'$ with $v_{t'} \approx v_{t}$. These strategies perform poorly when similar rows or similar columns, respectively, are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate is consistent. (2) Furthermore, if both…
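To make the setup above concrete, here is a minimal Python sketch of one way to combine row and column neighbors for a single missing entry. The abstract as shown here is truncated and does not spell out the estimator, so the combination used below (averaging `X[j, t] + X[i, s] - X[j, s]` over row neighbors `j` and column neighbors `s`) is an assumption for illustration, not a statement of the authors' exact procedure. The function name `dr_nn_estimate`, the thresholds `eta_row` and `eta_col`, the mean-squared-difference distance, and the toy choice $f(u, v) = uv$ are likewise hypothetical.

```python
import numpy as np


def dr_nn_estimate(X, mask, i, t, eta_row, eta_col):
    """Sketch of a row-plus-column nearest-neighbor estimate of entry (i, t).

    X    : (n, T) array of noisy observations (values where mask is False are ignored)
    mask : (n, T) boolean array, True where the entry is observed
    eta_row, eta_col : distance thresholds (hypothetical tuning parameters)
    """
    n, T = X.shape

    def row_dist(a, b):
        # Mean squared difference over columns observed in both rows,
        # leaving out the target column t.
        common = mask[a] & mask[b]
        common[t] = False
        if not common.any():
            return np.inf
        return np.mean((X[a, common] - X[b, common]) ** 2)

    def col_dist(a, b):
        # Same idea for columns, leaving out the target row i.
        common = mask[:, a] & mask[:, b]
        common[i] = False
        if not common.any():
            return np.inf
        return np.mean((X[common, a] - X[common, b]) ** 2)

    rows = [j for j in range(n) if j != i and row_dist(i, j) <= eta_row]
    cols = [s for s in range(T) if s != t and col_dist(t, s) <= eta_col]

    # Assumed combination: row-neighbor value plus column-neighbor value,
    # minus the "corner" entry shared by the neighbor row and neighbor column.
    terms = [
        X[j, t] + X[i, s] - X[j, s]
        for j in rows
        for s in cols
        if mask[j, t] and mask[i, s] and mask[j, s]
    ]
    return float(np.mean(terms)) if terms else float("nan")


# Toy data: f(u, v) = u * v with Gaussian noise and roughly 30% missing entries.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=150)
v = rng.uniform(-1.0, 1.0, size=150)
theta = u[:, None] * v[None, :]
X = theta + 0.1 * rng.standard_normal(theta.shape)
mask = rng.random(theta.shape) < 0.7
mask[0, 0] = False  # treat entry (0, 0) as the missing target

est = dr_nn_estimate(X, mask, 0, 0, eta_row=0.05, eta_col=0.05)
print(f"estimate={est:.3f}, truth={theta[0, 0]:.3f}")
```

Under this assumed form, a short calculation with noise-free bilinear $f(u, v) = uv$ gives a residual of $(u_{j} - u_{i})(v_{t} - v_{s})$ for each term, the product of the row and column discrepancies. The residual therefore vanishes when either neighbor is exact, and for additive $f(u, v) = u + v$ every term equals the target mean exactly.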

Code & Models

Repositories

aashish-khub/NearestNeighbors

Taxonomy

Topics: Advanced Statistical Methods and Models · Multi-Criteria Decision Making