Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later

Han-Jia Ye; Huai-Hong Yin; De-Chuan Zhan; Wei-Lun Chao

arXiv:2407.03257·cs.LG·March 4, 2025·1 cites

TL;DR

This paper revisits a classical differentiable K-nearest neighbors method, enhancing it with modern deep learning techniques to establish a strong baseline for tabular data that rivals state-of-the-art models.

Contribution

It demonstrates that a simple, deep learning-enhanced NCA approach can outperform existing deep models and match top tree-based methods on diverse tabular datasets.

Findings

01 Deep NCA achieves performance comparable to CatBoost.

02 Modern training techniques significantly improve NCA results.

03 Deep architectures and loss functions are key to performance gains.

Abstract

The widespread enthusiasm for deep learning has recently expanded into the domain of tabular data. Recognizing that the advancement in deep tabular methods is often inspired by classical methods, e.g., integration of nearest neighbors into neural networks, we investigate whether these classical methods can be revitalized with modern techniques. We revisit a differentiable version of $K$-nearest neighbors (KNN) -- Neighbourhood Components Analysis (NCA) -- originally designed to learn a linear projection to capture semantic similarities between instances, and seek to gradually add modern deep learning techniques on top. Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data, in contrast to the results of using existing toolboxes like scikit-learn. Further equipping NCA with deep representations and…
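
To make the "differentiable KNN" idea in the abstract concrete, here is a minimal pure-Python sketch of the classical NCA soft-neighbor rule: each training point is weighted by a softmax over negative squared distances in the projected space, and the weights of same-label points are summed into a class probability. The function names (`project`, `nca_class_probs`, `nca_loss`) are hypothetical, and the log-loss objective is an assumption made for this illustration; it is not taken from the paper's code and omits the deep representations and training techniques the abstract refers to.

```python
import math


def project(L, x):
    """Apply a linear map L (given as a list of rows) to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in L]


def nca_class_probs(L, X, y, query, exclude=None):
    """Soft-neighbor class probabilities for `query` under projection L.

    Training point j receives weight softmax_j(-||L q - L x_j||^2);
    weights of points that share a label are added together.
    `exclude` skips one training index (leave-one-out during training).
    """
    zq = project(L, query)
    sq_dists = []
    for j, x in enumerate(X):
        if j == exclude:
            sq_dists.append(math.inf)  # excluded point gets zero weight
            continue
        zj = project(L, x)
        sq_dists.append(sum((a - b) ** 2 for a, b in zip(zq, zj)))
    d_min = min(sq_dists)  # shift distances for numerical stability
    weights = [math.exp(-(d - d_min)) for d in sq_dists]
    total = sum(weights)
    probs = {}
    for w, label in zip(weights, y):
        probs[label] = probs.get(label, 0.0) + w / total
    return probs


def nca_loss(L, X, y):
    """Mean leave-one-out negative log-probability of the true label.

    Assumption: a log-loss form of the objective, chosen for this sketch.
    """
    losses = []
    for i, (x, label) in enumerate(zip(X, y)):
        p = nca_class_probs(L, X, y, x, exclude=i).get(label, 0.0)
        losses.append(-math.log(max(p, 1e-12)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy data (made up for illustration): two well-separated classes.
    X_toy = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    y_toy = [0, 0, 1, 1]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(nca_class_probs(identity, X_toy, y_toy, [0.0, 0.5]))
    print(nca_loss(identity, X_toy, y_toy))
```

Because the loss is a smooth function of the entries of `L`, it can be minimized with gradient-based optimizers such as SGD; a projection that collapses all points together makes every neighbor equally likely and yields a higher loss than one that keeps the classes apart.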

Code & Models

Repositories

qile2000/LAMDA-TALENT (PyTorch, official)
