High-dimensional Semi-supervised Classification via the Fermat Distance
Ruoxu Tan, Yiming Zang

TL;DR
This paper introduces a high-dimensional semi-supervised classification method using Fermat distance, achieving minimax optimality and exponential error decay, with strong empirical performance.
Contribution
It proposes Fermat distance-based classifiers (a weighted k-NN classifier and MDS-induced classifiers) and analyzes their theoretical optimality and error decay, advancing semi-supervised learning in high dimensions.
Findings
The weighted k-NN classifier using the true Fermat distance is minimax optimal.
Error from Fermat distance estimation decays exponentially with sample size.
Experiments show competitive or superior performance to existing methods.
Abstract
Semi-supervised classification, where unlabeled data are massive but labeled data are limited, often arises in machine learning applications. We address this challenge under high-dimensional data by leveraging the manifold and cluster assumptions. Based on the Fermat distance, a density-sensitive metric that naturally encodes the cluster assumption, we propose the weighted k-nearest neighbors (k-NN) classifier and multidimensional scaling (MDS)-induced classifiers. The use of MDS with a large target dimension allows the effective application of linear classifiers to complex manifold data. Theoretically, we derive a sharp lower bound for the expected excess risk within clusters and prove that the weighted k-NN classifier utilizing the true Fermat distance is minimax optimal. Furthermore, we explicitly quantify the utility of unlabeled data by showing that the error arising from…
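To illustrate why a density-sensitive metric encodes the cluster assumption, here is a minimal sketch, not the authors' implementation. It assumes the standard sample Fermat distance: the shortest-path length over all sample points (labeled and unlabeled) when each hop costs the Euclidean distance raised to a power alpha >= 1. The function names, the inverse-distance vote weights, and the toy data are hypothetical choices made for illustration; the paper's own estimator and weighting scheme may differ.

```python
import math

def sample_fermat_distances(points, alpha=2.0):
    """All-pairs shortest paths on the complete graph over `points`,
    where an edge costs (Euclidean distance) ** alpha.
    Uses Floyd-Warshall, so the cost is O(n^3); fine for a toy example."""
    n = len(points)
    d = [[math.dist(p, q) ** alpha for q in points] for p in points]
    for m in range(n):            # m = allowed intermediate point
        dm = d[m]
        for i in range(n):
            di, dim = d[i], d[i][m]
            for j in range(n):
                if dim + dm[j] < di[j]:
                    di[j] = dim + dm[j]
    return d

def weighted_knn_predict(dist_row, labels, k=1):
    """Vote among the k nearest labeled points.
    `labels` maps point index -> class; `dist_row` holds distances from the
    query to every point. Inverse-distance weights are an assumption here."""
    nearest = sorted(labels, key=lambda i: dist_row[i])[:k]
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0.0) + 1.0 / (dist_row[i] + 1e-12)
    return max(votes, key=votes.get)

# Toy data: two dense clusters on a line, separated by a gap of 0.9.
cluster_a = [(0.1 * i, 0.0) for i in range(11)]        # x = 0.0 .. 1.0
cluster_b = [(1.9 + 0.1 * i, 0.0) for i in range(11)]  # x = 1.9 .. 2.9
points = cluster_a + cluster_b
labels = {0: "A", 11: "B"}   # only two labeled points: x = 0.0 and x = 1.9
query = 10                   # unlabeled point at x = 1.0, on the edge of cluster A

fermat = sample_fermat_distances(points, alpha=2.0)
euclid = sample_fermat_distances(points, alpha=1.0)  # alpha = 1 recovers Euclidean distance

pred_fermat = weighted_knn_predict(fermat[query], labels, k=1)
pred_euclid = weighted_knn_predict(euclid[query], labels, k=1)
print(pred_fermat, pred_euclid)
```

In this toy setup the query is closer in Euclidean terms to the labeled point of cluster B (0.9 versus 1.0), yet the Fermat distance to the labeled point of cluster A is much smaller, because ten short hops through the dense region cost about 10 × 0.1² = 0.1, while the single jump across the gap costs 0.9² = 0.81. The unlabeled points are what make the short hops available, which is one way to see how unlabeled data can help.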
