# A Sampling Theory Perspective of Graph-based Semi-supervised Learning

**Authors:** Aamir Anis, Aly El Gamal, Salman Avestimehr, Antonio Ortega

arXiv: 1705.09518 · 2019-04-16

## TL;DR

This paper offers a theoretical framework for understanding graph-based semi-supervised learning by modeling class indicators as bandlimited signals and analyzing their bandwidth in relation to dataset geometry.

## Contribution

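The paper treats label prediction as a bandlimited reconstruction problem on the graph and shows that, in the asymptotic limit, the bandwidth of class indicator signals is closely related to the geometry of the dataset. This gives a theoretical justification for assuming that class indicators are bandlimited.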

## Key findings

- In the asymptotic limit, the bandwidth of any class indicator is closely related to the geometry of the dataset
- The analysis covers statistical data models with both separable and nonseparable classes
- The results give a sampling theoretic interpretation of graph-based semi-supervised classification

## Abstract

Graph-based methods have been quite successful in solving unsupervised and semi-supervised learning problems, as they provide a means to capture the underlying geometry of the dataset. It is often desirable for the constructed graph to satisfy two properties: first, data points that are similar in the feature space should be strongly connected on the graph, and second, the class label information should vary smoothly with respect to the graph, where smoothness is measured using the spectral properties of the graph Laplacian matrix. Recent works have justified some of these smoothness conditions by showing that they are strongly linked to the semi-supervised smoothness assumption and its variants. In this work, we reinforce this connection by viewing the problem from a graph sampling theoretic perspective, where class indicator functions are treated as bandlimited graph signals (in the eigenvector basis of the graph Laplacian) and label prediction as a bandlimited reconstruction problem. Our approach involves analyzing the bandwidth of class indicator signals generated from statistical data models with separable and nonseparable classes. These models are quite general and mimic the nature of most real-world datasets. Our results show that in the asymptotic limit, the bandwidth of any class indicator is also closely related to the geometry of the dataset. This allows one to theoretically justify the assumption of bandlimitedness of class indicator signals, thereby providing a sampling theoretic interpretation of graph-based semi-supervised classification.
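The idea of treating a class indicator as a bandlimited graph signal and predicting labels by bandlimited reconstruction can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Gaussian-kernel graph, the combinatorial Laplacian `L = D - W`, the toy two-cluster data, the helper names `gaussian_graph_laplacian` and `bandlimited_reconstruct`, and the plain least-squares fit on the first `k` Laplacian eigenvectors.

```python
import numpy as np


def gaussian_graph_laplacian(X, sigma=1.0):
    """Combinatorial Laplacian L = D - W of a Gaussian-kernel graph (illustrative choice)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return np.diag(W.sum(axis=1)) - W


def bandlimited_reconstruct(L, labeled_idx, labeled_values, k):
    """Least-squares fit of the labeled samples in the span of the k lowest-frequency eigenvectors."""
    _, U = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    Uk = U[:, :k]
    coeffs, *_ = np.linalg.lstsq(Uk[labeled_idx], labeled_values, rcond=None)
    return Uk @ coeffs


# Toy data (assumed): two well-separated Gaussian clusters, one per class.
rng = np.random.default_rng(0)
n = 30
X = np.vstack([
    rng.normal([-3.0, 0.0], 0.5, (n, 2)),
    rng.normal([3.0, 0.0], 0.5, (n, 2)),
])
indicator = np.concatenate([np.ones(n), np.zeros(n)])  # indicator of the first class

L = gaussian_graph_laplacian(X, sigma=1.0)

# Fraction of the indicator's energy captured by the k lowest graph frequencies.
k = 2
_, U = np.linalg.eigh(L)
low_freq_energy = np.sum((U[:, :k].T @ indicator) ** 2) / np.sum(indicator ** 2)

# Observe three labels per class and reconstruct the indicator on all nodes.
labeled_idx = np.array([0, 1, 2, n, n + 1, n + 2])
estimate = bandlimited_reconstruct(L, labeled_idx, indicator[labeled_idx], k)
predicted = (estimate > 0.5).astype(float)
accuracy = np.mean(predicted == indicator)

print(f"energy in first {k} eigenvectors: {low_freq_energy:.4f}")
print(f"accuracy from {len(labeled_idx)} labels: {accuracy:.2f}")
```

On well-separated clusters like these, nearly all of the indicator's energy should sit in the lowest graph frequencies, which is the intuition behind the bandlimitedness assumption. With overlapping classes the indicator would be expected to need a larger `k`.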

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.09518/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1705.09518/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1705.09518/full.md

---
Source: https://tomesphere.com/paper/1705.09518