Kernel Subspace and Feature Extraction
Xiangxiang Xu, Lizhong Zheng

TL;DR
This paper studies kernel methods through the lens of feature subspaces, introduces a maximal correlation kernel with information-theoretic optimality, and shows that a kernel SVM using this kernel achieves minimum prediction error.
Contribution
It establishes a correspondence between kernels and feature subspaces, introduces the maximal correlation kernel, and links Fisher kernels to optimal maximal correlation kernels.
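For discrete X and Y, the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation functions can be computed from the singular value decomposition of the matrix B(x, y) = P(x, y) / sqrt(P(x) P(y)). The leading singular value of B is always 1 and corresponds to constant functions, so the informative features come from the next k singular vectors, rescaled by 1/sqrt(P(x)) and 1/sqrt(P(y)). The sketch below does this with NumPy. As an assumption for illustration only, it then forms a kernel as the inner product k(x, x') = Σ_i f_i(x) f_i(x') of the top-k feature vectors. The paper's exact definition of the maximal correlation kernel (for instance, whether it adds a constant term or weights terms by the correlations σ_i) may differ, and the function names `hgr_features` and `max_corr_gram` are hypothetical.

```python
import numpy as np


def hgr_features(p_xy, k):
    """Top-k HGR maximal correlation functions of a discrete joint pmf.

    p_xy: array of shape (|X|, |Y|), strictly positive marginals, summing to 1.
    Returns (sigma, f, g): sigma[i] is the i-th maximal correlation, and
    f[:, i], g[:, i] are the matching feature functions of x and y.
    """
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    # Normalized joint matrix B(x, y) = P(x, y) / sqrt(P(x) P(y)).
    b = p_xy / np.sqrt(np.outer(p_x, p_y))
    u, s, vt = np.linalg.svd(b)
    # Index 0 is the trivial pair (singular value 1, vectors sqrt(P_X), sqrt(P_Y)),
    # i.e. the constant feature, so it is skipped.
    sigma = s[1:k + 1]
    # Undo the sqrt(P) scaling so that E[f_i(X)] = 0 and E[f_i(X)^2] = 1.
    f = u[:, 1:k + 1] / np.sqrt(p_x)[:, None]
    g = vt[1:k + 1, :].T / np.sqrt(p_y)[:, None]
    return sigma, f, g


def max_corr_gram(f):
    """Hypothetical kernel over the alphabet of X: K[x, x'] = sum_i f_i(x) f_i(x')."""
    return f @ f.T


if __name__ == "__main__":
    # Toy joint pmf on a 3 x 3 alphabet (entries sum to 1).
    p_xy = np.array([[0.20, 0.05, 0.05],
                     [0.05, 0.20, 0.05],
                     [0.05, 0.05, 0.30]])
    sigma, f, g = hgr_features(p_xy, k=2)
    print("maximal correlations:", sigma)
    print("Gram matrix:\n", max_corr_gram(f))
```

Working with B rather than P directly turns the constant feature into a known singular pair that can simply be dropped, and the remaining pairs satisfy E[f_i(X) g_i(Y)] = σ_i.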
Findings
The maximal correlation kernel achieves information-theoretic optimality.
A kernel SVM with the maximal correlation kernel achieves minimum prediction error.
The Fisher kernel is shown to be a special case of the maximal correlation kernel.
Abstract
We study kernel methods in machine learning from the perspective of feature subspace. We establish a one-to-one correspondence between feature subspaces and kernels and propose an information-theoretic measure for kernels. In particular, we construct a kernel from Hirschfeld-Gebelein-Rényi maximal correlation functions, coined the maximal correlation kernel, and demonstrate its information-theoretic optimality. We use the support vector machine (SVM) as an example to illustrate a connection between kernel methods and feature extraction approaches. We show that the kernel SVM on maximal correlation kernel achieves minimum prediction error. Finally, we interpret the Fisher kernel as a special maximal correlation kernel and establish its optimality.
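The standard Fisher kernel is k(x, x') = U_x^T I(θ)^{-1} U_{x'}, where U_x = ∇_θ log p(x; θ) is the score and I(θ) = E[U U^T] is the Fisher information. The sketch below evaluates it on a finite alphabet for a discrete exponential family p(x; θ) ∝ exp(θ · t(x)), a model family chosen here purely as an illustrative assumption because its score has the closed form t(x) - E[t(X)]. The function name `fisher_kernel_expfam` is hypothetical, and the sketch does not reproduce the paper's argument that the Fisher kernel is a maximal correlation kernel. It only shows that the Fisher kernel, like the HGR features, is built from zero-mean, whitened features.

```python
import numpy as np


def fisher_kernel_expfam(t, theta):
    """Fisher kernel for a discrete exponential family p(x; theta) ∝ exp(theta . t(x)).

    t: array of shape (n, d), the sufficient statistic t(x) of each of n symbols.
    theta: array of shape (d,), the parameter at which the kernel is evaluated.
    Returns the n x n matrix K[x, x'] = U_x^T I(theta)^{-1} U_{x'}.
    """
    logits = t @ theta
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    p /= p.sum()
    # Score U_x = grad_theta log p(x; theta) = t(x) - E[t(X)]; it has zero mean under p.
    scores = t - p @ t
    # Fisher information I(theta) = E[U U^T] = Cov[t(X)].
    fisher = scores.T @ (p[:, None] * scores)
    # Apply I(theta)^{-1} via a linear solve rather than an explicit inverse.
    return scores @ np.linalg.solve(fisher, scores.T)


if __name__ == "__main__":
    # Toy model: 4 symbols with 2-dimensional sufficient statistics.
    t = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
    theta = np.array([0.3, -0.2])
    print(fisher_kernel_expfam(t, theta))
```

Because the score is whitened by I(θ)^{-1}, the diagonal of K averages to the parameter dimension d under p(x; θ), and every row of K has zero mean under p(x; θ).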
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Machine Learning and ELM
Methods: Support Vector Machine
