On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory

Kai-Liang Lu; Avraham Chapman

arXiv:2302.10383·cs.CV·June 21, 2023

Open Access

TL;DR

This paper presents three interpretable, information-theoretic methods, derived from rate-distortion theory, for clustering, classifying, and representing high-dimensional data. They are suited to finite-sample data drawn from mixtures of Gaussians or subspaces.

Contribution

It introduces three theoretically grounded criteria based on lossy coding length from rate-distortion theory: Minimum Lossy Coding Length for clustering, Minimum Incremental Coding Length for classification, and Maximal Coding Rate Reduction for representation.

Findings

01. Effective for finite-sample, sparse, or degenerate data
02. Suitable for mixed Gaussian distributions or subspaces
03. Provides a theoretical guide for white-box machine learning

Abstract

To cluster, classify and represent are three fundamental objectives of learning from high-dimensional data with intrinsic structure. To this end, this paper introduces three interpretable approaches, i.e., segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion and representation via the Maximal Coding Rate Reduction criterion. These are derived based on the lossy data coding and compression framework from the principle of rate distortion in information theory. These algorithms are particularly suitable for dealing with finite-sample data (allowed to be sparse or almost degenerate) of mixed Gaussian distributions or subspaces. The theoretical value and attractive features of these methods are summarized by comparison with other learning methods or evaluation criteria. This summary note aims to provide a…
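To make the coding-rate idea in the abstract concrete, here is a minimal sketch in Python (NumPy). The abstract gives no formulas, so the expressions below are an assumption: they follow the log-determinant coding rate for Gaussian data at distortion eps that is commonly used in the rate-distortion learning literature, R(Z) = 1/2 log2 det(I + d/(m eps^2) Z Z^T), and the rate reduction is the rate of the whole data set minus the sample-weighted rates of its groups. The function names `coding_rate` and `rate_reduction`, the value of `eps`, and the toy data are all illustrative, not taken from the paper.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Approximate bits per sample needed to code the columns of Z
    up to distortion eps (assumed log-det form, zero-mean data).

    Z has shape (d, m): d features, m samples stored as columns.
    """
    d, m = Z.shape
    gram = np.eye(d) + (d / (m * eps ** 2)) * (Z @ Z.T)
    # slogdet is numerically safer than log(det(...)).
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet / np.log(2.0)


def rate_reduction(Z, labels, eps=0.5):
    """Rate of the whole set minus the weighted rates of the groups.

    A larger value means the grouping explains more of the structure.
    """
    m = Z.shape[1]
    grouped = 0.0
    for k in np.unique(labels):
        mask = labels == k
        grouped += (mask.sum() / m) * coding_rate(Z[:, mask], eps)
    return coding_rate(Z, eps) - grouped


# Toy data (hypothetical): two orthogonal 1-D subspaces in R^3.
rng = np.random.default_rng(0)
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
Z = np.hstack([np.outer(e1, rng.normal(size=50)),
               np.outer(e2, rng.normal(size=50))])
true_labels = np.array([0] * 50 + [1] * 50)
one_group = np.zeros(100, dtype=int)

gain_true = rate_reduction(Z, true_labels)
gain_trivial = rate_reduction(Z, one_group)
```

Under these assumptions, putting every sample in a single group yields no reduction, while the grouping that matches the two subspaces yields a strictly positive one. That is the intuition behind using coding length as a clustering and representation criterion.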


Taxonomy

Topics: Advanced Data Compression Techniques · Distributed Sensor Networks and Detection Algorithms · Bayesian Methods and Mixture Models