Learning the Information Divergence
Onur Dikmen, Zhirong Yang, Erkki Oja

TL;DR
This paper introduces a framework for automatically selecting the most suitable information divergence for a machine learning task. It reformulates divergence families so that the divergence parameter can be chosen by standard maximum likelihood estimation.
Contribution
It proposes an approach for automatically selecting the best divergence within a given family by maximum likelihood, built on reformulations of the divergence families and connections between divergence types.
Findings
Accurately selects divergences across various tasks
Demonstrates effectiveness on synthetic and real data
Extends framework to non-separable divergences
Abstract
Information divergence that measures the difference between two nonnegative matrices or tensors has found its use in a variety of machine learning problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on a suitable divergence. A large variety of divergences have been suggested and analyzed, but very few results are available for an objective choice of the optimal divergence for a given task. Here we present a framework that facilitates automatic selection of the best divergence among a given family, based on standard maximum likelihood estimation. We first propose an approximated Tweedie distribution for the beta-divergence family. Selecting the best beta then becomes a machine learning problem solved by maximum likelihood. Next, we reformulate…
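To make the beta-divergence family mentioned in the abstract concrete, here is a minimal sketch of the divergence itself in plain Python. It assumes the common parameterization in which beta = 2 gives half the squared Euclidean distance, beta = 1 the generalized Kullback-Leibler divergence, and beta = 0 the Itakura-Saito divergence; the paper's own parameterization may differ (for example by a shift in beta). The function name `beta_divergence` is hypothetical, and the sketch does not implement the approximated Tweedie likelihood or the selection procedure.

```python
import math

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences D_beta(x || y).

    Assumes all entries of x and y are strictly positive.
    Convention (an assumption, not taken from the paper):
    beta=2 -> half squared Euclidean, beta=1 -> generalized KL,
    beta=0 -> Itakura-Saito.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        if beta == 1:
            # Limit beta -> 1: generalized Kullback-Leibler divergence
            total += xi * math.log(xi / yi) - xi + yi
        elif beta == 0:
            # Limit beta -> 0: Itakura-Saito divergence
            total += xi / yi - math.log(xi / yi) - 1.0
        else:
            # General case for beta not in {0, 1}
            num = xi**beta + (beta - 1.0) * yi**beta - beta * xi * yi**(beta - 1.0)
            total += num / (beta * (beta - 1.0))
    return total

# Usage: compare a small nonnegative "data" vector with an approximation
# under three members of the family.
data = [3.0, 1.0, 2.0]
approx = [2.5, 1.5, 2.0]
for b in (0, 1, 2):
    print(b, beta_divergence(data, approx, b))
```

Different values of beta weight the same residuals differently, which is why the choice of beta matters for the learning task.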
Taxonomy
Topics: Tensor decomposition and applications · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
