How to select an objective function using information theory
Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall

TL;DR
This paper advocates selecting objective functions in machine learning based on information theory principles, aiming to maximize information and minimize uncertainty by transforming objectives into likelihoods.
Contribution
It introduces an information-theoretic framework for choosing objective functions by evaluating their likelihoods and bit-length differences, applicable to models with multiple uses.
Findings
Objective functions can be compared using likelihood ratios.
Transforming objectives into likelihoods reveals their information content.
Under this paradigm, the ultimate objective is to maximize information (and minimize uncertainty), rather than any specific utility.
Abstract
In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that…
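The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the standard correspondence in which mean squared error maps to a zero-mean normal likelihood and mean absolute error maps to a zero-mean Laplace likelihood, each with its scale set to the maximum-likelihood estimate. The function names and the synthetic errors are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_loglik_bits(errors):
    """Log-likelihood in bits of the errors under a zero-mean normal.

    The variance is set to its maximum-likelihood estimate, which equals
    the mean squared error (MSE).
    """
    n = errors.size
    sigma2 = np.mean(errors ** 2)  # MLE variance = MSE
    ll_nats = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return ll_nats / np.log(2.0)  # convert nats to bits


def laplace_loglik_bits(errors):
    """Log-likelihood in bits of the errors under a zero-mean Laplace.

    The scale is set to its maximum-likelihood estimate, which equals
    the mean absolute error (MAE).
    """
    n = errors.size
    b = np.mean(np.abs(errors))  # MLE scale = MAE
    ll_nats = -n * (np.log(2.0 * b) + 1.0)
    return ll_nats / np.log(2.0)  # convert nats to bits


# Synthetic heavy-tailed errors (an assumption made for this illustration).
rng = np.random.default_rng(0)
errors = rng.laplace(loc=0.0, scale=1.0, size=5000)

bits_mse = gaussian_loglik_bits(errors)
bits_mae = laplace_loglik_bits(errors)

# A positive difference means the MAE-based likelihood encodes the same
# errors in fewer bits, so it would be preferred under this criterion.
bits_saved = bits_mae - bits_mse
print(f"MSE (normal):  {bits_mse:.1f} bits")
print(f"MAE (Laplace): {bits_mae:.1f} bits")
print(f"Difference:    {bits_saved:.1f} bits in favor of MAE")
```

Because the synthetic errors here are drawn from a Laplace distribution, the MAE-based likelihood should come out ahead; with normally distributed errors the preference would be expected to reverse.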
Taxonomy
Topics: Computational Physics and Python Applications · Neural Networks and Applications
