Identifying Information from Observations with Uncertainty and Novelty
Derek S. Prijatelj (1), Timothy J. Ireland (2), Walter J. Scheirer (1) ((1) University of Notre Dame, (2) Independent Researcher)

TL;DR
This paper formalizes the concept of identifying information in observations, analyzing its properties and sample complexity across various data-generating processes, and connects it with PAC-learning and hypothesis verification.
Contribution
It introduces a formalization of identifying information, analyzes its theoretical properties, and links it with sample complexity and PAC-learning frameworks.
Findings
Proves the information-theoretic characteristics of hypothesis identification.
Analyzes sample complexity for different data-generating processes.
Shows that PAC-Bayes learners' sample complexity distribution is computable.
Abstract
A machine that learns a task from observations must encounter and process uncertainty and novelty, especially when it is to maintain performance when observing new information and to select the hypothesis that best fits the current observations. In this context, some key questions arise: what and how much information did the observations provide, how much information is required to identify the data-generating process, how many observations remain to get that information, and how does a predictor determine that it has observed novel information? We formalize identifying information to answer these questions and synthesize prior works. Identifying information is the set of bits that verify or falsify a hypothesis as the data-generating process. In this formalization, we prove the information-theoretic characteristics of the computation of hypothesis identification and the resulting sample…
