Few paths, fewer words: model selection with automatic structure functions

Bjørn Kjos-Hanssen

arXiv:1608.01399·cs.FL·August 5, 2016

Open Access

TL;DR

This paper explores model selection for binary strings using structure functions, automata, and automatic complexity, comparing deterministic and nondeterministic models to identify optimal fits.

Contribution

It introduces a finite-automaton-based approach to model selection, using automatic complexity in place of Kolmogorov complexity, and analyzes how deterministic and nondeterministic models differ.

Findings

01

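For the same data, deterministic and nondeterministic automata can give different optimal models: for $x = 01111011011$, the best deterministic model has $p$-value $0.3$, whereas the best nondeterministic model has $p$-value $0.04$.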

02

Counting paths and words in nondeterministic automata can lead to different optimal models.

03

The approach provides concrete p-values for model fit quality.
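To illustrate why counting paths and counting words can disagree for a nondeterministic automaton, here is a minimal sketch in Python. The toy NFA, the names `DELTA`, `count_accepting_paths`, and `paths_and_words` are all hypothetical and are not taken from the paper; the point is only that one word may be accepted along several runs, so the number of accepting paths can exceed the number of accepted words.

```python
from itertools import product

# Hypothetical toy NFA over the alphabet {0, 1}; not an automaton from the paper.
START = 0
ACCEPTING = {1}
DELTA = {
    (0, "0"): {0, 1},  # nondeterministic choice on reading 0 in state 0
    (0, "1"): {0},
    (1, "0"): {1},
    # (1, "1") is deliberately absent: the run dies there
}

def count_accepting_paths(word):
    """Return the number of distinct accepting runs of the NFA on `word`."""
    counts = {START: 1}  # state -> number of runs currently ending in that state
    for symbol in word:
        nxt = {}
        for state, k in counts.items():
            for target in DELTA.get((state, symbol), ()):
                nxt[target] = nxt.get(target, 0) + k
        counts = nxt
    return sum(k for state, k in counts.items() if state in ACCEPTING)

def paths_and_words(n):
    """Return (accepting paths, accepted words) over all binary words of length n."""
    words = ["".join(bits) for bits in product("01", repeat=n)]
    per_word = {w: count_accepting_paths(w) for w in words}
    total_paths = sum(per_word.values())
    total_words = sum(1 for c in per_word.values() if c > 0)
    return total_paths, total_words

if __name__ == "__main__":
    print(paths_and_words(2))  # (3, 2): the word "00" alone has two accepting runs
```

Because the two counts differ, a model ranked by "fewest paths" need not be the one ranked by "fewest words", which is the distinction the second finding draws.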

Abstract

We consider the problem of finding an optimal statistical model for a given binary string. Following Kolmogorov, we use structure functions. In order to get concrete results, we replace Turing machines by finite automata and Kolmogorov complexity by Shallit and Wang's automatic complexity. The $p$-value of a model for given data $x$ is the probability that there exists a model with as few states, accepting as few words, fitting uniformly randomly selected data $y$. Deterministic and nondeterministic automata can give different optimal models. For $x = 01111011011$, the best deterministic model has $p$-value $0.3$, whereas the best nondeterministic model has $p$-value $0.04$. In the nondeterministic case, counting paths and counting words can give different optimal models. For $x = 0110001000$, the best path-counting model has $p$-value $0.79$, whereas the best…
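The $p$-value definition in the abstract can be made concrete by brute force for very small parameters. The sketch below is an illustration under stated assumptions, not the paper's procedure: it assumes complete deterministic automata over $\{0,1\}$ with start state 0, treats "as few states" as at most `q` states and "as few words" as at most `m` accepted words of length `n`, and the names `all_dfas`, `accepted_words`, and `p_value` are hypothetical. The paper's exact automaton conventions may differ, so this code should not be expected to reproduce the values $0.3$ or $0.04$ quoted above.

```python
from itertools import product

def all_dfas(q):
    """Yield every complete DFA with states 0..q-1, start state 0, alphabet {0, 1}.

    Automata with fewer than q reachable states are included automatically,
    because the extra states may simply be unreachable.
    """
    states = range(q)
    for table in product(states, repeat=2 * q):
        # delta[s] = (next state on bit 0, next state on bit 1)
        delta = [(table[2 * s], table[2 * s + 1]) for s in states]
        for accepting in product((False, True), repeat=q):
            yield delta, accepting

def accepted_words(delta, accepting, n):
    """Return the set of length-n binary words (as tuples) that the DFA accepts."""
    accepted = set()
    for word in product((0, 1), repeat=n):
        state = 0
        for bit in word:
            state = delta[state][bit]
        if accepting[state]:
            accepted.add(word)
    return accepted

def p_value(q, m, n):
    """Fraction of length-n words y fitted by some DFA with at most q states
    that accepts at most m words of length n (assumed reading of the definition)."""
    covered = set()
    for delta, accepting in all_dfas(q):
        words = accepted_words(delta, accepting, n)
        if 0 < len(words) <= m:
            covered |= words
    return len(covered) / 2 ** n

if __name__ == "__main__":
    # A one-state DFA accepts either every word or none, so with m < 2**n
    # no random word is fitted and the value is 0.
    print(p_value(1, 1, 3))
    print(p_value(2, 2, 3))
```

The enumeration grows as $q^{2q} 2^q$ automata, so this approach is only practical for a handful of states and short strings.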

Taxonomy

Topics: Semigroups and automata theory · Computability, Logic, AI Algorithms · Algorithms and Data Compression