Stringological sequence prediction I: efficient algorithms for predicting highly repetitive sequences

Vanessa Kosoy

arXiv:2603.26852·cs.FL·March 31, 2026

TL;DR

This paper introduces efficient stringology-based algorithms for predicting highly repetitive sequences, with mistake bounds expressed in terms of complexity measures such as the number of states of the minimal automaton and the size of the smallest straight-line program.

Contribution

It presents novel algorithms that are both time and space efficient, with mistake bounds tied to stringological complexity measures of sequences.

Findings

01. The algorithms are efficient in both time and space.

02. Predictability is tied to stringological complexity measures of the sequence: low complexity implies high predictability.

03. The results apply to classes such as automatic sequences and Sturmian words.

Abstract

We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this work (the first in a series) we focus on two such measures: (i) the size of the smallest straight-line program that produces the sequence, and (ii) the number of states in the minimal automaton that can compute any symbol in the sequence when given its position in base k as input. These measures are interesting because multiple rich classes of sequences studied in combinatorics of words (automatic sequences, morphic sequences, Sturmian words) have low complexity and hence high predictability in this sense.
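To make the two complexity measures concrete, here is a minimal Python sketch. It is not taken from the paper and does not implement its prediction algorithms. It only illustrates (i) a straight-line program, using the finite Fibonacci words (prefixes of a Sturmian word) as the example, and (ii) an automaton that outputs the symbol at position n after reading n in base k, using the Thue-Morse sequence as the example. The function names, the rule encoding, and the most-significant-digit-first reading order are assumptions made for illustration; the paper's exact conventions may differ.

```python
from typing import Dict, List, Tuple, Union

# A rule is either a terminal string or a pair of indices of earlier rules.
Rule = Union[str, Tuple[int, int]]


def expand_slp(rules: List[Rule]) -> str:
    """Expand a straight-line program; the last rule is the start symbol."""
    expanded: List[str] = []
    for rule in rules:
        if isinstance(rule, str):
            expanded.append(rule)  # terminal rule
        else:
            left, right = rule  # concatenation of two earlier rules
            expanded.append(expanded[left] + expanded[right])
    return expanded[-1]


def fibonacci_slp(n: int) -> List[Rule]:
    """Hypothetical helper: n rules (n >= 2) producing the n-th Fibonacci word."""
    rules: List[Rule] = ["b", "a"]
    for i in range(2, n):
        rules.append((i - 1, i - 2))  # F_i = F_{i-1} F_{i-2}
    return rules


def to_base_k(n: int, k: int) -> List[int]:
    """Digits of n in base k, most significant first (0 maps to [0])."""
    if n == 0:
        return [0]
    digits: List[int] = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits[::-1]


def automaton_symbol(
    n: int,
    k: int,
    delta: Dict[Tuple[int, int], int],
    start: int,
    output: Dict[int, str],
) -> str:
    """Run the automaton on the base-k digits of n and emit the final state's output."""
    state = start
    for digit in to_base_k(n, k):
        state = delta[(state, digit)]
    return output[state]


# Two-state automaton for Thue-Morse: the state tracks the parity of 1-bits in n.
TM_DELTA = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
TM_OUTPUT = {0: "0", 1: "1"}


def thue_morse_prefix(length: int) -> str:
    """First `length` symbols of the Thue-Morse sequence via the automaton."""
    return "".join(
        automaton_symbol(n, 2, TM_DELTA, 0, TM_OUTPUT) for n in range(length)
    )


if __name__ == "__main__":
    print(expand_slp(fibonacci_slp(6)))
    print(thue_morse_prefix(8))
```

In this sketch, a program with n rules expands to a word whose length grows like the Fibonacci numbers, so the program is exponentially smaller than the word it produces. Similarly, the Thue-Morse automaton needs only two states regardless of how long a prefix is generated. Both are examples of the "low complexity" the abstract refers to.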
