Automatic sequences and generalised polynomials

Jakub Byszewski; Jakub Konieczny

arXiv:1705.08979·math.NT·April 1, 2020


TL;DR

This paper investigates the conjecture that bounded generalised polynomial functions are not generated by finite automata unless they are ultimately periodic, using ergodic theory to provide partial results and connections to automatic sequences.

Contribution

It proves that certain sequences derived from polynomials with irrational coefficients are not automatic and relates the conjecture to the nature of powers of integers as generalised polynomials.

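Findings

1. For a polynomial $p(n)$ with at least one irrational non-constant coefficient and an integer $m\geq 2$, the sequence $\lfloor p(n)\rfloor\bmod m$ is not automatic.

2. The conjecture is equivalent to the claim that the set of powers of an integer $k\geq 2$ is not given by a generalised polynomial.

3. Any hypothetical counterexample to the conjecture is periodic away from a very sparse and structured set.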

Abstract

We conjecture that bounded generalised polynomial functions cannot be generated by finite automata, except for the trivial case when they are ultimately periodic. Using methods from ergodic theory, we are able to partially resolve this conjecture, proving that any hypothetical counterexample is periodic away from a very sparse and structured set. In particular, we show that for a polynomial $p(n)$ with at least one irrational coefficient (except for the constant one) and integer $m\geq 2$, the sequence $\lfloor p(n)\rfloor\bmod m$ is never automatic. We also prove that the conjecture is equivalent to the claim that the set of powers of an integer $k\geq 2$ is not given by a generalised polynomial.



Department of Mathematics and Computer Science

Institute of Mathematics

Jagiellonian University

ul. prof. Stanisława Łojasiewicza 6

30-348 Kraków

[email protected]

Mathematical Institute

University of Oxford

Andrew Wiles Building

Radcliffe Observatory Quarter

Woodstock Road

Oxford

OX2 6GG

Einstein Institute of Mathematics

Edmond J. Safra Campus

The Hebrew University of Jerusalem

Givat Ram

Jerusalem, 9190401

Israel

[email protected]


Key words and phrases:

Generalised polynomials, automatic sequences, IP sets, nilmanifolds, linear recurrence sequences, regular sequences

2010 Mathematics Subject Classification:

Primary: 11B85, 37A45. Secondary: 37B05, 37B10, 11J71, 11B37, 05C20

Introduction

Automatic sequences are sequences whose $n$ -th term is produced by a finite-state machine from the base- $k$ digits of $n$ . (A precise definition is given below.) By definition, automatic sequences can take only finitely many values. Allouche and Shallit [AS92, AS03b] have generalised the notion of automatic sequences to a wider class of regular sequences and demonstrated its ubiquity and links with multiple branches of mathematics and computer science. The problem of demonstrating that a certain sequence is or is not automatic or regular has been widely studied, particularly for sequences of arithmetic origin (see, e.g., [AS92, AS03b, Bel07, SY11, MR15, SP11, Mos08, Row10]).

The aim of this article is to continue this study for sequences that arise from generalised polynomials, i.e., expressions involving algebraic operations and the floor function. Our methods rely on a number of dynamical and ergodic tools. A crucial ingredient in our work is one of the main results from the companion paper [BK16] concerning the combinatorial structure of the set of times at which an orbit on a nilmanifold hits a semialgebraic subset. This is possible because by the work of Bergelson and Leibman [BL07] generalised polynomials are closely related to dynamics on nilmanifolds.

In [AS03b, Theorem 6.2] it is proved that the sequence $(f(n))_{n\geq 0}$ given by $f(n)=\lfloor\alpha n+\beta\rfloor$ for real numbers $\alpha,\beta$ is regular if and only if $\alpha$ is rational. The method used there does not immediately generalise to higher degree polynomials in $n$ , but the proof implicitly uses rotation on a circle by an angle of $2\pi\alpha$ . Replacing the rotation on a circle by a skew product transformation on a torus (as in Furstenberg’s proof of Weyl’s equidistribution theorem [Fur61]), we easily obtain the following result. (For more on regular sequences, see Section 1.)

Theorem A.

Let $p\in\mathbb{R}[x]$ be a polynomial. Then the sequence $f(n)=\lfloor p(n)\rfloor,n\geq 0$ , is regular if and only if all the coefficients of $p$ except possibly for the constant term are rational.

In fact, we show the stronger property that for any integer $m\geq 2$ the sequence $f(n)\bmod m$ is not automatic unless all the coefficients of $p$ except for the constant term are rational, in which case the sequence is periodic. It is natural to inquire whether a similar result can be proven for more complicated expressions involving the floor function such as, e.g., $f(n)=\lfloor\alpha\lfloor\beta n^{2}+\gamma\rfloor^{2}+\delta n+\varepsilon\rfloor$. Such sequences are called generalised polynomial and have been intensely studied (see, e.g., [Hål93, Hål94, HK95, BL07, Lei12, GTZ12, GT12]).

Another closely related motivating example comes from the classical Fibonacci word $w_{\mathrm{Fib}}\in\{0,1\}^{\mathbb{N}_{0}}$, whose systematic study was initiated by Berstel [Ber81, Ber85] (for historical notes, see [AS03a, Sec. 7.12]). (We will freely identify words in $\Omega^{\mathbb{N}_{0}}$ with functions $\mathbb{N}_{0}\to\Omega$.) There are several ways to define it, each shedding light from a different direction.

(i) Morphic word. Define the sequence of words $w_{0}:=0$, $w_{1}:=01$, and $w_{i+2}:=w_{i+1}w_{i}$ for $i\geq 0$. Then $w_{\mathrm{Fib}}$ is the (coordinate-wise) limit of $w_{i}$ as $i\to\infty$.

(ii) Sturmian word. Explicitly, $w_{\mathrm{Fib}}(n)=\left\lfloor(2-\varphi)(n+2)\right\rfloor-\left\lfloor(2-\varphi)(n+1)\right\rfloor$.

(iii) Fib-automatic sequence. If a positive integer $n$ is written in the form $n=\sum_{i=2}^{d}v_{i}F_{i}$, where $v_{i}\in\{0,1\}$ and there is no $i$ with $v_{i}=v_{i+1}=1$, then $w_{\mathrm{Fib}}(n)=v_{2}$.

The equivalence of (i) and (ii) is well-known, see, e.g., [Lot02, Chpt. 2]. The representation $v_{d}v_{d-1}\cdots v_{2}$ of $n$ as a sum of Fibonacci numbers in (iii) is known as the Zeckendorf representation; it exists for each $n$ and is unique. The notion of automaticity using Zeckendorf representation (or, for that matter, a representation from a much wider class) in place of the usual base- $k$ representation of the input $n$ was introduced and studied by Shallit in [Sha88] (see also [Rig00]), where among other things the equivalence of (i) and (iii) is shown. We return to this subject in Section 6.
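The three descriptions of $w_{\mathrm{Fib}}$ can be compared numerically. The following Python sketch is an illustration only: the function names are hypothetical and not taken from the paper, the Sturmian formula is evaluated in exact integer arithmetic via $2-\varphi=(3-\sqrt{5})/2$, and description (iii) is extended to $n=0$ by the empty expansion (an assumption made here for convenience).

```python
from math import isqrt

def fib_word_morphic(length):
    """Prefix of w_Fib built from w_0 = 0, w_1 = 01, w_{i+2} = w_{i+1} w_i."""
    prev, cur = "0", "01"
    while len(cur) < length:
        prev, cur = cur, cur + prev
    return [int(c) for c in cur[:length]]

def floor_beta(m):
    """Exact floor((2 - phi) * m) for m >= 1, using 2 - phi = (3 - sqrt(5)) / 2."""
    # sqrt(5) * m is irrational for m >= 1, so it lies strictly between s and s + 1.
    s = isqrt(5 * m * m)
    return (3 * m - s - 1) // 2

def fib_word_sturmian(n):
    """Description (ii): a difference of two consecutive floor values."""
    return floor_beta(n + 2) - floor_beta(n + 1)

def zeckendorf_digits(n):
    """Greedy Zeckendorf digits v_2, v_3, ... of n, least significant first."""
    fibs = [1, 2]  # F_2, F_3
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = [0] * len(fibs)
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:
            digits[i] = 1
            n -= fibs[i]
    return digits

def fib_word_zeckendorf(n):
    """Description (iii): the digit v_2 of the Zeckendorf representation."""
    return zeckendorf_digits(n)[0]
```

Comparing `fib_word_morphic(200)` with the other two functions on `range(200)` checks the agreement on a finite prefix only; it is not a substitute for the proofs cited above.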

Hence, $w_{\mathrm{Fib}}$ gives a non-trivial example of a sequence which is given by a generalised polynomial and satisfies a variant of automaticity related to the Zeckendorf representation. It is natural to ask if similar examples exist for the usual notion of $k$ -automaticity. Motivated by Theorem A, we believe the answer is essentially negative, except for trivial examples. We say that a sequence $f$ is ultimately periodic if it coincides with a periodic sequence except on a finite set. The following conjecture was the initial motivation for the line of research pursued in this paper.

Conjecture A.

Suppose that a sequence $f$ is simultaneously automatic and generalised polynomial. Then $f$ is ultimately periodic.

In this paper, we prove several slightly weaker variants of Conjecture A. First of all, we prove that the conjecture holds except on a set of density zero. In fact, in order to obtain such a result, we only need a specific property of automatic sequences. For the purpose of stating the next theorem, let us say that a sequence $f\colon\mathbb{N}\to X$ is weakly periodic if for any restriction $f^{\prime}$ of $f$ to an arithmetic sequence given by $f^{\prime}(n)=f(an+b)$, $a\in\mathbb{N},\ b\in\mathbb{N}_{0}$, there exist $q\in\mathbb{N}$, $r,r^{\prime}\in\mathbb{N}_{0}$ with $r\neq r^{\prime}$, such that $f^{\prime}(qn+r)=f^{\prime}(qn+r^{\prime})$ for all $n$. Of course, any periodic sequence is weakly periodic, but not conversely. All automatic sequences are weakly periodic (this follows from the fact that automatic sequences have finite kernels, see Lemma 2.1). Another non-trivial example of a weakly periodic sequence is the characteristic function of the square-free numbers.

Theorem B.

Suppose that a sequence $f\colon\mathbb{N}_{0}\to\mathbb{R}$ is weakly periodic and generalised polynomial. Then there exists a periodic function $b\colon\mathbb{N}_{0}\to\mathbb{R}$ and a set $Z\subset\mathbb{N}_{0}$ of upper Banach density zero such that $f(n)=b(n)$ for $n\in\mathbb{N}_{0}\setminus Z$ .

(For the definition of Banach density, see Section 1.)

Theorem B is already sufficient to rule out automaticity of many natural examples of generalised polynomials. In particular, sequences such as $\left\lfloor\sqrt{2}n\left\lfloor\sqrt{3n}\right\rfloor\right\rfloor\bmod{10}$ or $\left\lfloor\sqrt{2}n\left\lfloor\sqrt{3n}\right\rfloor^{2}+\sqrt{5}n+\sqrt{7}\right\rfloor\bmod{10}$ are not automatic. For details and more examples, see Corollary 2.7.

To obtain stronger bounds on the size of the “exceptional” set $Z$, we restrict ourselves to automatic sequences and exploit some finer properties of generalised polynomials studied in the companion paper [BK16]. We use results concerning growth properties of automatic sequences to derive the following dichotomy: If $a\colon\mathbb{N}_{0}\to\{0,1\}$ is an automatic sequence, then the set of integers where $a$ takes the value $1$ is either combinatorially rich (it contains what we call an $\mathrm{IPS}$ set) or extremely sparse (in particular, the number of its elements up to $N$ grows as $\log^{r}(N)$ for some integer $r$); see Theorem 3.10. This result is especially interesting for sparse automatic sequences, i.e., automatic sequences which take non-zero values on a set of integers of density $0$. Conversely, in [BK16] we show that sparse generalised polynomials must be free of similar combinatorial structures. As a consequence, we prove the following result.

Theorem C.

Suppose that a sequence $f\colon\mathbb{N}_{0}\to\mathbb{R}$ is automatic and generalised polynomial. Then there exists a periodic function $b\colon\mathbb{N}_{0}\to\mathbb{R}$ , a set $Z\subset\mathbb{N}_{0}$ , and a constant $r$ such that $f(n)=b(n)$ for $n\in\mathbb{N}_{0}\setminus Z$ and

$$\sup_{M}\left|Z\cap[M,M+N)\right|=O\left(\log^{r}(N)\right)$$

as $N\to\infty$ for a certain constant $r$ (dependent on $f$ ).

In fact, we obtain a much more precise structural description of the exceptional set $Z$ (see Theorem 3.7 for details). Similar techniques allow us to show non-automaticity of some sparse generalised polynomials. For instance, the sequence given by

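$$n\mapsto\begin{cases}1&\text{if }\left\lVert\sqrt{2}n\left\lfloor\sqrt{3}n\right\rfloor\right\rVert<n^{-c},\\0&\text{otherwise}\end{cases}$$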

is not automatic provided that $c$ is small enough. (Here, $\left\lVert x\right\rVert$ denotes the distance of $x$ from $\mathbb{Z}$ .) For details, see Example 4.7.

While Theorem C does not resolve Conjecture A, our proof thereof greatly restricts the number of possible counterexamples. In fact, in order to prove Conjecture A, it would suffice to prove that the characteristic sequence of powers of an integer $k\geq 2$ given by

$$g_{k}(n)=\begin{cases}1&\text{if }n=k^{t}\text{ for some }t\geq 0,\\0&\text{otherwise}\end{cases}$$

is not a generalised polynomial.

Theorem D.

Let $k\geq 2$ be an integer. Then exactly one of the following statements holds:

(i) All sequences that are simultaneously $k$-automatic and generalised polynomial are ultimately periodic.

(ii) The characteristic sequence $g_{k}$ of the powers of $k$ is generalised polynomial.

Unfortunately, we are currently unable to decide which of the two possibilities in Theorem D holds. Although we expect that $g_{k}$ should not be a generalised polynomial, in [BK16] we obtain several examples of algebraic numbers $\lambda>1$ such that the characteristic function of the set $E_{\lambda}:={\left\{\langle\!\langle\lambda^{i}\rangle\!\rangle\ \middle|\ i\in\mathbb{N}_{0}\right\}}$ is generalised polynomial, where $\langle\!\langle x\rangle\!\rangle$ denotes the closest integer to $x$ . All our examples are Pisot units (a Pisot number is an algebraic integer $\lambda>1$ all of whose conjugates have modulus $<1$ ; a Pisot unit is a Pisot number whose minimal polynomial has constant term $\pm 1$ ). Conversely, there is no $\lambda>1$ for which we can prove that the characteristic function of $E_{\lambda}$ is not given by a generalised polynomial. This prompts us to propose the following question.

Question A.

Suppose that $\lambda>1$ is such that the characteristic function of the set $E_{\lambda}:={\left\{\langle\!\langle\lambda^{i}\rangle\!\rangle\ \middle|\ i\in\mathbb{N}_{0}\right\}}$ is given by a generalised polynomial. Is it then necessarily the case that $\lambda$ is a Pisot unit?

For a more detailed discussion of this question, see [BK16, Section 6]. If $\lambda$ is a Pisot number, then $\langle\!\langle\lambda^{i}\rangle\!\rangle$ obeys a linear recurrence. We show that for such $\lambda$ , the characteristic function of $E_{\lambda}$ cannot be a counterexample to Conjecture A (see Proposition 4.9) except possibly if $\lambda$ is an integer.
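The linear recurrence mentioned above can be observed numerically for a concrete Pisot unit. The Python sketch below uses the golden ratio $\varphi=(1+\sqrt{5})/2$ as an illustrative choice made here (its minimal polynomial is $x^{2}-x-1$ and its conjugate has modulus less than $1$); the function name is hypothetical, and floating-point arithmetic is only trusted for the small exponents used.

```python
from math import floor, sqrt

def nearest_integer_powers(lam, count):
    """Nearest integers to lam^0, lam^1, ..., lam^(count - 1), computed in floating point."""
    return [floor(lam ** i + 0.5) for i in range(count)]

PHI = (1 + sqrt(5)) / 2  # golden ratio, an illustrative Pisot unit

seq = nearest_integer_powers(PHI, 20)

# From index 2 onwards the values satisfy a(i + 2) = a(i + 1) + a(i).
recurrence_holds = all(seq[i + 2] == seq[i + 1] + seq[i] for i in range(2, 18))
```

In this example the recurrence only starts at index $2$: the first two rounded values are $1$ and $2$, and $1+2\neq 3+1$ would fail one step earlier, which is why the check skips the initial terms.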

By Theorem D, determining the validity of Conjecture A is equivalent to answering Question A in the special case when $\lambda$ is an integer.

In Section 1, we discuss some basic notions and results concerning automatic sequences and dynamical systems. We intended this section to be accessible to readers familiar with only one (or neither) of these topics. In Section 2, we prove Theorem A and Theorem B using methods from topological dynamics. In Section 3, we use known results on growth and structure of automatic sequences to prove that they are either very sparse and structured (in which case we call them arid) or are combinatorially rich. Together with a result about dynamics on nilmanifolds, this allows us to obtain Theorem C. Section 4 contains four separate topics concerning examples and non-examples of automatic sets and uniform density of symbols in automatic sequences. Section 5 is devoted to the proof of Theorem D. Finally, Section 6 discusses some open problems and future research topics.

Acknowledgements

The authors thank Ben Green for much useful advice during the work on this project, Vitaly Bergelson and Inger Håland-Knutson for valuable comments on the distribution of generalised polynomials, and Jean-Paul Allouche and Narad Rampersad for information about related results on automatic sequences.

Thanks also go to Sean Eberhard, Dominik Kwietniak, Freddie Manners, Rudi Mrazović, Przemek Mazur, Sofia Lindqvist, and Aled Walker for many informal discussions.

This research was supported by the National Science Centre, Poland (NCN) under grant no. DEC-2012/07/E/ST1/00185.

Finally, we would like to express our gratitude to the organisers of the conference New developments around ${\times 2}\ {\times 3}$ conjecture and other classical problems in Ergodic Theory in Cieplice, Poland in May 2016 where we began our project.

1. Background

Notations and generalities

We denote the sets of positive integers and of nonnegative integers by $\mathbb{N}=\{1,2,\ldots\}$ and $\mathbb{N}_{0}=\{0,1,\ldots\}$. We denote by $[N]$ the set $[N]=\{0,1,\ldots,N-1\}$. We use the Iverson convention: whenever $\varphi$ is any sentence, we denote by $\left\llbracket\varphi\right\rrbracket$ its logical value ($1$ if $\varphi$ is true and $0$ otherwise). We denote the number of elements in a finite set $A$ by $|A|$.

For a real number $r$ , we denote its integer part by $\lfloor r\rfloor$ , its fractional part by $\{r\}=r-\lfloor r\rfloor$ , the nearest integer to $r$ by $\langle\!\langle r\rangle\!\rangle=\lfloor r+1/2\rfloor$ , and the distance from $r$ to the nearest integer by $\left\lVert r\right\rVert=|r-\langle\!\langle r\rangle\!\rangle|$ .
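For readers who prefer code, the notation of this subsection can be mirrored as follows. This is a minimal Python sketch with hypothetical helper names, using exact rational inputs so that no rounding issues arise; it is not part of the paper.

```python
from fractions import Fraction
from math import floor

def iverson(condition):
    """Iverson bracket: 1 if the condition holds and 0 otherwise."""
    return 1 if condition else 0

def frac_part(r):
    """Fractional part {r} = r - floor(r)."""
    return r - floor(r)

def nearest_int(r):
    """Nearest integer <<r>> = floor(r + 1/2); ties are rounded up."""
    return floor(r + Fraction(1, 2))

def dist_to_int(r):
    """Distance ||r|| from r to the nearest integer."""
    return abs(r - nearest_int(r))

def initial_segment(N):
    """The set [N] = {0, 1, ..., N - 1}."""
    return set(range(N))
```

For instance, with `r = Fraction(7, 4)` the integer part is $1$, the fractional part is $3/4$, the nearest integer is $2$, and the distance to the nearest integer is $1/4$.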

For a subset $E\subset\mathbb{N}_{0}$, we say that $E$ has natural density $d(E)$ if

$$\lim_{N\to\infty}\frac{|E\cap[N]|}{N}=d(E).$$

We say that $E\subset\mathbb{N}_{0}$ has upper Banach density $d^{*}(E)$ if

$$\limsup_{N\to\infty}\max_{M}\frac{|E\cap[M,M+N)|}{N}=d^{*}(E).$$
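The difference between counting in initial segments, $|E\cap[N]|/N$, and counting in the best window, $\max_{M}|E\cap[M,M+N)|/N$, can be seen on a small example. The Python sketch below uses a hypothetical set chosen here for illustration, the union of the blocks $[4^{j},4^{j}+j)$; for the full infinite union the first ratio tends to $0$ while the second equals $1$ for every $N$. Only a finite truncation is computed, and the function names are not from the paper.

```python
def block_set(j_max):
    """Hypothetical example set: union of the blocks [4^j, 4^j + j) for 1 <= j <= j_max."""
    return {4 ** j + i for j in range(1, j_max + 1) for i in range(j)}

def count_in_window(E, M, N):
    """Number of elements of E in the window [M, M + N)."""
    return sum(1 for n in E if M <= n < M + N)

def initial_ratio(E, N):
    """The ratio |E ∩ [N]| / N appearing in the definition of natural density."""
    return count_in_window(E, 0, N) / N

def max_window_ratio(E, N, M_max):
    """Maximum of |E ∩ [M, M + N)| / N over 0 <= M <= M_max (a finite search only)."""
    return max(count_in_window(E, M, N) for M in range(M_max + 1)) / N

E = block_set(6)
sparse_ratio = initial_ratio(E, 4 ** 6)          # few elements below 4^6
dense_window = max_window_ratio(E, 5, 4 ** 5)    # the block starting at 4^5 fills a window
```

The search over `M` is bounded by `M_max`, so the code can only exhibit a dense window when one exists in the searched range; it cannot certify the value of a limit superior.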

We now formally define generalised polynomials.

Definition 1.1 (Generalised polynomial).

The family $\mathrm{GP}$ of generalised polynomials is the smallest set of functions $\mathbb{Z}\to\mathbb{R}$ containing the polynomial maps and closed under addition, multiplication, and the operation of taking the integer part. Whenever it is more convenient, we regard generalised polynomials as functions on $\mathbb{N}_{0}$ .

A set $E\subset\mathbb{Z}$ (or $E\subset\mathbb{N}_{0}$ ) is called generalised polynomial if its characteristic function given by $f(n)=\left\llbracket n\in E\right\rrbracket$ is a generalised polynomial. (Note that this definition depends on whether we are regarding the generalised polynomial as a function on $\mathbb{Z}$ or on $\mathbb{N}_{0}$ and a generalised polynomial set $E\subset\mathbb{N}_{0}$ might a priori not be generalised polynomial when considered as a subset of $\mathbb{Z}$ . It will always be clear from the context which meaning we have in mind.)

An example of a generalised polynomial is therefore a function $f$ given by the formula $f(n)=\sqrt{3}\lfloor\sqrt{2}n^{2}+1/7\rfloor^{2}+n\lfloor n^{3}+\pi\rfloor$ .
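Generalised polynomials whose irrational coefficients are square roots of integers can be evaluated exactly with integer square roots. The following Python sketch does this for the map $n\mapsto\lfloor\sqrt{2}\,n\lfloor\sqrt{3}\,n\rfloor\rfloor$, an illustrative choice made here; the helper names are hypothetical, and the approach relies on the identity $\lfloor\sqrt{d}\,m\rfloor=\lfloor\sqrt{dm^{2}}\rfloor$ for integers $d,m\geq 0$.

```python
from math import isqrt

def floor_sqrt_times(d, m):
    """Exact floor(sqrt(d) * m) for integers d >= 0 and m >= 0."""
    return isqrt(d * m * m)

def g(n):
    """Illustrative generalised polynomial: floor(sqrt(2) * n * floor(sqrt(3) * n))."""
    inner = floor_sqrt_times(3, n)         # floor(sqrt(3) * n)
    return floor_sqrt_times(2, n * inner)  # floor(sqrt(2) * (n * inner))

def g_mod(n, m=10):
    """The finitely-valued sequence obtained by reducing g modulo m."""
    return g(n) % m
```

Working with `isqrt` avoids the rounding errors that a floating-point evaluation of nested floor expressions would eventually introduce for large `n`.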

Automatic sequences

Whenever $A$ is a (finite) set, we denote the free monoid with basis $A$ by $A^{*}$ . It consists of finite words in $A$ , including the empty word $\epsilon$ , with the operation of concatenation. We denote the concatenation of two words $v,w\in A^{*}$ by $vw$ and we denote the length of a word $w\in A^{*}$ by $|w|$ . In particular, $|\epsilon|=0$ . We say that a word $v\in A^{*}$ is a factor of a word $w\in A^{*}$ if there exist words $u,u^{\prime}\in A^{*}$ such that $w=uvu^{\prime}$ . We denote by $w^{\mathrm{R}}\in A^{*}$ the reversal of the word $w\in A^{*}$ (the word in which the elements of $A$ are written in the opposite order).

Let $k\geq 2$ be an integer and denote by $\Sigma_{k}=\{0,1,\ldots,k-1\}$ the set of digits in base $k$ . For $w\in\Sigma_{k}^{*}$ , we denote by $[w]_{k}$ the integer whose expansion in base $k$ is $w$ , i.e., if $w=v_{l}v_{l-1}\cdots v_{1}v_{0}$ , $v_{i}\in\Sigma_{k}$ , then $[w]_{k}=\sum_{i=0}^{l}v_{i}k^{i}$ . Conversely, for an integer $n\geq 0$ , we write $(n)_{k}\in\Sigma_{k}^{*}$ for the base- $k$ representation of $n$ (without an initial zero). In particular, $(0)_{k}=\epsilon$ .

The class of automatic sequences consists, informally speaking, of finite-valued sequences $(a_{n})_{n\geq 0}$ whose values $a_{n}$ are obtained via a finite procedure from the digits of the base-$k$ expansion of the integer $n$.

The most famous example of an automatic sequence is arguably the Thue–Morse sequence, first discovered by Prouhet in 1851. Let $s_{2}(n)$ denote the sum of digits of the base 2 expansion of an integer $n$ . Then the Thue–Morse sequence $(t_{n})_{n\geq 0}$ is given by $t_{n}=1$ if $s_{2}(n)$ is odd and $t_{n}=0$ if $s_{2}(n)$ is even.

We will introduce the basic properties of automatic sequences. For more information, we refer the reader to the canonical book of Allouche and Shallit [AS03a]. To formally introduce the notion of automatic sequences, we begin by discussing finite automata.

Definition 1.2.

A deterministic finite $k$ -automaton with output (which we will just call a $k$ -automaton) $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\Omega,\tau)$ consists of the following data:

(i) a finite set of states $S$;

(ii) an initial state $s_{0}\in S$;

(iii) a transition map $\delta\colon S\times\Sigma_{k}\to S$;

(iv) an output set $\Omega$;

(v) an output map $\tau\colon S\to\Omega$.

We extend the map $\delta$ to a map $\delta\colon S\times\Sigma_{k}^{*}\to S$ (denoted by the same letter) by the recurrence formula

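$$\delta(s,\epsilon)=s,\qquad\delta(s,wv)=\delta(\delta(s,w),v),\qquad s\in S,\ w\in\Sigma_{k}^{*},\ v\in\Sigma_{k}.$$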

We call a sequence $k$ -automatic if it can be produced by a $k$ -automaton in the following manner: one starts at the initial state of the automaton, follows the digits of the base- $k$ expansion of an integer $n$ , and then uses the output function to print the $n$ -th term of the sequence. This is stated more precisely in the following definition.

Definition 1.3.

A sequence $(a_{n})_{n\geq 0}$ with values in a finite set $\Omega$ is $k$ -automatic if there exists a $k$ -automaton $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\Omega,\tau)$ such that $a_{n}=\tau\left(\delta(s_{0},(n)_{k})\right)$ . We call a set $E$ of nonnegative integers automatic if the characteristic sequence $(a_{n})_{n\geq 0}$ of $E$ given by $a_{n}=\left\llbracket n\in E\right\rrbracket$ is automatic.

For some applications, it will be useful to consider the following variant of the definition. A function $\tilde{a}\colon\Sigma_{k}^{*}\to\Omega$ is automatic if there exists a $k$ -automaton $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\Omega,\tau)$ such that $\tilde{a}(u)=\tau(\delta(s_{0},u))$ for $u\in\Sigma_{k}^{*}$ .

The values of the Thue–Morse sequence are given by the $2$ -automaton

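[Figure: transition diagram of the automaton with two states $s_{0}$ and $s_{1}$. Each state has a loop labelled $0$; an edge labelled $1$ leads from $s_{0}$ to $s_{1}$, and another edge labelled $1$ leads from $s_{1}$ to $s_{0}$.]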

with nodes depicting the states of the automaton, edges describing the transition map, $\tau(s_{0})=0$ , and $\tau(s_{1})=1$ . Thus, the Thue–Morse sequence is $2$ -automatic.
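Definition 1.3 translates directly into code. The Python sketch below is an illustration with hypothetical names: it computes $(n)_{k}$, runs a $k$-automaton given as a transition table, and instantiates it with the two-state Thue–Morse automaton, whose transitions are reconstructed here from the rule that reading the digit $1$ flips the parity of the digit sum.

```python
def to_base(n, k):
    """Base-k digits of n, most significant first; (0)_k is the empty word."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits[::-1]

def run_automaton(delta, s0, tau, word):
    """Return tau(delta(s0, word)), reading the word from left to right."""
    state = s0
    for digit in word:
        state = delta[(state, digit)]
    return tau[state]

# Thue–Morse 2-automaton: digit 0 keeps the state, digit 1 swaps the two states.
TM_DELTA = {
    ("s0", 0): "s0", ("s0", 1): "s1",
    ("s1", 0): "s1", ("s1", 1): "s0",
}
TM_TAU = {"s0": 0, "s1": 1}

def thue_morse(n):
    """n-th term of the Thue–Morse sequence, computed by the automaton."""
    return run_automaton(TM_DELTA, "s0", TM_TAU, to_base(n, 2))
```

Because the digit $0$ never changes the state, this particular automaton also gives the correct output on inputs with leading zeros.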

In the definition above, the automaton reads the digits starting with the most significant one. In fact, we might equally well demand that the digits be read starting with the least significant digit or that the automaton produce the correct answer even if the input contains some leading zeros. Neither of these modifications changes the notion of automatic sequence [AS03a, Theorem 5.2.3] (though of course for most sequences we would need to use a different automaton to produce a given automatic sequence).

There are a number of equivalent definitions of the notion of automatic sequence connecting them to different branches of mathematics (stated for example in terms of algebraic power series over finite fields or letter-to-letter projections of fixed points of uniform morphisms of free monoids). We will need one such definition that has a combinatorial flavour and is expressed in terms of the $k$-kernel.

Definition 1.4.

The $k$ -kernel $\mathcal{N}_{k}((a_{n}))$ of a sequence $(a_{n})_{n\geq 0}$ is the set of its subsequences of the form

$$\mathcal{N}_{k}((a_{n}))=\left\{(a_{k^{l}n+r})_{n\geq 0}\ \middle|\ l\geq 0,\ 0\leq r<k^{l}\right\}.$$

Automaticity of a sequence is equivalent to finiteness of its kernel, a result originally due to Eilenberg [Eil74].

Proposition 1.5.

[AS03a, Theorem 6.6.2] Let $(a_{n})_{n\geq 0}$ be a sequence. Then the following conditions are equivalent:

(i) The sequence $(a_{n})$ is $k$-automatic.

(ii) The $k$-kernel $\mathcal{N}_{k}((a_{n}))$ is finite.

For the Thue–Morse sequence we have the relations $t_{2n}=t_{n}$ , $t_{2n+1}=1-t_{n}$ , and hence one easily sees that the $2$ -kernel $\mathcal{N}_{2}((t_{n}))$ consists of only two sequences $\mathcal{N}_{2}((t_{n}))=\{t_{n},1-t_{n}\}$ . This gives another argument for the $2$ -automaticity of the Thue–Morse sequence.
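The size of a kernel can be explored on finite prefixes. The Python sketch below (hypothetical function names) collects the distinct prefixes of the subsequences $(a_{k^{l}n+r})_{n\geq 0}$ up to a given level $l$. Such a computation can show that two kernel elements differ, but agreement of prefixes is only numerical evidence, not a proof of finiteness.

```python
def thue_morse(n):
    """Parity of the binary digit sum of n."""
    return bin(n).count("1") % 2

def kernel_prefixes(a, k, max_level, length):
    """Distinct length-`length` prefixes of (a(k^l n + r))_n for l <= max_level, 0 <= r < k^l."""
    seen = set()
    for l in range(max_level + 1):
        step = k ** l
        for r in range(step):
            seen.add(tuple(a(step * n + r) for n in range(length)))
    return seen

base2 = kernel_prefixes(thue_morse, 2, 5, 32)  # prefixes of t and of 1 - t only
base4 = kernel_prefixes(thue_morse, 4, 3, 32)  # the same two prefixes
base3 = kernel_prefixes(thue_morse, 3, 2, 32)  # more than two distinct prefixes appear
```

For base $3$ the subsequence $(t_{3n})$ already begins with seven zeros followed by a one, so it differs from both $(t_{n})$ and $(1-t_{n})$.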

An automatic sequence by definition takes only finitely many values. In 1992 Allouche and Shallit [AS92] generalised the notion of automatic sequences to the wider class of $k$-regular sequences that are allowed to take values in a possibly infinite set. The definition of regular sequences is stated in terms of the $k$-kernel. For simplicity, we state the definition over the ring of integers, though it could also be introduced over a general (noetherian) ring.

Definition 1.6.

Let $(a_{n})_{n\geq 0}$ be a sequence of integers. We say that the sequence $(a_{n})$ is $k$ -regular if its $k$ -kernel $\mathcal{N}_{k}((a_{n}))$ spans a finitely generated abelian subgroup of $\mathbb{Z}^{\mathbb{N}_{0}}$ .

For example, the following sequences are easily seen to be $2$ -regular: $(t_{n})_{n\geq 0}$ , $(n^{3}+5)_{n\geq 0}$ , $(s_{2}(n))_{n\geq 0}$ . (The corresponding subgroups spanned by the $2$ -kernel have rank $2$ , $4$ , and $2$ , respectively. In the case of $t=(t_{n})_{n\geq 0}$ , the subgroup spanned by the $2$ -kernel is free abelian with basis consisting of $t$ and the constant sequence $(1)_{n\geq 0}$ .) In fact, every $k$ -automatic (integer-valued) sequence is obviously $k$ -regular, and the following converse result holds.
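The ranks quoted for $(s_{2}(n))$ and $(n^{3}+5)$ can be checked directly; the short derivation below is added as an illustration and uses only the definitions of $s_{2}$ and of the $2$-kernel. For $l\geq 0$ and $0\leq r<2^{l}$, adding $r$ to $2^{l}n$ produces no carries in base $2$, so

$$s_{2}(2^{l}n+r)=s_{2}(n)+s_{2}(r),\qquad(2^{l}n+r)^{3}+5=8^{l}n^{3}+3\cdot 4^{l}r\,n^{2}+3\cdot 2^{l}r^{2}\,n+(r^{3}+5).$$

Hence every element of the $2$-kernel of $(s_{2}(n))$ is an integer combination of $(s_{2}(n))$ and the constant sequence $(1)$, and every element of the $2$-kernel of $(n^{3}+5)$ is an integer combination of the four sequences $(1)$, $(n)$, $(n^{2})$, $(n^{3})$. This is consistent with the ranks $2$ and $4$ stated above.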

Theorem 1.7.

[AS03a, Theorem 16.1.5] Let $(a_{n})_{n\geq 0}$ be a sequence of integers. Then the following conditions are equivalent:

(i) The sequence $(a_{n})$ is $k$-automatic.

(ii) The sequence $(a_{n})$ is $k$-regular and takes only finitely many values.

Corollary 1.8.

[AS03a, Corollary 16.1.6] Let $(a_{n})_{n\geq 0}$ be a sequence of integers that is $k$-regular and let $m\geq 1$ be an integer. Then the sequence $(a_{n}\bmod m)$ is $k$-automatic.

A convenient tool for ruling out that a given sequence is automatic is provided by the pumping lemma.

Lemma 1.9.

[AS03a, Lemma 4.2.1] Let $(a_{n})_{n\geq 0}$ be a $k$-automatic sequence. Then there exists a constant $N$ such that for any $w\in\Sigma^{*}_{k}$ with $\left|w\right|\geq N$ and any integer $0\leq L\leq\left|w\right|-N$ there exist $u_{0},u_{1},v\in\Sigma^{*}_{k}$ such that $v\neq\epsilon$, $w=u_{0}vu_{1}$, $L\leq\left|u_{0}\right|\leq L+N-\left|v\right|$, and $a_{n}$ takes the same value for all $n\in{\left\{[u_{0}v^{t}u_{1}]_{k}\ \middle|\ t\in\mathbb{N}_{0}\right\}}$.

The final issue that we need to discuss is the dependence of the notion of $k$ -automaticity on the base $k$ . While the Thue–Morse sequence is $2$ -regular, and is also easily seen to be $4$ -regular, it is not $3$ -regular. This follows from the celebrated result of Cobham [Cob69]. We say that two integers $k,l\geq 2$ are multiplicatively independent if they are not both powers of the same integer (equivalently, $\log k/\log l\notin\mathbb{Q}$ ).

Theorem 1.10.

[AS03a, Theorem 11.2.2] Let $(a_{n})_{n\geq 0}$ be a sequence with values in a finite set $\Omega$. Assume that the sequence $(a_{n})$ is simultaneously $k$-automatic and $l$-automatic with respect to two multiplicatively independent integers $k,l\geq 2$. Then $(a_{n})$ is eventually periodic.

We will have no use for Cobham’s theorem. We will, however, use the following much easier related result.

Theorem 1.11.

[AS03a, Theorem 6.6.4] Let $(a_{n})_{n\geq 0}$ be a sequence with values in a finite set $\Omega$. Let $k,l\geq 2$ be two multiplicatively dependent integers. Then the sequence $(a_{n})$ is $k$-automatic if and only if it is $l$-automatic.

Let $A$ denote a finite alphabet and let $L$ and $L^{\prime}$ be languages, i.e., subsets of $A^{*}$ . We denote by $LL^{\prime}=\{wv\mid w\in L,v\in L^{\prime}\}$ the concatenation of $L$ and $L^{\prime}$ . For an integer $i\geq 0$ , we denote by $L^{i}=L\cdots L$ the concatenation of $i$ copies of $L$ with the understanding that $L^{0}=\{\epsilon\}$ . The Kleene closure of $L$ is $L^{*}=\bigcup_{i\geq 0}L^{i}$ . A language $L$ is regular if it can be obtained from the empty set and the letters of the alphabet using the operations of union, concatenation, and the Kleene closure.

Regular languages are intimately connected with automatic sequences via Kleene’s theorem [Kle56] (see also [AS03a, Thm. 4.1.5]), which says that a language $L$ over the alphabet $\Sigma_{k}$ is regular if and only if the sequence $(a_{n})_{n\geq 0}$ given by $a_{n}=\left\llbracket(n)_{k}\in L\right\rrbracket$ is $k$ -automatic.

Dynamical systems

An (invertible, topological) dynamical system is given by a compact metrisable space $X$ and a continuous homeomorphism $T\colon X\to X$ . We say that $X$ is minimal if for every point $x\in X$ the orbit $\{T^{n}x\mid n\in\mathbb{Z}\}$ is dense in $X$ . (Equivalently, the only closed subsets $Y\subset X$ such that $T(Y)=Y$ are $Y=X$ or $Y=\emptyset$ .) We say that $X$ is totally minimal if the system $(X,T^{n})$ is minimal for all $n\geq 1$ .

Let $(X,T)$ be a dynamical system. We say that a Borel measure $\mu$ on $X$ is invariant if for every Borel subset $A\subset X$ we have $\mu(T^{-1}(A))=\mu(A)$ . By the Krylov–Bogoliubov theorem (see, e.g., [EW11, Thm. 4.1]), each dynamical system has at least one invariant measure. We say that a dynamical system is uniquely ergodic if it has exactly one invariant measure.

If $(X,T)$ is minimal, $x\in X$ , and $U\subset X$ is open, then the set ${\left\{n\in\mathbb{Z}\ \middle|\ T^{n}x\in U\right\}}$ is syndetic, i.e., has bounded gaps [Fur81, Thm. 1.15].

We will need the following standard consequence of the ergodic theorem [EW11, Thm 4.10], which we also note in [BK16, Corollary 1.4]. (Below and elsewhere, $\partial S$ denotes the boundary of the set $S$ .)

Corollary 1.12.

Let $(X,T)$ be a uniquely ergodic dynamical system with the invariant measure $\mu$ . Then for any $x\in X$ and any $S\subset X$ with $\mu(\partial S)=0$ , the set $E={\left\{n\in\mathbb{N}_{0}\ \middle|\ T^{n}x\in S\right\}}$ has upper Banach density $\mu(S)$ .

In fact, in this case the limit superior in the definition of upper Banach density can be replaced by a limit.

The connection between generalised polynomials and dynamics of nilsystems has been intensely studied by Bergelson and Leibman in [BL07] (see also [Lei12]). Nilsystems are a widely studied class of dynamical systems of algebraic origin. Here, we only need several properties which these systems enjoy; in particular, we shall spare the reader the definition of a nilsystem. A good introduction to nilsystems may be found in the initial sections of [BL07].

A nilsystem $(X,T)$ is minimal if and only if it is uniquely ergodic; the unique invariant measure $\mu_{X}$ has then full support. If $(X,T)$ is minimal but not totally minimal, then $X$ splits into finitely many connected components $X_{1},\dots,X_{n}$ , each $X_{i}$ is preserved by $T^{n}$ , and each $(X_{i},T^{n})$ is a totally minimal nilsystem.

As a special case of the aforementioned connection between nilsystems and generalised polynomials [BL07, Thm. A], we have the following result. (For more details, see also [BK16].)

Theorem 1.13 (Bergelson–Leibman).

Let $g\colon\mathbb{Z}\to\mathbb{R}$ be a generalised polynomial taking finitely many values $\{c_{1},\dots,c_{r}\}$ . Then there exists a minimal nilsystem $(X,T)$ as well as a point $z\in X$ and a partition $X=S_{1}\cup S_{2}\cup\ldots\cup S_{r}$ such that $\mu_{X}(\partial S_{j})=0$ and

$$g(n)=c_{j}\quad\text{if and only if}\quad T^{n}z\in S_{j}$$

for each $1\leq j\leq r$ .

Remark 1.14.

Let $g\colon\mathbb{Z}\to\mathbb{R}$ be a generalised polynomial taking finitely many values. Then there exists $a\in\mathbb{N}$ such that for any $b\in\mathbb{Z}$ the generalised polynomial $g_{a,b}(n):=g(an+b)$ has a representation as in Theorem 1.13 with $(X,T)$ totally minimal.

2. Density 1 results

Polynomial sequences

Our first purpose in this section is to prove Theorem A. Recall that we aim to show that the sequence $n\mapsto\left\lfloor p(n)\right\rfloor$ is not regular if $p(x)\in\mathbb{R}[x]$ has at least one irrational coefficient other than the constant term. We will show more, namely that the sequence $n\mapsto\left\lfloor p(n)\right\rfloor\bmod{m}$ is not automatic for any $m\geq 2$ . In fact, we will only need to work with the weaker property of weak periodicity, defined in the introduction.

Lemma 2.1.

Every automatic sequence is weakly periodic.

Proof.

Let $f$ be a $k$ -automatic sequence. Since the restriction of a $k$ -automatic sequence to an arithmetic progression is again $k$ -automatic [AS03a, Theorem 6.8.1], it will suffice to find $q\in\mathbb{N}$ and $r,r^{\prime}\in\mathbb{N}_{0}$ with $r\neq r^{\prime}$ such that $f(qn+r)=f(qn+r^{\prime})$ .

The $k$ -kernel $\mathcal{N}_{k}(f)$ of $f$ , consisting of the functions $f(k^{t}n+r)$ for $0\leq r<k^{t}$ , is finite. Pick $t$ sufficiently large that $k^{t}>\left|\mathcal{N}_{k}(f)\right|$ . By the pigeonhole principle, there exist $r\neq r^{\prime}$ such that $f(k^{t}n+r)=f(k^{t}n+r^{\prime})$ . ∎
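As an illustration, the pigeonhole step can be tried out on the Thue–Morse sequence. The following is a minimal sketch, not part of the original argument: the helper names `thue_morse` and `find_coinciding_residues` are hypothetical, and the comparison is carried out only on a finite window of values of $n$, so a hit is numerical evidence rather than a proof.

```python
def thue_morse(n: int) -> int:
    """Parity of the number of 1s in the binary expansion of n."""
    return bin(n).count("1") % 2


def find_coinciding_residues(f, k: int, t: int, window: int = 256):
    """Search for r < r' < k**t with f(k^t n + r) == f(k^t n + r').

    The equality is tested only for 0 <= n < window (an assumption of this
    sketch), so the result is a finite-window check, not a proof.
    Returns the first pair found, or None.
    """
    q = k ** t
    seen = {}  # maps a finite prefix of n -> f(q n + r) to the residue r
    for r in range(q):
        key = tuple(f(q * n + r) for n in range(window))
        if key in seen:
            return seen[key], r
        seen[key] = r
    return None


# The 2-kernel of Thue-Morse has two elements, so t = 2 (4 residues) suffices.
pair = find_coinciding_residues(thue_morse, k=2, t=2)
print(pair)  # (1, 2): t(4n + 1) = t(4n + 2) for all n
```

Here $4n+1$ and $4n+2$ both have one more binary digit $1$ than $n$, which explains the coincidence found by the search.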

The proof of the following proposition is closely analogous to Furstenberg’s proof [Fur61] of Weyl’s equidistribution theorem [Wey16] (see also [EW11, Section 4.4.3]).

Proposition 2.2.

Let $p(x)\in\mathbb{R}[x]$ be a polynomial, and let $m\geq 2$ be an integer. Then the sequence $(\left\lfloor p(n)\right\rfloor\bmod{m})_{n\geq 0}$ is weakly periodic if and only if it is periodic. This happens precisely when all non-constant coefficients of $p(x)$ are rational.

Proof.

If all coefficients of $p(x)$ are rational (except possibly for the constant term) then the sequence $(\left\lfloor p(n)\right\rfloor\bmod{m})$ is easily seen to be periodic, hence weakly periodic.

Now suppose that at least one non-constant coefficient of $p(x)$ is irrational. Replacing $p(x)$ with $p(hx+r)$ for multiplicatively large $h$ and $r=0,1,\ldots,h-1$ , we may assume that the leading coefficient of $p(x)$ is irrational. We will prove marginally more than claimed, namely that for any $0\leq l<m$ the sequence $f$ given by

$$f(n)=\left\llbracket\left\lfloor p(n)\right\rfloor\equiv l\bmod{m}\right\rrbracket$$

fails to be weakly periodic. For a proof by contradiction, suppose this claim is false for some choice of $l$ .

It will be convenient to expand $p(x)/m=\sum_{i=0}^{d}a_{i}\binom{x}{i}$ , where $d=\deg p$ , $a_{i}\in\mathbb{R}$ , and $\binom{x}{i}=x(x-1)(x-2)\cdots(x-i+1)/i!$ . Note that $a_{d}\in\mathbb{R}\setminus\mathbb{Q}$ and

[TABLE]

We will represent the sequence $p$ dynamically. Let $X$ be the $d$ -dimensional torus $\mathbb{T}^{d}$ and define the self-map $T\colon X\to X$ by

[TABLE]

Put $a_{j}=0$ for $j>d$ . A direct computation shows that for $z=(0,0,\dots,0,a_{0})$ and $j=1,\ldots,d$ we have

[TABLE]

and in particular $(T^{n}z)_{d}=p(n)/m.$ Putting $A=\mathbb{T}^{d-1}\times\left[\frac{l}{m},\frac{l+1}{m}\right)$ , we thus find that

$$f(n)=\left\llbracket T^{n}z\in A\right\rrbracket.$$

Since $f$ is weakly periodic, we may find $q$ and $r\neq r^{\prime}$ such that $f(qn+r)=f(qn+r^{\prime})$ .

The dynamical system $(X,T)$ can be obtained as a sequence of iterated group extension over an irrational rotation, and hence is totally minimal (this follows easily from the results in, e.g., [EW11, Section 4.4.3]). In particular, for any point $y\in\operatorname{cl}A$ we may find a sequence $(n_{i})_{i\geq 0}$ such that $T^{qn_{i}+r}z\to y$ and $T^{qn_{i}+r}z\in A$ . It follows that the points $T^{qn_{i}+r^{\prime}}z$ converge to $T^{r^{\prime}-r}y$ and lie in $A$ . Thus, $T^{r^{\prime}-r}(\operatorname{cl}A)\subset\operatorname{cl}A$ . In light of total minimality of $T$ , this is only possible if $\operatorname{cl}A=X$ or $\operatorname{cl}A=\emptyset$ — but this is absurd. ∎
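The "direct computation" behind the dynamical representation can be checked mechanically. The sketch below assumes that $T$ is the usual skew map $T(x_{1},\dots,x_{d})=(x_{1}+a_{d},\,x_{2}+x_{1}+a_{d-1},\,\dots,\,x_{d}+x_{d-1}+a_{1})$; this explicit form is our assumption, chosen to be consistent with $z=(0,\dots,0,a_{0})$ and $(T^{n}z)_{d}=p(n)/m$. The function names are hypothetical. Rational coefficients are used so that the arithmetic is exact, and reduction modulo $1$ is omitted; this tests only the algebraic identity, not any equidistribution statement.

```python
from fractions import Fraction
from math import comb


def skew_step(x, a):
    """One application of the assumed skew map, without reduction mod 1.

    x = [x_1, ..., x_d] and a = [a_0, ..., a_d] (0-based Python lists).
    """
    d = len(x)
    y = [x[0] + a[d]]
    for j in range(1, d):
        # 1-based coordinate j + 1 receives x_{j+1} + x_j + a_{d-j}
        y.append(x[j] + x[j - 1] + a[d - j])
    return y


def last_coordinate_matches(a, steps=30):
    """Check (T^n z)_d == sum_i a_i * binom(n, i) for 0 <= n <= steps."""
    d = len(a) - 1
    x = [Fraction(0)] * (d - 1) + [a[0]]  # z = (0, ..., 0, a_0)
    for n in range(steps + 1):
        expected = sum(a[i] * comb(n, i) for i in range(d + 1))
        if x[-1] != expected:
            return False
        x = skew_step(x, a)
    return True


coeffs = [Fraction(1, 3), Fraction(1, 2), Fraction(5, 7), Fraction(2, 5)]
print(last_coordinate_matches(coeffs))  # True
```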

Corollary 2.3.

With the notation of Proposition 2.2, the sequence $n\mapsto\left\lfloor p(n)\right\rfloor\bmod{m}$ is automatic if and only if it is periodic, and if and only if all the non-constant coefficients of $p(x)$ are rational.

Proof.

Immediate from Proposition 2.2 and Lemma 2.1. ∎

Proof of Theorem A.

Suppose first that all non-constant coefficients of $p(n)$ are rational, and fix an integer $k\geq 2$ . Let $h\in\mathbb{N}$ be such that $hp(n)$ has integer coefficients, except possibly for the constant term. Then $f_{1}(n)=\left\lfloor hp(n)\right\rfloor$ is an integer-valued polynomial, hence is $k$ -regular ( $\mathcal{N}_{k}(f_{1})$ is contained in the $(\deg p+1)$ -dimensional $\mathbb{Z}$ -module consisting of integer-valued polynomials of degree $\leq\deg p$ ). Also, $f_{2}(n)=\left\lfloor hp(n)\right\rfloor-hf(n)=\left\lfloor h\left\{p(n)\right\}\right\rfloor$ is periodic, hence $k$ -automatic, hence $k$ -regular. It follows that $f(n)=\frac{1}{h}\left(f_{1}(n)-f_{2}(n)\right)$ is regular.

Conversely, suppose that $f(n)$ is regular. Then by Theorem 1.7 for any choice of $m\geq 2$ the sequence $f(n)\bmod{m}$ is automatic. Now, it follows from Corollary 2.3 that all non-constant coefficients of $p(x)$ are rational. ∎
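The decomposition $f=\frac{1}{h}(f_{1}-f_{2})$ used in the first half of the proof can be illustrated on a concrete polynomial. The sketch below uses the hypothetical example $p(n)=\frac{1}{2}n^{2}+\frac{1}{3}n+\frac{1}{4}$ with $h=6$ (both are our choices, not taken from the text), and a rational constant term so that all computations are exact.

```python
from fractions import Fraction
from math import floor

H = 6  # clears the denominators of the non-constant coefficients


def p(n: int) -> Fraction:
    """Example polynomial with rational non-constant coefficients."""
    return Fraction(n * n, 2) + Fraction(n, 3) + Fraction(1, 4)


def f(n: int) -> int:
    return floor(p(n))


def f1(n: int) -> int:
    """floor(h p(n)); here h p(n) = 3n^2 + 2n + 3/2, so f1 = 3n^2 + 2n + 1."""
    return floor(H * p(n))


def f2(n: int) -> int:
    """floor(h p(n)) - h floor(p(n)) = floor(h {p(n)}), a periodic sequence."""
    return f1(n) - H * f(n)


print([f2(n) for n in range(12)])  # the second block of six repeats the first
```

In this example $f_{2}$ has period $6$, since $\{p(n)\}$ depends only on $n \bmod 6$.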

Generalised polynomials

Having dealt with the case of polynomial maps, we move on to a more general context. Our next goal is to prove Theorem B. We begin by abstracting and generalising some of the key steps from the proof of Theorem A.

Recall that a set of integers is thick if it contains arbitrarily long segments of consecutive integers, and syndetic if it has bounded gaps; every thick set intersects every syndetic set.

Lemma 2.4.

Let $(X,T)$ be a totally minimal dynamical system. Let $A\subset X$ be a set which is neither empty nor dense and such that $\operatorname{cl}A=\operatorname{cl}\operatorname{int}A$ . Let $z\in X$ . Suppose that $f\colon\mathbb{N}_{0}\to\{0,1\}$ is a sequence such that the set of $n$ with $f(n)=\left\llbracket T^{n}z\in A\right\rrbracket$ is thick. Then $f$ is not weakly periodic.

Proof.

Suppose for the sake of contradiction that $f$ is weakly periodic. In particular there exist $q\in\mathbb{N}$ , $r,r^{\prime}\in\mathbb{N}_{0}$ with $r\neq r^{\prime}$ such that $f(qn+r)=f(qn+r^{\prime})$ . Put $d=r^{\prime}-r$ .

We will show that $T^{d}(\operatorname{cl}A)\subset\operatorname{cl}A$ . Since $T$ is continuous and $\operatorname{cl}\operatorname{int}A=\operatorname{cl}A$ , it will suffice to prove that $T^{d}(\operatorname{int}A)\subset\operatorname{cl}A$ . Once this is accomplished, the contradiction follows immediately, because $(X,T^{d})$ is minimal, while $\operatorname{cl}A\neq\emptyset,X$ .

Pick any $y\in\operatorname{int}A$ and an open neighbourhood $V$ of $T^{d}y$ ; we aim to show that $V\cap A\neq\emptyset$ . Put $U=T^{-d}V\cap\operatorname{int}A$ , and consider the set $S$ of those $n$ for which $T^{qn+r}z\in U$ . Since $(X,T^{q})$ is minimal and $U\neq\emptyset$ , the set $S$ is syndetic. Let $R_{0}$ be the set of those $n$ for which $f(n)=\left\llbracket T^{n}z\in A\right\rrbracket$ and put $R=\{n\in\mathbb{N}_{0}\mid qn+r\in R_{0}\}$ and $R^{\prime}=\{n\in\mathbb{N}_{0}\mid qn+r^{\prime}\in R_{0}\}$ .

Since $R_{0}$ is thick, so is $R\cap R^{\prime}$ . Since $S$ is syndetic, $S\cap R\cap R^{\prime}$ is non-empty. Pick any $n\in S\cap R\cap R^{\prime}$ and put $x=T^{qn+r}z$ . Since $n\in S$ , we have $x\in U\subset A$ , and so $T^{d}x\in V$ . Since $n\in R$ , we have $f(qn+r)=\left\llbracket x\in A\right\rrbracket=1$ , and hence also $f(qn+r^{\prime})=1$ . Finally, since $n\in R^{\prime}$ , we have $1=f(qn+r^{\prime})=\left\llbracket T^{d}x\in A\right\rrbracket$ , meaning that $T^{d}x\in V\cap A$ . In particular, $V\cap A\neq\emptyset$ , which was our goal. ∎

Remark 2.5.

Some mild topological restrictions on the target set $A$ are, of course, necessary in the above lemma. Note that any open, non-dense and non-empty subset of $X$ will satisfy the stated assumptions.

The assumption that the map $T$ is totally minimal is essential. Indeed, take $X$ to be the Thue–Morse shift, i.e., the closed orbit under the shift map of the Thue–Morse sequence. Let

[TABLE]

Since the Thue–Morse sequence $(t_{n})$ has the property $t_{2n}\neq t_{2n+1}$ for all $n$ and since the Thue–Morse word contains no cubes (i.e., no occurrences of factors of the form $www$ with $w\in\Sigma_{k}^{*}$ , $w\neq\epsilon$ ), we see that $A\cap B=\emptyset$ , $X=A\cup B$ and $A$ and $B$ are clopen. Let $z=(t_{n})\in X$ be the Thue–Morse sequence. Then the function $f(n)=\left\llbracket T^{n}z\in A\right\rrbracket$ is periodic with period $2$ , and while $X$ is minimal, it is not totally minimal.

The analogue of the representation of a polynomial sequence using a skew rotation on the torus in (5) is provided by the Bergelson–Leibman Theorem 1.13. We are now ready to state and prove the main result of this section, from which Theorem B easily follows.

Theorem 2.6.

Let $g\colon\mathbb{Z}\to\mathbb{R}$ be a generalised polynomial taking finitely many values, and let $f\colon\mathbb{N}_{0}\to\mathbb{R}$ be a weakly periodic sequence which agrees with $g$ on a thick set $R\subset\mathbb{N}_{0}$ . Then there exists a set $Z\subset R$ with $d^{*}(Z)=0$ such that the common restriction of $f$ and $g$ to $R\setminus Z$ is periodic.

Proof.

Let the minimal nilsystem $(X,T)$ , $z\in X$ , and a partition $X=\bigcup_{j=1}^{r}S_{j}$ be as in Theorem 1.13, so that in particular

$$g(n)=c_{j}\quad\text{if and only if}\quad T^{n}z\in S_{j}.\tag{6}$$

If $X$ is not totally minimal, then (as in Remark 1.14) we may find $a\in\mathbb{N}$ such that for any $b\in\mathbb{Z}$ , $g^{\prime}_{a,b}(n)=g(an+b)$ has a representation as in (6) on a totally minimal nilsystem. Clearly, $f^{\prime}_{a,b}(n)=f(an+b)$ is weakly periodic and agrees with $g^{\prime}_{a,b}(n)$ on the thick set $R^{\prime}_{a,b}=\{n\mid an+b\in R\}$ . Thus, it will suffice to prove the theorem under the additional assumption that $(X,T)$ is totally minimal.

We may write

$$g(n)=\sum_{j=1}^{r}c_{j}\left\llbracket T^{n}z\in\operatorname{int}S_{j}\right\rrbracket+h(n),$$

where $h(n)=0$ unless $T^{n}z\in\bigcup_{j=1}^{r}\partial S_{j}$ . In particular (by Corollary 1.12), the set $Z\subset\mathbb{N}_{0}$ of $n$ with $h(n)\neq 0$ has upper Banach density $0$ . Note that $R\setminus Z$ is then thick.

For $j\in\{1,\ldots,r\}$ , put $g_{j}^{\prime}(n)=\left\llbracket T^{n}z\in\operatorname{int}S_{j}\right\rrbracket$ and $f^{\prime}_{j}(n)=\left\llbracket f(n)=c_{j}\right\rrbracket$ . Then $g_{j}^{\prime}(n)=f_{j}^{\prime}(n)$ for $n\in R\setminus Z$ . By Lemma 2.4, this is only possible if for each $j$ , the set $\operatorname{int}S_{j}$ is either empty or dense. Since $\mu_{X}(X\setminus\bigcup_{j=1}^{r}\operatorname{int}S_{j})=0$ , there is $i$ such that $\operatorname{int}S_{i}$ is dense, and $\operatorname{int}S_{j}=\emptyset$ for $j\neq i$ . Denoting by $Z^{\prime}\supset Z$ the set of $n\in R$ with $T^{n}z\in X\setminus\operatorname{int}S_{i}$ we have $d^{*}(Z^{\prime})=0$ and $f(n)=g(n)=c_{i}$ for $n\in R\setminus Z^{\prime}$ , as needed. ∎

Proof of Theorem B.

This is a direct application of Theorem 2.6 with $f=g$ and $R=\mathbb{N}_{0}$ . ∎

It is not a trivial matter to determine whether a given generalised polynomial is periodic away from a set of density $0$ , although it can be accomplished by the techniques in [BL07, Lei12]. In order to give explicit examples, we restrict ourselves to generalised polynomials of a specific form, which is somewhat more general than the one considered in Proposition 2.2.

Corollary 2.7.

Suppose that $q\colon\mathbb{Z}\to\mathbb{R}$ is a generalised polynomial with the property that $\lambda q(an)\bmod{1}$ is equidistributed in $[0,1)$ for any $\lambda\in\mathbb{Q}\setminus\{0\}$ and $a\in\mathbb{N}$ , and let $m\geq 2$ . Then the sequence $f(n)=\left\lfloor q(n)\right\rfloor\bmod{m}$ is not automatic.

Proof.

Suppose $f(n)$ were automatic. By Theorem B, there exist $a\in\mathbb{N}$ and $Z\subset\mathbb{N}_{0}$ with $d^{*}(Z)=0$ such that $f(an)$ is constant for $n\in\mathbb{N}_{0}\setminus Z$ . Hence, there is some $0\leq l<m$ such that $\frac{1}{m}q(an)\bmod{1}\in\left[\frac{l}{m},\frac{l+1}{m}\right)$ for $n\in\mathbb{N}_{0}\setminus Z$ , contradicting the equidistribution assumption (applied with $\lambda=\frac{1}{m}$ ). ∎

The uniform distribution of generalised polynomials has been extensively studied by Håland-Knutson [Hål93, Hål94, HK95], and later a very general theory was developed by Bergelson and Leibman [BL07, Lei12]. In view of the results in [Hål93], it is fair to say that a “generic” generalised polynomial $q(n)$ is equidistributed modulo $1$ . Hence, the assumptions on $q(n)$ in Corollary 2.7 are not overly restrictive.

To make the last remark precise, let us define the (multi)set of coefficients of a generalised polynomial $q$ as follows. If $q(n)=\sum_{j}\alpha_{j}n^{j}$ is a polynomial, then the coefficients of $q(n)$ are the non-zero terms among the $\alpha_{j}$ . If $q(n)=r_{1}(n)+r_{2}(n)$ or $q(n)=r_{1}(n)\cdot r_{2}(n)$ , then the coefficients of $q(n)$ are the union of the coefficients of $r_{1}(n)$ and $r_{2}(n)$ . Finally, if $q(n)=p(n)\left\lfloor r(n)\right\rfloor^{d}$ , then the coefficients of $q(n)$ are the union of the coefficients of $r(n)$ and the coefficients of $p(n)$ . The set of coefficients will depend on the choice of a representation of the generalised polynomial at hand; we fix one such choice. We cite a slightly simplified version of the main theorem of [Hål93].

Theorem 2.8.

Suppose that $q(n)$ is a generalised polynomial, and all of the products of subsets of the coefficients of $q(n)$ are $\mathbb{Q}$ -linearly independent. Then $q(n)$ is equidistributed modulo $1$ .

As an example of an application, we conclude that $\left\lfloor\sqrt{2}n\left\lfloor\sqrt{3}n\right\rfloor\right\rfloor\bmod{10}$ is not an automatic sequence.
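The sequence in this example can be computed exactly with integer arithmetic, using $\left\lfloor\sqrt{c}\,n\right\rfloor=\left\lfloor\sqrt{cn^{2}}\right\rfloor$ . The sketch below (with hypothetical function names) tabulates how often each residue modulo $10$ occurs; this is only an empirical illustration of the equidistribution behind Corollary 2.7 and proves nothing about automaticity.

```python
from collections import Counter
from math import isqrt


def floor_sqrt_multiple(c: int, n: int) -> int:
    """Return floor(sqrt(c) * n) for integers c, n >= 0, computed exactly."""
    return isqrt(c * n * n)


def example_sequence(n: int) -> int:
    """floor(sqrt(2) * n * floor(sqrt(3) * n)) mod 10."""
    inner = floor_sqrt_multiple(3, n)
    return floor_sqrt_multiple(2, n * inner) % 10


def residue_counts(limit: int) -> Counter:
    """Count occurrences of each residue among the first `limit` terms."""
    return Counter(example_sequence(n) for n in range(limit))


print([example_sequence(n) for n in range(4)])  # [0, 1, 8, 1]
print(sorted(residue_counts(10_000).items()))
```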

3. Combinatorial structure of automatic sets

In this section, we begin the investigation of sparse sequences. Here, we call a sequence $f\colon\mathbb{N}_{0}\to\{0,1\}\subset\mathbb{R}$ sparse if it is the characteristic function of a set of density $0$ (if such a sequence comes from a generalised polynomial or is automatic, it also has upper Banach density $0$ , cf. [BK16] and Lemma 4.8 below). Note that for such sparse sequences, Theorem B conveys no useful information. Conversely, to prove Conjecture A, it would suffice (in light of Theorem B) to verify it for sparse sequences; this observation will be made precise in the proof of Theorem C below.

Arid sets

To formulate our main result, it is convenient to introduce the following piece of terminology, inspired by Kedlaya [Ked06]. Such sets appear in the papers of Szilard–Yu–Zhang–Shallit [SYZS92], Gawrychowski–Krieger–Rampersad–Shallit [GKRS10], Derksen [Der07] and Adamczewski–Bell [AB08] (among many others) under different names (regular languages of polynomial growth/sparse/poly-slender/bounded) or without any name. A closely related class of sets known as $p$ -normal sets plays a significant rôle in the study of zero sets of linear recurrences in positive characteristic; see also [DM15, AB12]. Other related classes of sets include Saguaro sets of [AB08] and $F$ -sets of [MS02]. Since we will use the notation simultaneously for languages and for the associated sets of integers, and since some of the existing terminology might be confusing in our context, we have decided to use a different term.

Definition 3.1 (Arid sets).

Let $k\geq 2$ , $r\geq 0$ be integers. A basic $k$ -arid set (of rank $\leq r$ ) is a set of the form

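$$A=\left\{v_{0}w_{1}^{l_{1}}v_{1}w_{2}^{l_{2}}\cdots w_{r}^{l_{r}}v_{r}\ \middle|\ l_{1},\dots,l_{r}\in\mathbb{N}_{0}\right\},\tag{8}$$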

where $v_{0},\dots,v_{r}\in\Sigma_{k}^{*}$ and $w_{1},\dots,w_{r}\in\Sigma_{k}^{*}$ . A set $A\subset\Sigma_{k}^{*}$ is $k$ -arid (of rank $\leq r$ ) if it is a finite union of basic arid sets (of rank $\leq r$ ). If $k$ is clear from the context, we speak simply of (basic) arid sets.

We similarly define these notions for sets of integers: A set $E\subset\mathbb{N}_{0}$ is $k$ -arid (of rank $\leq r$ ) if it has the form ${\left\{[u]_{k}\ \middle|\ u\in A\right\}}$ where $A\subset\Sigma_{k}^{*}$ is arid (of rank $\leq r$ ). A sequence $f\colon\mathbb{N}_{0}\to\{0,1\}$ is arid if the set ${\left\{n\in\mathbb{N}_{0}\ \middle|\ f(n)=1\right\}}$ is arid.

Using the Kleene star notation, the $k$ -arid set $A$ in (8) can be alternatively written as

$$A=v_{0}w_{1}^{*}v_{1}w_{2}^{*}\cdots w_{r}^{*}v_{r}.$$

In the following, we will not use this notation, and rather use the former notation which seems more appropriate for our context.

Lemma 3.2.

Any $k$ -arid sequence is $k$ -automatic.

Proof.

It is clear that any $k$ -arid set is given by a regular expression and hence it is $k$ -automatic by Kleene’s theorem. Alternatively, in this simple case one can construct the required automata by hand. ∎
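To make the link with regular expressions concrete, here is a minimal sketch that turns the data $v_{0},\dots,v_{r}$ and $w_{1},\dots,w_{r}$ of a basic arid set into a Python regular expression. The function names are hypothetical. The membership test is restricted to base $2$ and compares only the expansion of $n$ without leading zeros, so it is valid under the assumption that no word of the set starts with $0$.

```python
import re


def arid_pattern(vs, ws):
    """Compile the regular expression v_0 w_1* v_1 ... w_r* v_r."""
    if len(vs) != len(ws) + 1:
        raise ValueError("need len(vs) == len(ws) + 1")
    parts = [re.escape(vs[0])]
    for w, v in zip(ws, vs[1:]):
        if w:  # an empty w contributes nothing to the language
            parts.append("(?:" + re.escape(w) + ")*")
        parts.append(re.escape(v))
    return re.compile("".join(parts))


def in_arid_set(n: int, pattern) -> bool:
    """Test whether the binary expansion of n (no leading zeros) matches."""
    word = bin(n)[2:] if n > 0 else ""
    return pattern.fullmatch(word) is not None


powers_of_two = arid_pattern(["1", ""], ["0"])          # rank 1: 1 0^l
two_ones = arid_pattern(["1", "1", ""], ["0", "0"])     # rank 2: 1 0^a 1 0^b

print(in_arid_set(8, powers_of_two), in_arid_set(6, powers_of_two))  # True False
print(in_arid_set(6, two_ones), in_arid_set(7, two_ones))            # True False
```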

Cobham [Cob72] proved that there is a gap in the growth rate of automatic sets.

Proposition 3.3.

Let $E\subset\mathbb{N}_{0}$ be a non-empty automatic set. Then exactly one of the following two conditions holds:

(i)

There exists an integer $r\geq 0$ and a real number $c>0$ such that

[TABLE]

(ii)

There exists $\alpha>0$ such that

[TABLE]

Proof.

This follows from [Cob72, Theorems 11 & 12]. ∎

According to the proposition above, automatic sets have either poly-logarithmic or polynomial rate of growth. Szilard–Yu–Zhang–Shallit [SYZS92] showed that the class of automatic sets of poly-logarithmic growth coincides with the class of arid sets. To state a more precise version of this result, we recall that a state $s$ in a $k$ -automaton $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\{0,1\},\tau)$ with output $\{0,1\}$ is called accessible if there exists $v\in\Sigma_{k}^{*}$ such that $\delta(s_{0},v)=s$ and is called coaccessible if there exists $v\in\Sigma_{k}^{*}$ such that $\tau(\delta(s,v))=1$ .

Proposition 3.4.

Let $E\subset\mathbb{N}_{0}$ be a $k$ -automatic set and let $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\{0,1\},\tau)$ be a $k$ -automaton with output $\{0,1\}$ that produces $E$ , in the sense that an integer $n$ is in $E$ if and only if $\tau(\delta(s_{0},n))=1$ . Then the following conditions are equivalent:

(i)

The set $E$ is arid.

(ii)

There exists an integer $r$ such that $|E\cap[N]|=O(\log^{r}(N))$ .

(iii)

There does not exist an accessible and coaccessible state $s\in S$ and $v_{1},v_{2}\in\Sigma_{k}^{*}$ such that $v_{1}v_{2}\neq v_{2}v_{1}$ and $\delta(s,v_{1})=\delta(s,v_{2})=s$ .

Moreover, if $E$ is arid of rank $r$ , then the limit $\lim_{N\to\infty}|E\cap[N]|/\log^{r}(N)$ exists and is finite.

Proof.

This is essentially proved in [SYZS92]; our formulation is influenced by [BHS17, Lemmas 2.1–2.3] (for more details and related results see references therein). ∎

Remark.

Some similar results are also implicit in [AB08, Lemma 6.7] and [Der07, Proposition 7.9]; see also [Ked06].

Remark 3.5.

Let $a\geq 1$ be an integer. Then the notions of $k$ -arid sets and $k^{a}$ -arid sets coincide. This follows either from a direct argument or from Proposition 3.4. We will use this observation several times.

We will in fact need a slight improvement on the information on the rate of growth of arid sets from Proposition 3.4.

Lemma 3.6.

Let $E\subset\mathbb{N}_{0}$ be arid of rank (exactly) $r$ . Then

[TABLE]

Proof.

It suffices to deal with basic arid sets given by

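$$E=\left\{[v_{0}w_{1}^{l_{1}}v_{1}w_{2}^{l_{2}}\cdots w_{r}^{l_{r}}v_{r}]_{k}\ \middle|\ l_{1},\dots,l_{r}\in\mathbb{N}_{0}\right\}.$$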

We begin with some standard reductions. Replacing $w_{i}$ with suitably chosen powers, altering $v_{i}$ accordingly, and passing to basic arid subsets, we may assume that all $w_{i}$ have the same length $a$ . Replacing $k$ with $k^{a}$ and using Remark 3.5 enables us to assume that $\left|w_{i}\right|=1$ for each $i$ . If $r$ is minimal, we further know that if $w_{i}=w_{i+1}$ for some $i$ , then $v_{i}$ is not a power of $w_{i}$ . Finally, we may assume that $N=k^{L}$ is a large power of $k$ , and that $M=k^{L}M^{\prime}$ is divisible by $N$ .

Since an element of $E\cap[M,M+N)$ is uniquely determined by its final $L$ digits, the bound $\left|E\cap[M,M+N)\right|\ll L^{r}$ follows immediately from counting the $r$ -tuples $(l_{1},\dots,l_{r})$ with $\sum_{i=1}^{r}l_{i}+\sum_{i=0}^{r}\left|v_{i}\right|\leq L$ . ∎
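The counting of $r$ -tuples can be illustrated numerically. The following sketch (hypothetical function name, brute-force enumeration, bases $k\leq 36$ only because it relies on Python's `int(word, k)`) counts the elements of a basic arid set below $N$ . For the rank $2$ set of integers with exactly two binary digits equal to $1$ , the count below $2^{L}$ is $\binom{L}{2}$ , which grows like $\log^{2}(N)$ .

```python
from itertools import product


def basic_arid_count(vs, ws, k: int, N: int) -> int:
    """Count integers n < N of the form [v_0 w_1^{l_1} v_1 ... w_r^{l_r} v_r]_k.

    Exponents are searched up to the number of base-k digits of N; larger
    exponents either repeat a value already seen (leading zeros or empty
    w_i) or produce a value >= N.
    """
    max_len, m = 0, N
    while m > 0:  # number of base-k digits of N
        max_len += 1
        m //= k
    values = set()
    for exps in product(range(max_len + 1), repeat=len(ws)):
        word = vs[0] + "".join(w * e + v for w, e, v in zip(ws, exps, vs[1:]))
        value = int(word, k) if word else 0
        if value < N:
            values.add(value)
    return len(values)


for L in (10, 20):
    print(L, basic_arid_count(["1", "1", ""], ["0", "0"], 2, 2 ** L))
# 10 45
# 20 190
```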

We are now ready to state the main theorem of this section in a more convenient language.

Theorem 3.7.

Suppose that a sparse set $E\subset\mathbb{N}_{0}$ is simultaneously $k$ -automatic and generalised polynomial. Then $E$ is $k$ -arid.

For the proof of this result, we need to use the notion of IPS sets introduced in [BK16].

IPS sets and automatic sequences

The following notion generalises the classical notion of an $\operatorname{IP}$ set that is of importance in combinatorial number theory and ergodic theory (for the origin of the term $\operatorname{IP}$ , which stands either for infinite-dimensional parallelepiped or idempotent, see, e.g., [BL16]). This notion is discussed in more detail in [BK16] (in particular, an equivalent definition of $\mathrm{IPS}$ sets in terms of ultrafilters is given there).

Definition 3.8 ( $\operatorname{IP}$ and $\mathrm{IPS}$ sets).

For a sequence $(n_{i})_{i\in\mathbb{N}}\subset\mathbb{N}$ , the corresponding set of finite sums is

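$$\operatorname{FS}(n_{i})=\left\{n_{\alpha}\ \middle|\ \alpha\subset\mathbb{N},\ 0<\left|\alpha\right|<\infty\right\},$$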

where $n_{\alpha}=\sum_{i\in\alpha}n_{i}$ . Any set containing a set of the form $\operatorname{FS}(n_{i})$ for some $(n_{i})$ is called an $\operatorname{IP}$ set.

For a sequence $(n_{i})_{i\in\mathbb{N}}\subset\mathbb{N}$ and shifts $(N_{t})_{t\geq 1}\subset\mathbb{N}_{0}$ , the corresponding set of shifted finite sums is

[TABLE]

where again $n_{\alpha}=\sum_{i\in\alpha}n_{i}$ . Any set containing a set of the form $\operatorname{FS}(n_{i};N_{t})$ for some $(n_{i}),\ (N_{t})$ is called an $\mathrm{IPS}$ set.

Example 3.9.

Fix $k\geq 2$ . Let $v_{1},v_{2}\in\Sigma_{k}^{*}$ be two distinct words with $\left|v_{1}\right|=\left|v_{2}\right|=l$ , and let $u_{0},u_{1}\in\Sigma_{k}^{*}$ be arbitrary. Consider the set

[TABLE]

Then $E$ is an $\mathrm{IPS}$ set. Indeed, $E=\operatorname{FS}(n_{i};N_{t})$ , where $N_{t}=[u_{0}v_{1}^{t}u_{1}]_{k}$ and $n_{i}=([v_{2}]_{k}-[v_{1}]_{k})k^{(i-1)l+\left|u_{1}\right|}$ (assuming, as we may, that $[v_{2}]_{k}>[v_{1}]_{k}$ ). If $[u_{0}]_{k}=[u_{1}]_{k}=[v_{1}]_{k}=0$ , then $E$ is an $\operatorname{IP}$ set.
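The identity behind this example can be verified by brute force for small $t$ . The sketch below fixes $t$ and compares the integers $[u_{0}v_{j_{1}}\cdots v_{j_{t}}u_{1}]_{k}$ with the sums $N_{t}+n_{\alpha}$ , $\alpha\subset\{1,\dots,t\}$ . The function names and the sample words are hypothetical, and the empty set $\alpha$ is included in the comparison, which is an assumption of this sketch about the convention in the definition of $\operatorname{FS}(n_{i};N_{t})$ .

```python
from itertools import combinations, product


def value(word: str, k: int) -> int:
    """Integer [word]_k, with the empty word read as 0."""
    return int(word, k) if word else 0


def words_side(u0, v1, v2, u1, k, t):
    """All [u0 v_{j_1} ... v_{j_t} u1]_k with j_i in {1, 2}."""
    return {value(u0 + "".join(blocks) + u1, k)
            for blocks in product((v1, v2), repeat=t)}


def sums_side(u0, v1, v2, u1, k, t):
    """All N_t + n_alpha with alpha a subset of {1, ..., t}."""
    length = len(v1)
    shift = value(u0 + v1 * t + u1, k)          # N_t
    gap = value(v2, k) - value(v1, k)
    n = [gap * k ** ((i - 1) * length + len(u1)) for i in range(1, t + 1)]
    return {shift + sum(n[i] for i in alpha)
            for size in range(t + 1)
            for alpha in combinations(range(t), size)}


args = ("1", "01", "10", "1", 2)  # u0, v1, v2, u1, k (sample data)
print(all(words_side(*args, t) == sums_side(*args, t) for t in range(1, 6)))  # True
```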

$\mathrm{IPS}$ sets occur in our work due to the following result.

Theorem 3.10.

Let $E\subset\mathbb{N}_{0}$ be an automatic set. Then either $E$ is arid or it is $\mathrm{IPS}$ .

Proof.

Assume that $E\subset\mathbb{N}_{0}$ is automatic but not arid; we need to show that $E$ is $\mathrm{IPS}$ . Let $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\{0,1\},\tau)$ be a $k$ -automaton with output $\{0,1\}$ which produces the characteristic sequence of $E$ when reading digits starting from the most significant one and ignoring the initial zeros.

Since $E$ is not arid, neither is the set $A={\left\{w\in\Sigma^{*}_{k}\ \middle|\ \tau(\delta(s_{0},w))=1\right\}}$ . Hence, by Proposition 3.4, there exists an accessible and coaccessible state $s\in S$ and $v_{1},v_{2}\in\Sigma_{k}^{*}$ such that $v_{1}v_{2}\neq v_{2}v_{1}$ and $\delta(s,v_{1})=\delta(s,v_{2})=s$ . Replacing $v_{1}$ and $v_{2}$ by their powers and interchanging them if necessary, we may assume that $v_{1}$ and $v_{2}$ are of equal length $l=\left|v_{1}\right|=\left|v_{2}\right|$ and $[v_{1}]_{k}<[v_{2}]_{k}$ . Pick $u_{0},u_{1}\in\Sigma_{k}^{*}$ so that $s=\delta(s_{0},u_{0})$ , and $\tau(\delta(s,u_{1}))=1$ .

The set $A$ contains all words of the form $w=u_{0}v_{j_{1}}v_{j_{2}}\dots v_{j_{t}}u_{1}$ , where ${j_{i}}\in\{1,2\}$ and $t\in\mathbb{N}_{0}$ . It follows that $E$ is $\mathrm{IPS}$ (cf. Example 3.9). ∎

In order to prove Theorem 3.7, we need to recall one of the main results of [BK16] (Theorem A), whose proof uses ergodic theory and the machinery of ultrafilters.

Theorem 3.11.

Let $E\subset\mathbb{Z}$ be a sparse generalised polynomial set. Then $E$ is not $\mathrm{IPS}$ .

Theorem 3.7 and Theorem C now follow quite easily.

Proof of Theorem 3.7.

Let $E$ be the set in Theorem 3.7. By Theorem 3.11, $E$ is not $\mathrm{IPS}$ . Hence, by Theorem 3.10, it is arid. ∎

Proof of Theorem C.

Suppose that $f\colon\mathbb{N}_{0}\to\mathbb{R}$ is automatic and generalised polynomial. Let $b(n)$ be the periodic function such that the set $Z={\left\{n\in\mathbb{N}_{0}\ \middle|\ f(n)\neq b(n)\right\}}$ has $d^{*}(Z)=0$ . (The existence of $b(n)$ is guaranteed by Theorem B.)

Note that $Z$ is generalised polynomial and automatic (automaticity is clear; to see that $Z$ is generalised polynomial, compose $f-b$ with a polynomial $p$ such that $p(0)=1$ and $p(x-y)=0$ for $x\in f(\mathbb{N}_{0})$ , $y\in b(\mathbb{N}_{0})$ , $x\neq y$ ).

By Theorem 3.7, $Z$ is arid. Hence, by Lemma 3.6, we have

$$\sup_{M}\left|Z\cap[M,M+N)\right|=O(\log^{r}(N))$$

for some $r\in\mathbb{N}_{0}$ as $N\to\infty$ . ∎

If Conjecture A is true then there are no nontrivial examples of arid generalised polynomial sets (indeed, by Theorem D non-existence of such sets is precisely equivalent to Conjecture A; see also Proposition 5.3). However, there are examples of generalised polynomial sets which exhibit some properties reminiscent of arid sets. We have already mentioned in this context that the set of Fibonacci numbers is a generalised polynomial set, and in [BK16, Theorems B & C] we have extended this to certain linear recurrences of order $2$ and $3$ as well as arbitrary sets whose size grows at a sublogarithmic rate.

It is important to note that in the statement of Theorem 3.10 it is not possible to replace $\mathrm{IPS}$ sets with $\operatorname{IP}$ sets or their translates (cf. Example 4.3). We discuss this question further in the next section.

4. Examples and properties of automatic sets

$\mathcal{B}$ -free sets

In this subsection, we will discuss a simple class of examples of automatic sets, the $\mathcal{B}$ -free sets, which will allow us to show that in the statement of Theorem 3.10 it is in general not possible to replace $\mathrm{IPS}$ sets with translates of $\operatorname{IP}$ sets (Example 4.3).

Example 4.1.

Let $k\geq 2$ , and let $\mathcal{B}\subset\Sigma_{k}^{*}$ be a finite set of ‘prohibited’ words of length $\leq t$ . A word $u\in\Sigma_{k}^{*}$ is $\mathcal{B}$ -free if $u$ contains no $b\in\mathcal{B}$ as a factor. Accordingly, $n\in\mathbb{N}_{0}$ is $\mathcal{B}$ -free if its base- $k$ expansion $(n)_{k}$ is $\mathcal{B}$ -free. Denote the set of $\mathcal{B}$ -free integers by $F_{\mathcal{B}}$ .

(i)

The set $F_{\mathcal{B}}$ is $k$ -automatic.

(ii)

If $\mathcal{B}\neq\emptyset$ , then $F_{\mathcal{B}}$ is sparse.

(iii)

If $\sum_{b\in\mathcal{B}}k^{-\left|b\right|}\leq\frac{1}{16t}$ , then $F_{\mathcal{B}}$ is not arid.

(iv)

If each $b\in\mathcal{B}$ contains at least two non-zero digits, then $F_{\mathcal{B}}$ is $\operatorname{IP}$ .

(v)

If some $b\in\mathcal{B}$ consists only of $0$ ’s, then $F_{\mathcal{B}}-m$ is not $\operatorname{IP}$ for any $m\in\mathbb{Z}$ .

Proof.

(i)

It is not difficult to explicitly describe a $k$ -automaton which computes the characteristic function of $F_{\mathcal{B}}$ ; alternatively, the claim follows immediately from Kleene’s theorem.

(ii)

We may assume that $\mathcal{B}$ consists of a single string of length $t$ . Then the probability that a randomly chosen word of length $m$ does not contain $b$ is at most $\left(1-{k^{-t}}\right)^{\left\lfloor m/t\right\rfloor}$ . The claim easily follows from this.

(iii)

We may assume $\mathcal{B}\neq\emptyset$ . Construct an undirected graph $G=(V,E)$ (we allow $G$ to have loops), where $V=\Sigma_{k}^{t}$ , and $\{u,v\}\in E$ if $uv$ and $vu$ are both $\mathcal{B}$ -free. If $u_{1},u_{2},\dots,u_{r}$ is a walk in $G$ , then $u_{1}u_{2}\cdots u_{r}$ is $\mathcal{B}$ -free. Assume that $G$ contains a walk $u_{1},w,u_{2}$ of length $2$ with $u_{1}\neq u_{2}$ . Without loss of generality, we may assume that $u_{1}\neq 0^{t}$ (otherwise, switch $u_{1}$ and $u_{2}$ ). Then for any $i_{1},\dots,i_{r}\in\{1,2\}$ the word $v=u_{1}wu_{i_{1}}wu_{i_{2}}w\cdots u_{i_{r}}w$ is $\mathcal{B}$ -free. Hence, $[v]_{k}\in F_{\mathcal{B}}$ and we can see either directly or from Proposition 3.4 that $F_{\mathcal{B}}$ is not arid. Thus, it remains to check that $G$ contains a length $2$ walk with distinct endpoints; for the sake of contradiction suppose that this is not the case.

Since each vertex has at most one neighbour (including itself if $\{u,u\}$ is an edge), the graph is a disjoint union of paths of length $1$ , loops, and vertices, and hence $\left|E\right|\leq\left|V\right|=k^{t}$ . On the other hand, given $b\in\mathcal{B}$ , the number of pairs $(u,v)\in V^{2}$ such that $b$ appears in $uv$ or $vu$ is $<4tk^{2t-\left|b\right|}$ , so

[TABLE]

(note that the assumption implies that $k^{t}\geq 16$ ), which gives a contradiction.

(iv)

Let $n_{i}=k^{it}$ . Then $\operatorname{FS}(n_{i})\subset F_{\mathcal{B}}$ .

(v)

Suppose that $F_{\mathcal{B}}$ contains $E+m$ for some $\operatorname{IP}$ set $E$ and integer $m$ . Replacing $E$ with a smaller $\operatorname{IP}$ set if necessary, we may assume that $m>0$ . Since $E$ is $\operatorname{IP}$ , for any $l\geq 0$ there exists $n\in E$ which is divisible by $k^{l}$ . If $l$ is large enough (it suffices that $l>t+\left\lfloor\log m/\log k\right\rfloor$ ) then $n+m$ is an element of $F_{\mathcal{B}}$ whose base- $k$ expansion contains $t$ consecutive zeros, contradicting the assumption on $\mathcal{B}$ .∎
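The claim in (iv) can be checked numerically on a small case. The sketch below assumes that $F_{\mathcal{B}}$ denotes the set of integers whose base- $k$ expansion contains no word of $\mathcal{B}$ as a factor, and that the words of $\mathcal{B}$ have length at most $t$ . The helper names `to_base`, `is_b_free` and `finite_sums`, as well as the sample choice $k=2$ , $t=3$ , $\mathcal{B}=\{11,101\}$ , are illustrative and not taken from the text.

```python
from itertools import combinations

def to_base(n, k):
    """Base-k expansion of n as a string, most significant digit first (k <= 10)."""
    digits = []
    while n > 0:
        n, d = divmod(n, k)
        digits.append(str(d))
    return "".join(reversed(digits))

def is_b_free(n, k, forbidden):
    """True if no word of `forbidden` occurs as a factor of the base-k expansion of n."""
    word = to_base(n, k)
    return not any(b in word for b in forbidden)

def finite_sums(terms):
    """All sums over non-empty subsets of `terms` (a finite piece of FS(n_i))."""
    return {sum(c) for r in range(1, len(terms) + 1) for c in combinations(terms, r)}

k, t = 2, 3
forbidden = {"11", "101"}                    # two non-zero digits each, length <= t
terms = [k ** (i * t) for i in range(1, 8)]  # n_i = k^{it}
all_free = all(is_b_free(n, k, forbidden) for n in finite_sums(terms))
```

In this example the non-zero digits of any finite sum are at least $t$ positions apart, which is why no forbidden word can occur.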

Remark 4.2.

A similar example was considered by Miller [Mil12], who gave sufficient conditions for $F_{\mathcal{B}}$ to be infinite.

Example 4.3.

The set

[TABLE]

is $2$ -automatic, sparse, not arid, and does not contain a translate of an $\operatorname{IP}$ set.

Proof.

We see that $F_{00}$ is not arid by Proposition 3.4 or by a simple modification of the proof of 4.1.(iii). The remaining claims follow directly from Example 4.1.∎

The following two examples can be verified similarly.

Example 4.4.

The set

[TABLE]

is $2$ -automatic, sparse, not arid, and $\operatorname{IP}$ .

Example 4.5.

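The Baum–Sweet sequence ([BS76]), given by

$$b(n)=\begin{cases}1&\text{if the binary expansion of }n\text{ contains no block of consecutive }0\text{'s of odd length;}\\0&\text{otherwise,}\end{cases}$$

takes the value $1$ on a set which is $2$ -automatic, sparse, not arid, and $\operatorname{IP}$ .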
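For concreteness, here is a minimal sketch of the Baum–Sweet sequence, assuming its standard definition: the value at $n$ is $1$ when the binary expansion of $n$ contains no maximal block of consecutive zeros of odd length, and $0$ otherwise. The function name `baum_sweet` and the convention that $0$ has the empty expansion are choices made for this illustration.

```python
import re

def baum_sweet(n):
    """1 if the binary expansion of n has no maximal block of 0s of odd length, else 0."""
    if n == 0:
        return 1  # convention: the expansion of 0 is the empty word
    blocks = re.findall(r"0+", bin(n)[2:])  # maximal runs of zeros
    return int(all(len(block) % 2 == 0 for block in blocks))

first_values = [baum_sweet(n) for n in range(16)]
support = [n for n in range(64) if baum_sweet(n) == 1]
```

The list `support` gives the beginning of the set on which the sequence takes the value $1$ .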

Translates of $\operatorname{IP}$ sets

Even though in general non-arid automatic sets need not contain translates of $\operatorname{IP}$ sets, this is nevertheless the case under certain stronger assumptions on the set.

Proposition 4.6.

Let $E\subset\mathbb{N}_{0}$ be a $k$ -automatic set. Assume that for every $w\in\Sigma_{k}^{*}$ there is an integer $n\in E$ such that $w$ is a factor of $(n)_{k}$ . Then the set $E-m={\left\{n-m\ \middle|\ n\in E\right\}}$ is $\operatorname{IP}$ for some $m\in\mathbb{N}_{0}$ .

Proof of Proposition 4.6.

Let $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\{0,1\},\tau)$ be a $k$ -automaton that produces the characteristic sequence of $E$ by reading the digits of $n$ starting with the least significant one, allowing for leading zeros. We will denote the word $0\cdots 0\in\Sigma_{k}^{*}$ with $n$ zeros by $0^{n}$ . We begin by proving the following claim.

Claim.

There exist states $s,s^{\prime}\in S$ with $\tau(s)=1$ , an integer $l\in\mathbb{N}$ , and a word $v\in\Sigma_{k}^{l}$ that is not a power of $0$ such that for $z=0^{l}$ we have $\delta(s,z)=s^{\prime}$ , $\delta(s,v)=s$ , $\delta(s^{\prime},z)=s^{\prime}$ , $\delta(s^{\prime},v)=s$ . This is portrayed below:

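[Diagram: two states $s$ and $s^{\prime}$ ; from each of them, the word $v$ leads to $s$ and the word $z$ leads to $s^{\prime}$ .]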

Proof of the claim.

Let $n=|S|$ be the number of states in $\mathcal{A}$ . We first show a weaker statement, namely that there is a state $s$ with $\tau(s)=1$ such that if $\tilde{s}=\delta(s,0^{n})$ denotes the state reached from $s$ after reading $n$ zeros, then we can return from $\tilde{s}$ to $s$ along a path not consisting only of zeros, that is $\delta(\tilde{s},\tilde{v})=s$ for some $\tilde{v}\in\Sigma_{k}^{*}$ that is not a power of $0$ .

To prove this, we construct a word $w=w_{1}w_{2}\cdots w_{n^{2}}$ as follows. Enumerate all pairs in $S\times S$ as $(s_{i},s_{i}^{\prime})$ for $1\leq i\leq n^{2}$ . In the first step, if $s_{1}^{\prime}$ is reachable from $s_{1}$ , let $w_{1}$ describe any path between the two, so that $\delta(s_{1},w_{1})=s_{1}^{\prime}$ ; otherwise, let $w_{1}=\epsilon$ . In general, if $w_{1},\dots,w_{i-1}$ have been defined, choose $w_{i}$ so that $\delta(s_{i},w_{1}w_{2}\cdots w_{i-1}w_{i})=s_{i}^{\prime}$ if possible (i.e., if $s_{i}^{\prime}$ is reachable from $\delta(s_{i},w_{1}w_{2}\cdots w_{i-1})$ ), and $w_{i}=\epsilon$ otherwise.

By the assumption on the set $E$ , there exist $x,y\in\Sigma_{k}^{*}$ such that for $s=\delta(s_{0},xwy)$ we have $\tau(s)=1$ . Applying the same assumption with $w1$ in place of $w$ , we may ensure that $y$ is not a power of $0$ . It remains to show that we can return from $\tilde{s}=\delta(s,0^{n})$ to $s$ . For $0\leq i\leq n^{2}$ , let $r_{i}=\delta(s_{0},xw_{1}w_{2}\cdots w_{i})$ denote the intermediate states on the path from $s_{0}$ to $s$ labelled $xwy$ , in particular $r_{0}=\delta(s_{0},x)$ . The construction of $w$ is arranged so that for any $i$ with $s_{i}=r_{0}$ , we have $r_{i}=\delta(r_{i-1},w_{i})=s_{i}^{\prime}$ , provided that $s_{i}^{\prime}$ is reachable from $r_{i-1}$ .

Choose $1\leq j\leq n^{2}$ such that $s_{j}=r_{0}$ and $s_{j}^{\prime}=\tilde{s}$ . Since $s$ is reachable from $r_{j-1}$ and $\tilde{s}$ is reachable from $s$ , $\tilde{s}$ is reachable from $r_{j-1}$ . Hence, the construction of $w$ guarantees that $r_{j}=\delta(r_{j-1},w_{j})=\tilde{s}$ . In particular, $\delta(\tilde{s},\tilde{v})=s$ , where $\tilde{v}=w_{j+1}\dots w_{n^{2}}y$ . Note that $\tilde{v}$ is not a power of $0$ since neither is $y$ . This proves the weaker version of the claim.

To prove the stronger statement, note first that since $S$ has only $n$ states, there exist $0\leq i<j\leq n$ such that $\delta(s,0^{i})=\delta(s,0^{j})$ . Let $m>i$ be any integer divisible by $(j-i)$ and put $s^{\prime}=\delta(s,0^{m})$ . Since $m$ is divisible by $(j-i)$ , we have $s^{\prime}=\delta(s^{\prime},0^{m})$ . Because $\tilde{s}$ is reachable from $s^{\prime}$ (actually, $\delta(s^{\prime},0^{n})=\tilde{s}$ ), there is a word $u$ (equal to $0^{n}\tilde{v}$ , hence not a power of $0$ ) such that $\delta(s^{\prime},u)=s$ . Take $v=(0^{m}u)^{m}$ and $l=m(\left|u\right|+m)$ . The states $s,s^{\prime}$ and the word $v$ (of length $l$ ) satisfy all the required conditions, namely $\delta(s,0^{l})=s^{\prime}$ , $\delta(s,v)=s$ , $\delta(s^{\prime},0^{l})=s^{\prime}$ , $\delta(s^{\prime},v)=s$ , and $\tau(s)=1$ . ∎

To finish the proof of Proposition 4.6, we may assume that all states in $\mathcal{A}$ are accessible. Choose states $s$ and $s^{\prime}$ and words $v$ and $z=0^{l}$ as in the statement of the claim. Let $u\in\Sigma_{k}^{*}$ be such that $\delta(s_{0},u)=s$ . For any word $w=uv_{1}v_{2}\cdots v_{r}$ , where $v_{i}\in\{v,z\}$ for $1\leq i<r$ and $v_{r}=v$ , we have $\delta(s_{0},w)=s$ , whence $[w^{\mathrm{R}}]_{k}\in E$ . It follows that $E$ contains $\operatorname{FS}(n_{i};N)$ , where $N=[u^{\mathrm{R}}]_{k}$ and $n_{i}=k^{(i-1)l+\left|u\right|}[v^{\mathrm{R}}]_{k}$ , $i\in\mathbb{N}$ . ∎
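The last step of the proof can be traced on a toy example. The sketch below is not the general construction: it assumes a hypothetical $2$ -automaton that reads digits starting with the least significant one and tracks the parity of the number of $1$ s, so that $E$ is the set of integers with an odd number of binary $1$ s. It also assumes that $\operatorname{FS}(n_{i};N)$ denotes the shifted finite sums $N+\sum_{i\in\alpha}n_{i}$ over non-empty finite sets $\alpha$ . The choices $u=1$ , $v=11$ , $l=2$ are made for this illustration, and in this toy case the state $s^{\prime}$ happens to coincide with $s$ .

```python
from itertools import combinations, product

def delta(state, word):
    """Toy transition function: parity of the number of 1s read so far."""
    for digit in word:
        state ^= int(digit)
    return state

def value_lsb_first(word):
    """Integer whose binary digits, least significant first, are the letters of `word`."""
    return sum(int(d) << i for i, d in enumerate(word))

def in_E(n):
    """Toy set E: integers with an odd number of 1s in their binary expansion."""
    return bin(n).count("1") % 2 == 1

u, v, l = "1", "11", 2            # hypothetical choices for the toy automaton
z = "0" * l
s = delta(0, u)                   # state reached from s_0 = 0 after reading u
N = value_lsb_first(u)
R = 6
terms = [2 ** ((i - 1) * l + len(u)) * value_lsb_first(v) for i in range(1, R + 1)]

# Integers read along the words u v_1 ... v_r with v_i in {v, z} and v_r = v.
from_words = {
    value_lsb_first(u + "".join(choice) + v)
    for r in range(1, R + 1)
    for choice in product((v, z), repeat=r - 1)
}
# Shifted finite sums N + (sum over a non-empty subset of n_1, ..., n_R).
shifted_sums = {
    N + sum(c) for r in range(1, R + 1) for c in combinations(terms, r)
}
```

In this example the two sets `from_words` and `shifted_sums` coincide, and every element lies in the toy set $E$ , mirroring the inclusion $\operatorname{FS}(n_{i};N)\subset E$ .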

Proposition 4.6 has the following amusing application which, however, does not require the full strength of Theorem 3.11. (Similar results can be shown in greater generality.)

Example 4.7.

There exists a constant $c>0$ such that for any sequence $\varepsilon(n)$ which is a rational power of a generalised polynomial such that $\varepsilon(n)\ll n^{-c}$ as $n\to\infty$ , the set

[TABLE]

is not automatic.

Proof.

It is shown in [BK16, Propositions 4.6 & 4.8] that $E$ is generalised polynomial, $E$ contains no translate of an $\operatorname{IP}$ set, and that $E\cap(a\mathbb{N}+b)\neq\emptyset$ for any $a\in\mathbb{N}$ , $b\in\mathbb{N}_{0}$ .

Suppose that $E$ were $k$ -automatic. Since $E$ intersects nontrivially any arithmetic progression, it would satisfy the assumptions of Proposition 4.6, and thus would contain a translate of an $\operatorname{IP}$ set, contradicting the previously mentioned results. ∎

Densities of symbols

In this subsection, we prove a lemma on densities of occurrences of symbols in automatic sequences. As a corollary, we obtain the claim that sparse automatic sequences take non-zero values on a set of Banach density $0$ .

The density of symbols for an automatic sequence is often uniform. A set $E\subset\mathbb{N}_{0}$ has uniform density $d=d(E)$ if $\left|E\cap[M,M+N)\right|/N\to d$ as $N\to\infty$ uniformly in $M$ . For an automaton $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\Omega,\tau)$ , a strongly connected component is an automaton $\mathcal{A}^{\prime}=(S^{\prime},\Sigma_{k},\delta^{\prime},s^{\prime}_{0},\Omega,\tau^{\prime})$ , where $S^{\prime}\subset S$ is non-empty, preserved under $\delta(\cdot,j)$ for all $j\in\Sigma_{k}$ and minimal with respect to these properties, $s^{\prime}_{0}\in S^{\prime}$ , and $\delta^{\prime},\ \tau^{\prime}$ are the restrictions of $\delta$ and $\tau$ to $S^{\prime}$ , respectively.

Lemma 4.8.

Let $a\colon\mathbb{N}_{0}\to\Omega$ be a $k$ -automatic sequence generated by an automaton $\mathcal{A}=(S,\Sigma_{k},\delta,s_{0},\Omega,\tau)$ reading input starting with the most significant digit, ignoring the initial zeros, and such that all the states are accessible. For $y\in\Omega$ , let $\rho_{y}\geq 0$ . Then the following conditions are equivalent:

(i)

For any $y\in\Omega$ , the set ${\left\{n\in\mathbb{N}_{0}\ \middle|\ a(n)=y\right\}}$ has density $\rho_{y}$ ;

(ii)

For any $y\in\Omega$ , the set ${\left\{n\in\mathbb{N}_{0}\ \middle|\ a(n)=y\right\}}$ has uniform density $\rho_{y}$ ;

(iii)

For any sequence $\tilde{a}^{\prime}\colon\Sigma_{k}^{*}\to\Omega$ produced by a strongly connected component $\mathcal{A}^{\prime}$ of $\mathcal{A}$ and for any $y\in\Omega$ we have

[TABLE]

Proof.

It is clear that (ii) implies (i). We will show that (i) implies (iii) and (iii) implies (ii). Throughout, it will be convenient to assume that $\Omega=\{0,1\}$ , which we may do without loss of generality. We then write $\rho$ for $\rho_{1}$ .

Suppose that (i) holds, and take some $\tilde{a}^{\prime}$ as in (iii). There is some $v\in\Sigma_{k}^{*}$ such that $\tilde{a}^{\prime}(u)=a([vu]_{k})$ , whence

[TABLE]

as $L\to\infty$ .

Now suppose that (iii) holds. For any $N,M$ and $L$ , we have

[TABLE]

uniformly in $M$ . For any $m\in\mathbb{N}_{0}$ , consider the sequence $\tilde{a}^{\prime}_{m}\colon\Sigma_{k}^{*}\to\Omega$ given by $\tilde{a}^{\prime}_{m}(u)=a([(m)_{k}u]_{k})$ , so that if $u\in\Sigma_{k}^{L}$ , then $a(mk^{L}+[u]_{k})=\tilde{a}^{\prime}_{m}(u)$ . Note that $\tilde{a}^{\prime}_{m}$ is produced by the automaton $\mathcal{A}^{\prime}$ that is obtained from $\mathcal{A}$ by changing the initial state to $s^{\prime}_{0}=\delta(s_{0},(m)_{k})$ .

If $\delta(s_{0},(m)_{k})$ lies in a strongly connected component of $\mathcal{A}$ , then we may use (iii) to estimate the inner sums in (13):

[TABLE]

as $L\to\infty$ (where the error term is uniform with respect to $m$ , since there are only finitely many possible sequences $\tilde{a}^{\prime}_{m}$ ). It is an easy exercise to check that the set of $m\in\mathbb{N}_{0}$ such that $\delta(s_{0},(m)_{k})$ does not lie in a strongly connected component of $\mathcal{A}$ has upper Banach density $0$ . Estimating the inner sums in (13) corresponding to such $m$ trivially by $O(k^{L})$ , and letting $L\to\infty$ slowly enough so that $k^{L}/N\to 0$ , we conclude that

[TABLE]

as $N\to\infty$ uniformly in $M$ . Hence, (ii) holds. ∎
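The notion of uniform density can be illustrated on a familiar automatic sequence. The sketch below uses the Thue–Morse sequence, which is not discussed in this section and is chosen only as an example: each of its two symbols has uniform density $1/2$ . The function names `thue_morse` and `window_density` are hypothetical helpers.

```python
def thue_morse(n):
    """Thue-Morse sequence: parity of the number of 1s in the binary expansion of n."""
    return bin(n).count("1") % 2

def window_density(a, M, N, y):
    """Proportion of n in the window [M, M + N) with a(n) == y."""
    return sum(1 for n in range(M, M + N) if a(n) == y) / N

N = 1 << 10
starts = (0, 1, 12345, 10 ** 6 + 7)
densities = [window_density(thue_morse, M, N, 1) for M in starts]
```

Each aligned pair $(2j,2j+1)$ contains exactly one $1$ , so for even $N$ the count in any window $[M,M+N)$ differs from $N/2$ by at most $1$ , independently of $M$ .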

Linear recurrence sequences

We have already noted that the set of values of a linear recurrence sequence can be a generalised polynomial set. This is the case for the Fibonacci sequence; for more information, see [BK16, Theorem B]. In contrast, we show that the set of values of a linear recurrence sequence is not automatic, except for trivial examples. In the proof, we apply Theorem 3.10.

Proposition 4.9.

Let $(a_{m})_{m\geq 0}$ be an $\mathbb{N}$ -valued sequence satisfying a linear recurrence of the form

$$a_{m+n}=\sum_{i=1}^{n}c_{i}a_{m+n-i}\qquad(m\geq 0)$$

with integer coefficients $c_{i}$ . Suppose that for some $k$ the set $E={\left\{a_{m}\ \middle|\ m\in\mathbb{N}_{0}\right\}}$ is $k$ -automatic. Then $E$ is a finite union of the following standard sets: linear progressions ${\left\{am+b\ \middle|\ m\in\mathbb{N}_{0}\right\}}$ with $a,b\in\mathbb{N}_{0}$ ; exponential progressions ${\left\{ak^{tm}+b\ \middle|\ m\in\mathbb{N}_{0}\right\}}$ with $a,b\in\mathbb{Q}$ and $t\in\mathbb{N}$ ; and finite sets.

Proof.

We first claim that there exists a representation of $E$ as a finite union

[TABLE]

where $F$ is finite, $L_{i}={\left\{a_{i}m+b_{i}\ \middle|\ m\in\mathbb{N}_{0}\right\}}$ are arithmetic progressions, $P_{i}={\left\{p_{i}(m)\ \middle|\ m\in\mathbb{N}_{0}\right\}}$ are value sets of polynomials $p_{i}(x)\in\mathbb{Z}[x]$ with $\deg p_{i}\geq 2$ , and $E_{i}$ have exponential growth in the sense that $\left|E_{i}\cap[N]\right|\ll\log N$ .

In order to prove this claim, we begin by noting that any restriction of $(a_{m})$ to an arithmetic progression $a^{(h,r)}_{m}=a_{hm+r}$ obeys some (minimal length) linear recurrence

$$a^{(h,r)}_{m+n^{\prime}}=\sum_{i=1}^{n^{\prime}}c^{(h,r)}_{i}a^{(h,r)}_{m+n^{\prime}-i}$$

with $n^{\prime}=n^{\prime}(h,r)\leq n$ . Moreover, there exists a choice of $h$ such that each $a^{(h,r)}_{m}$ is either identically zero or non-degenerate, in the sense that the associated characteristic polynomial $q^{(h,r)}(x)=x^{n^{\prime}}-\sum_{i=1}^{n^{\prime}}c^{(h,r)}_{i}x^{n^{\prime}-i}$ has no pair of roots $\lambda,\mu\in\mathbb{C}$ such that $\lambda/\mu$ is a root of unity (see, e.g., [EvdPSW03, Theorem 1.2] for a much stronger statement). Hence, for the purpose of showing the existence of a representation of the form (15), we may assume that $(a_{m})$ is non-degenerate. Suppose also that $n$ is minimal, and let $\lambda_{1},\dots,\lambda_{r}$ be the roots of $q(x)=x^{n}-\sum_{i=1}^{n}c_{i}x^{n-i}$ with $\left|\lambda_{1}\right|\geq\left|\lambda_{2}\right|\geq\dots$ . Note that either $E$ is finite or $\left|\lambda_{1}\right|\geq 1$ .

If $\left|\lambda_{1}\right|>1$ , then by the result of Evertse [Eve84] and van der Poorten and Schlickewei [vdPS91] (see [EvdPSW03, Theorem 2.3]), we have $a_{m}=\left|\lambda_{1}\right|^{m+o(m)}$ as $m\to\infty$ . Hence, $E$ has exponential growth, and we are done.

Otherwise, if $\left|\lambda_{1}\right|=1$ , then for all $j$ we have $\left|\lambda_{j}\right|=1$ or $\lambda_{j}=0$ . Kronecker’s theorem [Kro57] (or a standard Galois theory argument) shows that if $\lambda$ is an algebraic integer all of whose conjugates have absolute value $1$ , then $\lambda$ is a root of unity. Using the general formula for the solution of a linear recurrence, we may write for sufficiently large $m$

[TABLE]

where $p_{j}(x)$ are polynomials and $b_{j}(m)$ are periodic. Splitting $\mathbb{N}_{0}$ into arithmetic progressions where $b_{j}(m)$ are constant, we conclude that $E$ is a finite union of value sets of polynomials. This again produces a representation of the form (15).

Such a representation is not unique. Splitting $P_{i}$ into a finite number of subprogressions and discarding those which are redundant, we may assume that $P_{i}\cap L_{j}=\emptyset$ for any $i,j$ . Likewise, we may assume that $E_{i}\cap L_{j}=F\cap L_{j}=\emptyset$ for any $i,j$ . Fix one such representation subject to these restrictions. The set

[TABLE]

is again $k$ -automatic; it will suffice to show that $E^{\prime}$ is a union of the standard sets mentioned above.

We claim that $K_{\text{poly}}=0$ , i.e., the representation of $E$ uses no polynomial progressions of degree $\geq 2$ . Suppose for the sake of contradiction that $P={\left\{p(m)\ \middle|\ m\in\mathbb{N}_{0}\right\}}$ appears in one of the sets $P_{i}$ , and write $p(m)=\sum_{i=0}^{d}c_{i}m^{i}$ , where $c_{i}\in\mathbb{Z}$ . Replacing $p(m)$ with $p(m+r)$ for a suitably chosen $r\in\mathbb{N}_{0}$ , we may assume that $c_{i}>0$ for $0\leq i\leq d$ . For sufficiently large $t$ , we have $p(k^{t})=[u_{d}0^{t-t_{0}}u_{d-1}0^{t-t_{0}}u_{d-2}\cdots u_{1}0^{t-t_{0}}u_{0}]_{k}$ , where $t_{0}$ is a constant and $u_{i}$ is the base- $k$ expansion of $c_{i}$ , padded by $0$’s so as to have $\left|u_{i}\right|=t_{0}$ . Since $p(k^{t})\in E^{\prime}$ , from the pumping lemma 1.9 it follows that there is $l\in\mathbb{N}$ such that for any $s_{1},\dots,s_{d}\in\mathbb{N}$ it holds that

[TABLE]

For sufficiently large $S$ and a small absolute constant $\delta$ to be determined later, consider the set

[TABLE]

and put $N(S):=n(1,\dots,1,S-d+1)=\min Q(S)$ (for large $S$ ). Note that $N(S)=k^{lS+O(1)}$ and that $\max Q(S)=N(S)+O(N(S)^{\delta})$ . For a fixed $T_{0}$ and $T\to\infty$ , we shall consider the cardinality of the set $Q(T_{0},T)=\bigcup_{T_{0}\leq S\leq T}Q(S)$ . By an elementary counting argument, we find

[TABLE]

To obtain an upper bound, we separately estimate $\left|Q(S)\cap P_{i}\right|$ and $\left|Q(T_{0},T)\cap E_{j}\right|$ for each $i,j$ .

Suppose that $n,n^{\prime}\in Q(S)\cap P_{i}$ with $n^{\prime}>n$ , so in particular $n=p_{i}(m)$ and $n^{\prime}=p_{i}(m^{\prime})$ for some $m,m^{\prime}\gg N(S)^{1/\deg p_{i}}$ . We then have the chain of inequalities:

[TABLE]

which is a contradiction for sufficiently large $S$ , provided that $\delta<\frac{\deg p_{i}-1}{\deg p_{i}}$ (which will hold if we put $\delta=\frac{1}{3}$ ). Thus, $\left|Q(S)\cap P_{i}\right|\leq 1$ .

As for $Q(T_{0},T)\cap E_{j}$ , from the bounds on growth of $E_{j}$ we immediately have

[TABLE]

In total, using (16) and (17) we find that

[TABLE]

contradicting the previously obtained bound $\left|Q(T_{0},T)\right|\gg T^{2}$ . It follows that indeed $K_{\text{poly}}=0$ .

Since $E^{\prime}$ contains no polynomial or linear progressions, we have $\left|E^{\prime}\cap[N]\right|\ll\log N$ . It follows from Proposition 3.4 that $E^{\prime}$ must be $k$ -arid of rank $1$ . Since all basic arid sets of rank $1$ are of the form described in the statement of the proposition, we are done. ∎
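The digit pattern of $p(k^{t})$ used in the proof above, namely the blocks $u_{d},\dots,u_{0}$ separated by runs of $t-t_{0}$ zeros, can be verified directly. The sketch below uses an arbitrary sample polynomial $p(m)=5+2m+7m^{2}$ with $k=3$ and $t_{0}=2$ ; these values and the helper names are assumptions made for this illustration.

```python
def to_base(n, k):
    """Base-k digits of n, most significant first, as a list of integers."""
    digits = []
    while n > 0:
        n, d = divmod(n, k)
        digits.append(d)
    return digits[::-1]

def from_base(digits, k):
    """Integer with the given base-k digits (most significant first)."""
    value = 0
    for d in digits:
        value = value * k + d
    return value

def padded(c, k, width):
    """Base-k digits of c, left-padded with zeros to the given width."""
    digits = to_base(c, k)
    return [0] * (width - len(digits)) + digits

def predicted_digits(coeffs, k, t, t0):
    """Digits of u_d 0^{t-t0} u_{d-1} ... 0^{t-t0} u_0 for coefficients c_0, ..., c_d."""
    blocks = [padded(c, k, t0) for c in reversed(coeffs)]  # u_d, ..., u_0
    out = list(blocks[0])
    for block in blocks[1:]:
        out += [0] * (t - t0) + block
    return out

coeffs = [5, 2, 7]   # p(m) = 5 + 2m + 7m^2, an arbitrary example
k, t0 = 3, 2         # every coefficient is below k^{t0} = 9

def p(m):
    return sum(c * m ** i for i, c in enumerate(coeffs))
```

For every $t\geq t_{0}$ in this example, `from_base(predicted_digits(coeffs, k, t, t0), k)` agrees with `p(k ** t)`.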

5. Proof of Theorem D

In this section, we derive Theorem D from Theorem C. Our argument is purely combinatorial and can be entirely phrased in terms of finite automata with no further recourse to dynamics.

Proposition 5.1.

Let $A\subset\Sigma_{k}^{*}$ be an infinite arid set. Then there exists $v\in\Sigma_{k}^{*}$ such that $A\cap v\Sigma_{k}^{*}$ takes the form

[TABLE]

where $p\geq 1$ , $v,w,u_{i}\in\Sigma_{k}^{*}$ and $w\neq\epsilon$ . In particular, $A\cap v\Sigma_{k}^{*}$ is arid of rank $1$ .

Likewise, there exists $\tilde{u}\in\Sigma_{k}^{*}$ such that $A\cap\Sigma_{k}^{*}\tilde{u}$ takes the form

[TABLE]

where $\tilde{p}\geq 1$ , $\tilde{v}_{i},\tilde{w},\tilde{u}\in\Sigma_{k}^{*}$ and $\tilde{w}\neq\epsilon$ .

Proof.

Since the notion of an arid set is preserved under the reversal operation, it is sufficient to prove the former statement. For $B\subset\Sigma_{k}^{*}$ and $v\in\Sigma_{k}^{*}$ , put $v^{-1}B={\left\{u\in\Sigma_{k}^{*}\ \middle|\ vu\in B\right\}}$ . If $B$ is arid of rank $\leq r$ , then so is $v^{-1}B$ .

Claim.

Let $B\subset\Sigma_{k}^{*}$ be arid of rank $r$ , and let $x_{1},x_{2},y\in\Sigma_{k}^{*}$ be such that $a=\left|y\right|=\left|x_{2}\right|$ and $y\neq x_{2}$ . Then for sufficiently large $m$ (depending on $B,x_{1},x_{2},y$ ), $(x_{1}y^{m}x_{2})^{-1}B$ is arid of rank $\leq(r-1)$ .

Proof.

Replacing $B$ with $x_{1}^{-1}B$ , we may assume that $x_{1}=\epsilon$ .

Let $a=|y|=|x_{2}|$ . In analogy with Remark 3.5, note that there is a natural way to identify $\Sigma_{k^{a}}^{*}$ with a subset of $\Sigma_{k}^{*}$ , and any arid set $B\subset\Sigma_{k}^{*}$ is a finite union of translates $B_{i}v_{i}$ with $v_{i}\in\Sigma_{k}^{*}$ of arid sets $B_{i}\subset\Sigma_{k^{a}}^{*}$ . Hence, it will suffice to show that if $B\subset\Sigma_{k^{a}}^{*}$ is arid of rank $r$ , then for sufficiently large $m$ , $B\cap y^{m}x_{2}\Sigma_{k}^{*}$ is arid of rank $\leq(r-1)$ . We may now replace $k$ with $k^{a}$ and assume that $\left|y\right|=\left|x_{2}\right|=1$ .

It will suffice to prove the claim for $B$ of the form

[TABLE]

where $w_{i}\neq\epsilon$ for all $i$ (note that $l_{i}$ here are required to be strictly positive; any arid set of rank $r$ is a union of such sets and an arid set of rank $\leq(r-1)$ ). Now, if $m>\left|v_{0}w_{1}\right|$ then either $B\cap y^{m}x_{2}\Sigma_{k}^{*}=\emptyset$ (in which case we are trivially done) or $B\cap y^{m}x_{2}\Sigma_{k}^{*}\neq\emptyset$ and both $v_{0}$ and $w_{1}$ are powers of $y$ . In the latter case, we further conclude that $x_{2}$ appears in $v_{1}w_{2}$ (else $B$ would have rank $\leq(r-1)$ ), which is necessarily of the form $y^{b}x_{2}v_{1}^{\prime}$ with $b\in\mathbb{N}_{0}$ . Hence

[TABLE]

is arid of rank $\leq(r-1)$ . ∎

The proof of the proposition is now a simple induction on the rank $r$ of $A$ . Since $A$ is infinite, we have $r\geq 1$ .

If $r=1$ , then $A$ takes the form $\bigcup_{i=1}^{r}{\left\{v_{i}w^{l}_{i}u_{i}\ \middle|\ l\in\mathbb{N}_{0}\right\}}$ , where $w_{i}\neq\epsilon$ for at least one $i$ , say $i=1$ . Then $A\cap v\Sigma_{k}^{*}$ takes the required form for $v=v_{1}w_{1}^{m}$ for $m$ large enough.

If $r>1$ , then we may find a rank $2$ basic arid set

[TABLE]

contained in $A$ . Without loss of generality, we may assume that $|w_{1}|=|w_{2}|>|v_{1}|$ . Apply the above Claim with $x_{1}=v_{0}$ , $y=w_{1}^{l_{1}}$ and $x_{2}$ equal to the first $\left|y\right|$ symbols of $v_{1}w_{2}^{l_{2}}$ , where $l_{2}\geq l_{1}\geq 2$ . Note that $y\neq x_{2}$ , because otherwise by an elementary computation one could show that the rank of $B$ is $1$ . Then for $m$ large enough $A^{\prime}=(x_{1}y^{m}x_{2})^{-1}A$ is arid of rank $\leq(r-1)$ and infinite. By the inductive assumption, there exists $v^{\prime}\in\Sigma_{k}^{*}$ such that $A^{\prime}\cap v^{\prime}\Sigma_{k}^{*}$ takes the required form. It remains to take $v=x_{1}y^{m}x_{2}v^{\prime}$ .∎

Corollary 5.2.

Let $E$ be an infinite $k$ -arid set. Then there exist integers $n\geq 1$ , $r\geq 0$ , $p\geq 1$ , and words $v_{1},\ldots,v_{p},w,u\in\Sigma_{k}^{*}$ , $w\neq\epsilon$ such that

[TABLE]

Proof.

Follows immediately from the second part of Proposition 5.1.∎

Proposition 5.3.

If the set $\{k^{l}\mid l\geq 0\}$ is not generalised polynomial, then neither is any infinite $k$ -arid set.

Proof.

Assume we know that $P=\{k^{l}\mid l\geq 0\}$ is not generalised polynomial. Then neither is any set of the form $P_{t}=\{k^{tl}\mid l\geq 0\}$ for $t\geq 1$ since $P=\bigcup_{j=0}^{t-1}k^{j}P_{t}$ .
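The identity $P=\bigcup_{j=0}^{t-1}k^{j}P_{t}$ behind this step can be checked on initial segments. The following sketch does so for a few sample values of $k$ and $t$ ; the function names and the bounds are illustrative choices.

```python
def powers(base, bound):
    """All powers base^l (l >= 0) that are at most `bound`."""
    out, x = set(), 1
    while x <= bound:
        out.add(x)
        x *= base
    return out

def decomposition_holds(k, t, bound):
    """Check P = k^0 P_t u ... u k^{t-1} P_t on the integers up to `bound`."""
    P = powers(k, bound)
    P_t = powers(k ** t, bound)
    union = {k ** j * x for j in range(t) for x in P_t if k ** j * x <= bound}
    return union == P
```

For instance, `decomposition_holds(2, 3, 10 ** 6)` compares the powers of $2$ up to $10^{6}$ with the union of the sets $2^{j}\{8^{l}\}$ for $j=0,1,2$ .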

Suppose that there exists an infinite $k$ -arid set which is generalised polynomial. Since the class of generalised polynomial sets contains all arithmetic progressions and is closed under finite intersections, Corollary 5.2 allows us to assume that

[TABLE]

for some $p\geq 1,v_{1},\ldots,v_{p},w,u\in\Sigma_{k}^{*}$ , $w\neq\epsilon$ . Let $s=|u|,t=|w|$ and note that

[TABLE]

Let $g$ be a generalised polynomial such that $E={\left\{n\in\mathbb{N}_{0}\ \middle|\ g(n)=0\right\}}$ and assume further that $g$ is a restriction of a generalised polynomial of a real variable that has no further zeros in $\mathbb{R}_{>0}\setminus\mathbb{N}$ . (To this end, replace $g(n)$ by $g(n)^{2}+\left\lVert n\right\rVert^{2}$ .) Then an easy computation shows that the polynomial

[TABLE]

has as its zero set

[TABLE]

where $b_{i}=[w]_{k}+(k^{t}-1)[v_{i}]_{k}$ , $i=1,\ldots,p$ .

The set $C=\{n\in\mathbb{N}_{0}\mid b_{1}n\in B\}$ is also generalised polynomial and it has the form

[TABLE]

with $c_{1}=1$ and $c_{i}=b_{i}k^{tl_{i}}/b_{1}$ , where $l_{i}\geq 0$ is the smallest integer such that $b_{1}$ divides $b_{i}k^{tl_{i}}$ . (If there is no such integer, the corresponding term is not present.)

Let $m\geq 1$ be such that $c_{i}<k^{tm}$ for $i=1,\ldots,p$ . Replacing the set $\{c_{i}k^{tl}\mid l\geq 0\}$ by the union

[TABLE]

and replacing $k$ by $k^{mt}$ , we may assume that

[TABLE]

with $c_{1}=1$ and $1\leq c_{i}<k^{2}$ .

Consider the set $D=\{n\in C\mid n\equiv 1\pmod{k^{2}-1}\}$ . The set $D$ is generalised polynomial and an integer $c_{i}k^{l}\in C$ can be an element of $D$ only if $c_{i}\equiv 1\pmod{k^{2}-1}$ or $c_{i}\equiv k\pmod{k^{2}-1}$ . Since $1\leq c_{i}\leq k^{2}-1$ , this gives $c_{i}=1$ or $c_{i}=k$ and whether the latter possibility is realised or not, we have $D=\{k^{2l}\mid l\geq 0\}$ . This is a contradiction with our remark that no set of the form $P_{t}=\{k^{tl}\mid l\geq 0\}$ , $t\geq 1$ , is generalised polynomial (note that during the proof we have replaced $k$ by its power). ∎
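The congruence argument in the last paragraph can be tested numerically. The sketch below takes all admissible multipliers $1\leq c_{i}\leq k^{2}-1$ at once (a simplifying assumption that only makes $C$ larger) and checks that the elements congruent to $1$ modulo $k^{2}-1$ are exactly the even powers of $k$ , up to a bound. The names `elements` and `D_set` are hypothetical helpers.

```python
def elements(c, k, bound):
    """All c * k^l (l >= 0) that are at most `bound`."""
    out, x = set(), c
    while x <= bound:
        out.add(x)
        x *= k
    return out

def D_set(cs, k, bound):
    """Elements of C = union of {c k^l} that are congruent to 1 modulo k^2 - 1."""
    C = set().union(*(elements(c, k, bound) for c in cs))
    return {n for n in C if n % (k * k - 1) == 1}

def even_powers(k, bound):
    """The set {k^{2l} : l >= 0} up to `bound`."""
    return elements(1, k * k, bound)
```

The check relies on $k^{l}\equiv 1$ or $k\pmod{k^{2}-1}$ according to the parity of $l$ , exactly as in the proof.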

We are now ready to finish the proof of Theorem D.

Proof of Theorem D.

The two statements in Theorem D are of course mutually exclusive. Now assume that there exists a sequence $(a_{n})$ which is $k$ -automatic, generalised polynomial, and not ultimately periodic. By Theorem C, it nevertheless coincides with a periodic sequence $(b_{n})$ except at a set of density zero. Consider the set $C=\{n\in\mathbb{N}_{0}\mid a_{n}\neq b_{n}\}$ . This set is $k$ -automatic, generalised polynomial, sparse, and infinite. By Theorem 3.7, $C$ is then arid and hence by Proposition 5.3 the set $\{k^{l}\mid l\geq 0\}$ is generalised polynomial as well. ∎

6. Concluding remarks

In this section, we gather some remarks and questions which arise naturally. The question with which we begin was already alluded to in the introduction and in [BK16]. As previously discussed, its resolution would suffice to decide if Conjecture A is true.

Question 1.

Let $k\geq 2$ be an integer. Is the set ${\left\{k^{i}\ \middle|\ i\geq 0\right\}}$ generalised polynomial?

We find this question exceptionally pertinent because of its simple formulation.

Morphic words

The class of morphic words is a natural extension of the class of automatic sequences. Let $\Omega$ be a finite set. Any morphism $\varphi$ of the monoid $\Omega^{*}$ extends naturally to $\Omega^{\mathbb{N}_{0}}$ . A word $w\in\Omega^{\mathbb{N}_{0}}$ (which we identify with a function $\mathbb{N}_{0}\to\Omega$ ) is a pure morphic word if it is a fixed point of a non-trivial morphism of $\Omega^{*}$ . A morphic word is the image $\pi\circ w\colon\mathbb{N}_{0}\to\Omega^{\prime}$ of a pure morphic word $w$ under a coding $\pi\colon\Omega\to\Omega^{\prime}$ (i.e., any set-theoretic map, not necessarily injective). Morphic words are connected with automatic sequences via the fact that $k$ -automatic sequences are precisely the morphic words coming from $k$ -uniform morphisms. Here, a morphism $\varphi\colon\Omega^{*}\to\Omega^{*}$ is $k$ -uniform if $\left|\varphi(u)\right|=k$ for all $u\in\Omega$ .
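As an illustration of the link between uniform morphisms and automatic sequences, the sketch below iterates the $2$ -uniform morphism $0\mapsto 01$ , $1\mapsto 10$ , whose fixed point starting with $0$ is the Thue–Morse word, and compares it with the description via binary digit sums. The Thue–Morse word is chosen here only as a standard example; the function name `iterate_morphism` is hypothetical.

```python
def iterate_morphism(images, start, length):
    """Prefix of the fixed point obtained by iterating the morphism `images` from `start`.

    Assumes images[start] begins with `start` and has length at least 2.
    """
    word = start
    while len(word) < length:
        word = "".join(images[c] for c in word)
    return word[:length]

thue_morse_morphism = {"0": "01", "1": "10"}   # 2-uniform: every image has length 2
prefix = iterate_morphism(thue_morse_morphism, "0", 64)
by_digits = "".join(str(bin(n).count("1") % 2) for n in range(64))
```

Here `prefix` and `by_digits` agree, reflecting that this $2$ -uniform morphic word is $2$ -automatic.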

We have already encountered possibly the most famous example of a non-uniform morphic word, the Fibonacci word. Recall from the introduction that the Fibonacci word $w_{\mathrm{Fib}}$ was defined as the limit of the words $w_{0}:=0$ , $w_{1}:=01$ , and $w_{i+2}:=w_{i+1}w_{i}$ . Directly from this definition, it is easy to see that $w_{\mathrm{Fib}}$ is fixed by the morphism $\varphi\colon\Omega^{\mathbb{N}_{0}}\to\Omega^{\mathbb{N}_{0}}$ given by $\varphi(0)=01$ and $\varphi(1)=0$ .

Recall also that $w_{\mathrm{Fib}}$ is a Sturmian word. Here, a Sturmian word is one of the form $f(n)=\left\lfloor\alpha(n+1)+\rho\right\rfloor-\left\lfloor\alpha n+\rho\right\rfloor-\left\lfloor\alpha\right\rfloor$ , where $\alpha,\rho\in\mathbb{R}$ and $\alpha\not\in\mathbb{Q}$ (for $w_{\mathrm{Fib}}$ we may take $\alpha=\rho=2-\varphi$ ). Some (but not all) of these sequences give rise to morphic words; see [BS93] for details (cf. also [Yas99, Fag06, BEIR07]).
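The two descriptions of the Fibonacci word can be compared numerically. The sketch below generates a prefix of the fixed point of $0\mapsto 01$ , $1\mapsto 0$ and a prefix of the Sturmian word with $\alpha=\rho=2-\varphi$ , reading $\varphi$ in this formula as the golden ratio $(1+\sqrt{5})/2$ (an assumption about the notation). It uses floating-point arithmetic, which is adequate for short prefixes but is not an exact method.

```python
from math import floor, sqrt

def fibonacci_word(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]

def sturmian(alpha, rho, length):
    """Prefix of f(n) = floor(alpha (n+1) + rho) - floor(alpha n + rho) - floor(alpha)."""
    return "".join(
        str(floor(alpha * (n + 1) + rho) - floor(alpha * n + rho) - floor(alpha))
        for n in range(length)
    )

def fibonacci_by_concatenation(steps):
    """The word w_i defined by w_0 = 0, w_1 = 01, w_{i+2} = w_{i+1} w_i."""
    previous, current = "0", "01"
    for _ in range(steps - 1):
        previous, current = current, current + previous
    return current if steps >= 1 else previous

golden = (1 + sqrt(5)) / 2      # assumed meaning of the symbol in alpha = rho = 2 - phi
alpha = rho = 2 - golden
```

On prefixes of a few hundred letters, all three constructions produce the same word.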

In analogy with Conjecture A, one could ask about a classification of all morphic words which are given by generalised polynomials. We believe that examples such as the Fibonacci word are essentially the only possible ones.

Question 2.

Assume that a sequence $f\colon\mathbb{N}_{0}\to\Omega\subset\mathbb{R}$ is both a morphic word and a generalised polynomial. Is it true that $f$ is a linear combination of a number of Sturmian morphic words and an eventually periodic sequence?

Regular sequences

We finish by presenting a generalisation of Conjecture A to regular sequences. We call a function $f\colon\mathbb{N}_{0}\to\mathbb{Z}$ a quasi-polynomial if there exists an integer $m\geq 1$ such that the sequences $f_{j}$ given by $f_{j}(n)=f(mn+j)$ , $0\leq j\leq m-1$ , are polynomials in $n$ . We say that a function $f\colon\mathbb{N}_{0}\to\mathbb{Z}$ is ultimately a quasi-polynomial if it coincides with a quasi-polynomial except on a finite set.
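A small example may help fix the definition. The sketch below takes the function $f(n)=\lfloor n^{2}/4\rfloor$ , chosen only for illustration, and checks that with $m=2$ its restrictions are the polynomials $f_{0}(n)=n^{2}$ and $f_{1}(n)=n^{2}+n$ . The helper name `restriction` is hypothetical.

```python
def f(n):
    """Illustrative quasi-polynomial: floor(n^2 / 4)."""
    return n * n // 4

def restriction(g, m, j):
    """The sequence g_j(n) = g(m n + j)."""
    return lambda n: g(m * n + j)

f0 = restriction(f, 2, 0)   # expected to agree with n^2
f1 = restriction(f, 2, 1)   # expected to agree with n^2 + n
```

The function $f$ itself is not a polynomial, since $f(2n)$ and $f(2n+1)$ are given by different polynomials in $n$ .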

Question 3.

Assume that a sequence $f\colon\mathbb{N}_{0}\to\mathbb{Z}$ is both regular and generalised polynomial. Is it then true that $f$ is ultimately a quasi-polynomial?

If $f$ takes only finitely many values, then all the polynomials inducing $f_{j}$ are necessarily constant, and so in this case the question coincides with Conjecture A.

Bibliography

[AB08] Boris Adamczewski and Jason Bell. Function fields in positive characteristic: expansions and Cobham’s theorem. J. Algebra, 319(6):2337–2350, 2008.
[AB12] Boris Adamczewski and Jason P. Bell. On vanishing coefficients of algebraic power series over fields of positive characteristic. Invent. Math., 187(2):343–393, 2012.
[AS92] Jean-Paul Allouche and Jeffrey Shallit. The ring of $k$-regular sequences. Theoret. Comput. Sci., 98(2):163–197, 1992.
[AS03a] Jean-Paul Allouche and Jeffrey Shallit. Automatic sequences. Cambridge University Press, Cambridge, 2003.
[AS03b] Jean-Paul Allouche and Jeffrey Shallit. The ring of $k$-regular sequences. II. Theoret. Comput. Sci., 307(1):3–29, 2003.
[BEIR07] Valérie Berthé, Hiromi Ei, Shunji Ito, and Hui Rao. On substitution invariant Sturmian words: an application of Rauzy fractals. Theor. Inform. Appl., 41(3):329–349, 2007.
[Bel07] Jason P. Bell. $p$-adic valuations and $k$-regular sequences. Discrete Math., 307(23):3070–3075, 2007.
[Ber81] Jean Berstel. Mots de Fibonacci. In Séminaire d’Informatique Théorique, pages 57–78, Paris, 1980–1981.
