Ergodic Theorems for Nonconventional Arrays and an Extension of the Szemerédi Theorem

Yuri Kifer

arXiv:1702.05628·math.DS·November 30, 2017

TL;DR

This paper extends ergodic theorems and Szemerédi's theorem to nonconventional arrays involving polynomial iterates, establishing convergence results and combinatorial implications for subsets of integers with positive density.

Contribution

It introduces new ergodic theorems for nonconventional arrays with polynomial iterates and extends Szemerédi's theorem to these settings, including multidimensional cases.

Findings

01

Convergence of averages for weakly mixing transformations with linear polynomial iterates.

02

Extension of Szemerédi's theorem ensuring structured subsets within dense integer sets.

03

Results for multiple commuting transformations generalizing classical combinatorial theorems.

Abstract

The paper is primarily concerned with the asymptotic behavior as $N \to \infty$ of averages of nonconventional arrays having the form $N^{-1} \sum_{n=1}^{N} \prod_{j=1}^{\ell} T^{P_{j}(n, N)} f_{j}$ where $f_{j}$'s are bounded measurable functions, $T$ is an invertible measure preserving transformation and $P_{j}$'s are polynomials of $n$ and $N$ taking on integer values on integers. It turns out that when $T$ is weakly mixing and $P_{j}(n, N) = p_{j} n + q_{j} N$ are linear or, more generally, have the form $P_{j}(n, N) = P_{j}(n) + Q_{j}(N)$ for some integer valued polynomials $P_{j}$ and $Q_{j}$ then the above averages converge in $L^{2}$ but for general polynomials $P_{j}$ the $L^{2}$ convergence can be ensured even in the case $\ell = 1$ only when $T$ is strongly mixing. Studying also weakly mixing and compact extensions and relying on Furstenberg's structure theorem we derive an extension of Szemerédi's theorem saying that for any…

Equations

$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T^{-jn}A\Big)>0$$

$$\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{jn}f_{j}\stackrel{L^{2}}{\longrightarrow}\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{as}\quad N\to\infty$$

$$\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{P_{j}(n,N)}f_{j}$$

$$\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{p_{j}n+q_{j}N}f_{j}\stackrel{L^{2}}{\longrightarrow}\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{as}\quad N\to\infty.$$

$$\frac{1}{N}\sum_{n=1}^{N}\prod_{i=1}^{k}T^{i(N-n)}f_{k-i+1}\prod_{i=1}^{k}T^{in}f_{k+i}.$$

$$\liminf_{N\to\infty,\,N\in{\mathcal{N}}_{A}}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T^{-(p_{j}n+q_{j}N)}A\Big)>0$$

$$\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T_{j}^{-n}\hat{T}_{j}^{-N}A\Big)\quad\mbox{and of}\quad\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T_{j}^{n}\hat{T}_{j}^{N}f_{j}$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{p_{j}n+q_{j}N}f_{j}$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{p_{j}n+q_{j}N}f_{j}=\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu).$$

$$\frac{1}{N}\int\sum_{n=1}^{N}T^{pn+q_{1}N}f_{1}\,T^{pn+q_{2}N}f_{2}\,d\mu=\int f_{1}\,T^{(q_{2}-q_{1})N}f_{2}\,d\mu,$$

$$\liminf_{N\to\infty,\,N\in{\mathcal{N}}_{A}}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T^{-(p_{j}n+q_{j}N)}A\Big)>0.$$

$$\lim_{n\to\infty}\frac{|{\Lambda}\cap[a_{n},b_{n})|}{b_{n}-a_{n}}=d>0$$

$$a_{n}+p_{j}n+q_{j}N\in{\Lambda}\quad\mbox{for all}\quad j=0,1,...,\ell.$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T_{j}^{n}\hat{T}_{j}^{N}f_{j}$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N}\prod_{j=1}^{\ell}T_{j}^{n}\hat{T}_{j}^{N}f_{j}=\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu)$$

$$\liminf_{N\to\infty,\,N\in{\mathcal{N}}_{A}}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}(T_{j}^{n}\hat{T}_{j}^{N})^{-1}A\Big)>0$$

$$\lim_{n\to\infty}\frac{|{\Lambda}\cap B(\bar{a}(n),\bar{b}(n))|}{\prod_{1\leq i\leq d}(b_{i}(n)-a_{i}(n))}=d>0$$

$$a_{n}+n{\Gamma}+N\hat{\Gamma}\subset{\Lambda}.$$

$$\varphi_{i}(n)=T_{1}^{P_{i1}(n)}\cdots T_{k}^{P_{ik}(n)},\quad i=1,...,\ell,$$

$$\varphi_{i}(n)\varphi_{j}^{-1}(n)=T_{1}^{P_{i1}(n)-P_{j1}(n)}T_{2}^{P_{i2}(n)-P_{j2}(n)}\cdots T_{k}^{P_{ik}(n)-P_{jk}(n)},\quad i,j=1,...,\ell,\,\,i\neq j,$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{i=1}^{\ell}T_{1}^{P_{i1}(n)}\cdots T_{k}^{P_{ik}(n)}\hat{T}_{1}^{Q_{i1}(N)}\cdots\hat{T}_{k}^{Q_{ik}(N)}f_{i}=\prod_{i=1}^{\ell}\int f_{i}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu).$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{P_{j}(n,N)}f_{j}=\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu)$$

$$\frac{1}{N}\sum_{n=1}^{N}T^{nN}f$$

$$\lim_{|k_{i}-k_{j}|\to\infty,\,\forall i\neq j}\mu\Big(\bigcap_{i=1}^{m}T^{-k_{i}}{\Gamma}_{i}\Big)=\lim_{l_{1},...,l_{m-1}\to\infty}\mu\big({\Gamma}_{1}\cap T^{-l_{1}}{\Gamma}_{2}\cap...\cap T^{-(l_{1}+\cdots+l_{m-1})}{\Gamma}_{m}\big)=\prod_{i=1}^{m}\mu({\Gamma}_{i})$$

$${\alpha}(n)=\sup_{m}\big\{|\mu(A\cap B)-\mu(A)\mu(B)|:\,A\in{\mathcal{F}}_{-\infty,m},\,B\in{\mathcal{F}}_{m+n,\infty}\big\}\to 0\quad\mbox{as}\quad n\to\infty.$$

$$\lim_{l_{1},...,l_{k-1}\to\infty}\mu\big({\Gamma}_{1}\cap T^{-l_{1}}{\Gamma}_{2}\cap...\cap T^{-(l_{1}+\cdots+l_{k-1})}{\Gamma}_{k}\big)=\prod_{i=1}^{k}\mu({\Gamma}_{i}).$$

$$\Big|\mu\Big(\bigcap_{i=1}^{k}G_{i}\Big)-\prod_{i=1}^{k}\mu(G_{i})\Big|\leq\sum_{i=1}^{k-1}{\alpha}(m_{i+1}-n_{i}).$$

$$\limsup_{l_{1},...,l_{k-1}\to\infty}\Big|\mu\big({\Gamma}_{1}\cap T^{-l_{1}}{\Gamma}_{2}\cap...\cap T^{-(l_{1}+\cdots+l_{k-1})}{\Gamma}_{k}\big)-\prod_{i=1}^{k}\mu({\Gamma}_{i})\Big|\leq 2k{\varepsilon}$$

$$E(g|Y)(y)=\int g\,d\mu_{y}.$$

Taxonomy

Topics: Limits and Structures in Graph Theory · Mathematical Dynamics and Fractals · graph theory and CDMA systems

Full text

Ergodic Theorems for Nonconventional Arrays and an Extension of the Szemerédi Theorem

Yuri Kifer

Institute of Mathematics, The Hebrew University, Jerusalem 91904, Israel

Abstract.

The paper is primarily concerned with the asymptotic behavior as $N\to\infty$ of averages of nonconventional arrays having the form $N^{-1}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{P_{j}(n,N)}f_{j}$ where $f_{j}$ ’s are bounded measurable functions, $T$ is an invertible measure preserving transformation and $P_{j}$ ’s are polynomials of $n$ and $N$ taking on integer values on integers. It turns out that when $T$ is weakly mixing and $P_{j}(n,N)=p_{j}n+q_{j}N$ are linear or, more generally, have the form $P_{j}(n,N)=P_{j}(n)+Q_{j}(N)$ for some integer valued polynomials $P_{j}$ and $Q_{j}$ then the above averages converge in $L^{2}$ but for general polynomials $P_{j}$ the $L^{2}$ convergence can be ensured even in the case $\ell=1$ only when $T$ is strongly mixing. Studying also weakly mixing and compact extensions and relying on Furstenberg’s structure theorem we derive an extension of Szemerédi’s theorem saying that for any subset of integers ${\Lambda}$ with positive upper density there exists a subset ${\mathcal{N}}_{\Lambda}$ of positive integers having uniformly bounded gaps such that for $N\in{\mathcal{N}}_{\Lambda}$ and at least ${\varepsilon}N,\,{\varepsilon}>0$ of $n$ ’s all numbers $p_{j}n+q_{j}N,\,j=1,...,\ell,$ belong to ${\Lambda}$ . We obtain also a version of these results for several commuting transformations which yields a corresponding extension of the multidimensional Szemerédi theorem.

Key words and phrases:

Szemerédi theorem, multiple recurrence, nonconventional averages, triangular arrays

2010 Mathematics Subject Classification:

Primary: 37A30 Secondary: 37A45, 28D05

A part of this work was done during the author’s visit to the University of Pennsylvania in the Fall of 2016 as the Bogen family visiting professor.

1. Introduction

In 1975 Szemerédi proved the conjecture of Erdős and Turán saying that any set of integers with positive upper density contains arbitrarily long arithmetic progressions. In 1977 Furstenberg [10] published an ergodic theory proof of this result, which turned out to be a corollary of a multiple recurrence statement for measure preserving transformations.

Namely, let $(X,{\mathcal{B}},\mu)$ be a probability space, $T:X\to X$ be an invertible $\mu$ -preserving transformation and $A\in{\mathcal{B}}$ be a set of positive $\mu$ -measure. Furstenberg proved that in these circumstances for any positive integer $\ell$ ,

$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T^{-jn}A\Big)>0\tag{1.1}$$

which, in fact, implies the existence of infinitely many arithmetic progressions in any set of integers having positive upper density (for a nice exposition of this result see [14]).

An important part of the proof of (1.1) was to show that

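$$\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{jn}f_{j}\stackrel{L^{2}}{\longrightarrow}\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{as}\quad N\to\infty\tag{1.2}$$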

(where $T^{m}f(x)=f(T^{m}x)$ ) provided $T$ is a measure preserving weakly mixing invertible transformation and $f_{j}$ ’s are bounded measurable functions. In fact, (1.1) required more general results concerning weak mixing and compact extensions together with a structure theorem describing all possible extensions. Observe that in [3] the $L^{2}$ convergence (1.2) for weakly mixing transformations was extended from powers $jn$ to arbitrary essentially distinct polynomials $P_{j}(n)$ (i.e. having nonconstant pairwise differences) taking on integer values on integers.

In this paper we consider the averages of the form

$$\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{P_{j}(n,N)}f_{j}\tag{1.3}$$

where $f_{j}$’s are bounded measurable functions, $T$ is an invertible measure preserving transformation and $P_{j}(n,N),\,j=1,...,\ell,$ are essentially distinct polynomials of $n$ and $N$ taking on integer values on integers. It is customary in probability to call sums whose summands depend on the number $N$ of summands (triangular) arrays, and it seems appropriate to use the same name for the sums in (1.3) too, while the term “nonconventional” comes from [12].

First, we study the linear case $P_{j}(n,N)=p_{j}n+q_{j}N$ where $p_{j}$ ’s are distinct and $q_{j}$ ’s are arbitrary integers. It turns out that under the weak mixing assumption on $T$ ,

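$$\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{p_{j}n+q_{j}N}f_{j}\stackrel{L^{2}}{\longrightarrow}\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{as}\quad N\to\infty.\tag{1.4}$$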

In particular, when $\ell=2k,\,q_{i}=-p_{i}=k-i+1$ for $i=1,...,k$ and $p_{i}=i-k,q_{i}=0$ for $i=k+1,...,2k$ the left hand side of (1.4) takes on the following symmetric form

$$\frac{1}{N}\sum_{n=1}^{N}\prod_{i=1}^{k}T^{i(N-n)}f_{k-i+1}\prod_{i=1}^{k}T^{in}f_{k+i}.\tag{1.5}$$

It is known by [19] that when $q_{j}=0$ for all $j=1,...,\ell$ the left hand side of (1.4) still converges in $L^{2}$ even without the weak mixing assumption on $T$, but not necessarily to the right hand side of (1.4). On the other hand, a simple example shows that for arbitrary $q_{j}$’s there is, in general, no convergence of the left hand side of (1.4) if $T$ is not weakly mixing. Indeed, take $k=1$ in (1.5) and let $T$ be the rotation of the unit circle by one half of it, while $f_{1}=f_{2}=f={\mathbb{I}}_{A}$ is the indicator of an arc $A$ having length less than one half of the circle. Then $(T^{n}f)(T^{N-n}f)={\mathbb{I}}_{T^{-n}A\cap T^{-(N-n)}A}$. If $N$ is even then $n$ and $N-n$ have the same parity, so this expression equals the indicator ${\mathbb{I}}_{A}$ of $A$ or the indicator ${\mathbb{I}}_{TA}$ of $TA$ (depending on the parity of $n$); if $N$ is odd then it equals zero, since $A$ and $TA$ are disjoint. Thus, the averages (1.5) are equal to $\frac{1}{2}({\mathbb{I}}_{A}+{\mathbb{I}}_{TA})$ for each even $N$ and to 0 for each odd $N$.
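
The half-turn rotation example can be checked directly. The following is a minimal sketch, not part of the paper: the circle is modeled as $[0,1)$ with exact rational arithmetic, and the arc $A=[0,3/10)$, the sample points and the names `T_pow`, `f`, `array_average` are assumptions made only for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ARC_LEN = Fraction(3, 10)  # assumed arc length; any value below 1/2 works


def T_pow(x, m):
    """Apply the half-turn rotation m times: x -> x + m/2 (mod 1)."""
    return (x + m * HALF) % 1


def f(x):
    """Indicator of the arc A = [0, ARC_LEN)."""
    return 1 if 0 <= x < ARC_LEN else 0


def array_average(x, N):
    """(1/N) * sum_{n=1}^{N} f(T^{N-n} x) * f(T^n x), i.e. (1.5) with k = 1."""
    total = sum(f(T_pow(x, N - n)) * f(T_pow(x, n)) for n in range(1, N + 1))
    return Fraction(total, N)


if __name__ == "__main__":
    for N in (10, 11, 100, 101):
        # x = 1/10 lies in A, x = 3/5 lies in TA, x = 2/5 lies in neither
        print(N, [array_average(Fraction(a, 10), N) for a in (1, 6, 4)])
```

For even $N$ the printed values at points of $A$ or $TA$ are $1/2$, and for odd $N$ every value is $0$, so the sequence of averages oscillates and has no limit.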

We complement the above study by considering weak mixing and compact extensions and relying on the structure theorem from [10] and [11] we conclude that for any invertible measure preserving transformation $T$, numbers $p_{j},q_{j}$ as above and a set $A$ of positive measure there exists a subset ${\mathcal{N}}_{A}\subset{\mathbb{N}}$ of positive integers with uniformly bounded gaps, called a syndetic set, such that

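$$\liminf_{N\to\infty,\,N\in{\mathcal{N}}_{A}}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T^{-(p_{j}n+q_{j}N)}A\Big)>0\tag{1.6}$$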

while this does not hold true, in general, if we take the limit over all positive integers, which can be seen from the above example. In fact, we also show that (1.6) follows by a shorter argument relying on recent advanced results from [6] and [2] concerning convergence along Følner sequences in multidimensional multiple recurrence results, but our direct proof still has a value since, in particular, it concentrates attention on nonconventional arrays and the convergence results like (1.4) cannot be derived from the above references.

By Furstenberg’s standard argument (1.6) implies an extended version of Szemerédi’s theorem saying that for any subset of integers ${\Lambda}$ with positive upper density there exists a syndetic subset ${\mathcal{N}}_{{\Lambda}}\subset{\mathbb{N}}$ such that for all $N\in{\mathcal{N}}_{\Lambda}$ and at least ${\varepsilon}N,\,{\varepsilon}>0$ of $n$’s, all numbers $p_{j}n+q_{j}N,\,j=1,...,\ell,$ belong to ${\Lambda}$. We obtain also more general results concerning families of commuting transformations $T_{j},\,\hat{T}_{j},\,j=1,2,...,\ell,$ studying the limits of

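$$\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T_{j}^{-n}\hat{T}_{j}^{-N}A\Big)\quad\mbox{and of}\quad\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T_{j}^{n}\hat{T}_{j}^{N}f_{j}$$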

where $A$ and $f_{j},\,j=1,...,\ell,$ are as above.

If we consider $P_{j}(n,N)=P_{j}(n)+Q_{j}(N)$ in (1.3), where $P_{j}$ ’s are essentially distinct and $Q_{j}$ ’s are arbitrary polynomials, then the convergence in $L^{2}$ of nonconventional averages (1.3) to the product of integrals can be established under the weak mixing assumption. On the other hand, already for $\ell=1$ and $P_{1}(n,N)=nN$ weak mixing is not sufficient, in general, for the $L^{2}$ convergence of averages (1.3) though strong mixing suffices here. For $\ell>1$ and general polynomials $P_{j}(n,N)$ we show $L^{2}$ convergence of the expression (1.3) assuming strong $2\ell$ -mixing of $T$ .

Acknowledgement.

The author is grateful to anonymous referees for many helpful suggestions which led to improvements of the original version of this paper.

2. Preliminaries and main results

Let $(X,{\mathcal{B}},\mu)$ be a separable probability space and $T:X\to X$ be an invertible $\mu$-preserving transformation. In studying polynomial nonconventional averages (1.3) we start with the linear case $P_{j}(n,N)=p_{j}n+q_{j}N$. Let $f_{i},\,i=0,1,...,\ell,$ be bounded measurable functions on $X$. The example described in the Introduction shows that, in general, the limit

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{p_{j}n+q_{j}N}f_{j}\tag{2.1}$$

does not exist in the $L^{2}$ -sense. Still, we will see that the limit (2.1) exists in the $L^{2}$ sense if $T$ is weakly mixing which means that the product transformation $T\times T$ on $(X\times X,\,{\mathcal{B}}\times{\mathcal{B}},\,\mu\times\mu)$ is ergodic (see, for instance, [11]). Thus we have the following $L^{2}$ ergodic theorem for nonconventional arrays.

2.1 Theorem.

Suppose that an invertible transformation $T$ is weakly mixing, $f_{j},\,j=1,...,\ell,$ are bounded measurable functions and $p_{j},\,q_{j},\,j=1,...,\ell,$ are integers such that $p_{j}$ ’s are distinct (ordered without loss of generality as $p_{1}<p_{2}<...<p_{\ell}$ ) and $q_{j}$ ’s are arbitrary. Then

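$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{p_{j}n+q_{j}N}f_{j}=\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu).\tag{2.2}$$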

The condition of Theorem 2.1 that $p_{j},\,j=1,...,\ell,$ are distinct is important for (2.2). Indeed, let $p_{1}=p_{2}=p$ and $\int f_{1}d\mu=0$ . Then

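$$\frac{1}{N}\int\sum_{n=1}^{N}T^{pn+q_{1}N}f_{1}\,T^{pn+q_{2}N}f_{2}\,d\mu=\int f_{1}\,T^{(q_{2}-q_{1})N}f_{2}\,d\mu,$$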

which does not converge to zero as $N\to\infty$, in general, unless $T$ is (strongly) mixing (and $q_{1}\neq q_{2}$), while under weak mixing only convergence outside of a set of $N$’s having zero density can be ensured. We observe (as pointed out by the referee) that Theorem 2.1 actually follows from Theorem 3.2 in the recent paper [20], though the motivation and goals of the latter paper seem to be different from ours. In fact, we will study convergence in a more general situation of weak mixing extensions and, in addition, will consider also compact extensions, which together with the structure theorem from [10] and [11] will produce the following result.

2.2 Theorem.

Let $p_{j},q_{j},\,j=0,1,...,\ell,$ be integers such that $p_{0}=q_{0}=0,\,p_{j}\neq 0$ if $j\neq 0$ and $p_{1}<p_{2}<...<p_{\ell}$ . Then for any $A\in{\mathcal{B}}$ with $\mu(A)>0$ there exists an infinite subset of positive integers ${\mathcal{N}}_{A}\subset{\mathbb{N}}$ with uniformly bounded gaps such that

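$$\liminf_{N\to\infty,\,N\in{\mathcal{N}}_{A}}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}T^{-(p_{j}n+q_{j}N)}A\Big)>0.$$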

As the example in the Introduction shows, this statement does not hold true, in general, if we take $\liminf$ over all $N\to\infty$. On the other hand, if $q_{j}=0$ for all $j$ then (2.1) was proved in [10] (see also [14]) with $\liminf$ over all $N\to\infty$ and it was shown there how such a result yields the Szemerédi type theorem. Recall briefly the latter argument. Let $\{0,1\}^{\mathbb{Z}}=\{{\omega}=({\omega}_{i}):\,{\omega}_{i}\in\{0,1\},\,-\infty<i<\infty\}$ be the space of sequences, $(T{\omega})_{i}={\omega}_{i+1}$ be the left shift and consider the special sequence $\bar{\omega}=(\bar{\omega}_{i})_{i=1}^{\infty}$ where $\bar{\omega}_{i}=1$ if and only if $i\in{\Lambda}$, with ${\Lambda}\subset{\mathbb{Z}}$ being a subset of integers with a positive upper density (also called the upper Banach density), i.e.,

$$\lim_{n\to\infty}\frac{|{\Lambda}\cap[a_{n},b_{n})|}{b_{n}-a_{n}}=d>0$$

for some sequence of intervals $[a_{n},b_{n})$ with $b_{n}-a_{n}\to\infty$ as $n\to\infty$, where $|{\Gamma}|$ denotes the number of elements in a set ${\Gamma}$. Take $X$ to be the closure in $\{0,1\}^{\mathbb{Z}}$ of $\{T^{n}\bar{\omega}\}^{\infty}_{n=-\infty}$. Then any weak limit $\mu$ of the sequence of measures $\mu_{n}=(b_{n}-a_{n})^{-1}\sum_{j=a_{n}}^{b_{n}}{\delta}_{T^{j}\bar{\omega}}$ (where ${\delta}_{\omega}$ is the unit mass at ${\omega}$) is a $T$-invariant probability measure on $X$, and if $A=X\cap\{{\omega}:\,{\omega}_{0}=1\}$ then $\mu(A)=d>0$.
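
The link between the window density of ${\Lambda}$ and the measure $\mu_{n}(A)$ can be seen in a small computation. The sketch below is only an illustration under stated assumptions: the set ${\Lambda}$ is taken to be the multiples of 3, windows are half-open $[a,b)$, and the helper names `window_density`, `omega_bar`, `shift` and `empirical_measure_of_A` are hypothetical.

```python
from fractions import Fraction


def window_density(in_lambda, a, b):
    """Exact value of |Lambda intersect [a, b)| / (b - a)."""
    if b <= a:
        raise ValueError("need a < b")
    return Fraction(sum(1 for j in range(a, b) if in_lambda(j)), b - a)


def omega_bar(in_lambda):
    """Indicator sequence of Lambda, viewed as a function i -> 0 or 1."""
    return lambda i: 1 if in_lambda(i) else 0


def shift(omega, j):
    """T^j omega for the left shift (T omega)_i = omega_{i+1}."""
    return lambda i: omega(i + j)


def empirical_measure_of_A(omega, a, b):
    """Average over j in [a, b) of the unit masses at T^j omega, evaluated on
    A = {omega : omega_0 = 1}."""
    return Fraction(sum(shift(omega, j)(0) for j in range(a, b)), b - a)


if __name__ == "__main__":
    def multiples_of_three(m):
        return m % 3 == 0

    print(window_density(multiples_of_three, 0, 300))
    print(empirical_measure_of_A(omega_bar(multiples_of_three), 0, 300))
```

Both quantities coincide because $(T^{j}\bar{\omega})_{0}=\bar{\omega}_{j}$, which equals 1 exactly when $j\in{\Lambda}$.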

It is easy to see that ${\Lambda}$ contains an arithmetic progression of length $\ell$ if and only if $\bigcap_{j=0}^{\ell-1}T^{-jb}A$ is nonempty for some $b\neq 0$ . More generally, ${\Lambda}$ contains all numbers $a+p_{j}n+q_{j}N,\,j=0,1,...,\ell,$ for some $a\in{\Lambda}$ if and only if $\bigcap_{j=0}^{\ell}T^{-(p_{j}n+q_{j}N)}A$ is nonempty. Thus, Theorem 2.2 yields the following result.

2.3 Corollary.

Let ${\Lambda}$ be a subset of nonnegative integers with a positive upper density and $p_{j},q_{j},\,j=0,1,...,\ell,$ be integers satisfying conditions of Theorem 2.2. Then there exist ${\varepsilon}>0$ and an infinite set of positive integers ${\mathcal{N}}_{\Lambda}$ with uniformly bounded gaps such that for any $N\in{\mathcal{N}}_{\Lambda}$ the interval $[0,N]$ contains not less than ${\varepsilon}N$ integers $n$ with the property that for some $a_{n}$ ,

$$a_{n}+p_{j}n+q_{j}N\in{\Lambda}\quad\mbox{for all}\quad j=0,1,...,\ell.$$

In particular, if $\ell=2k$, $q_{j}=-p_{j}=k-j+1$ for $j=1,...,k$ and $p_{j}=j-k,\,q_{j}=0$ for $j=k+1,...,2k$ then for at least ${\varepsilon}N,\,N\in{\mathcal{N}}_{\Lambda}$ integers $n$ in the interval $[0,N]$ the set ${\Lambda}$ contains arithmetic progressions of length $k+1$ both with step $n$ and with step $N-n$.

Clearly, the above corollary does not hold true, in general, if we replace ${\mathcal{N}}_{\Lambda}$ by all positive integers. Indeed, let ${\Lambda}$ be the set of all even numbers. If $N$ is odd then $n$ and $N-n$ have different parities, so for $a\in{\Lambda}$ the numbers $a+n$ and $a+(N-n)$ cannot both be even and hence cannot both belong to ${\Lambda}$.
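
The parity obstruction can be verified by brute force. The following sketch counts the integers $n\in[0,N]$ for which some $a$ satisfies $a+p_{j}n+q_{j}N\in{\Lambda}$ for all $j$; the function name `count_good_n`, the finite search range for $a$ and the choice $k=1$ (so $p=(0,-1,1)$, $q=(0,1,0)$) are assumptions made for illustration.

```python
def count_good_n(in_lambda, p, q, N, a_candidates):
    """Count n in [0, N] such that, for some a in a_candidates,
    a + p[j]*n + q[j]*N lies in Lambda for every index j."""
    good = 0
    for n in range(N + 1):
        if any(
            all(in_lambda(a + pj * n + qj * N) for pj, qj in zip(p, q))
            for a in a_candidates
        ):
            good += 1
    return good


if __name__ == "__main__":
    def is_even(m):
        return m % 2 == 0

    # k = 1: the pattern is {a, a + (N - n), a + n}
    p, q = (0, -1, 1), (0, 1, 0)
    for N in (10, 11, 20, 21):
        print(N, count_good_n(is_even, p, q, N, range(0, 4)))
```

For ${\Lambda}$ equal to the even numbers the count is positive for even $N$ and zero for odd $N$, which is why the set ${\mathcal{N}}_{\Lambda}$ cannot be replaced by all positive integers here.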

Next, we will discuss an extension of the above results to families of commuting transformations, which will yield also a multidimensional version of Corollary 2.3. Let $G$ be a multiplicative free finitely generated abelian group acting on $X$ by $\mu$-preserving transformations, which are necessarily invertible. Any such group is isomorphic to a $d$-dimensional integer lattice group ${\mathbb{Z}}^{d}$. Let $f_{i},\,i=0,1,...,\ell,$ be bounded measurable functions on $X$. As in the case of one transformation, in general, the limit

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T_{j}^{n}\hat{T}_{j}^{N}f_{j}\tag{2.6}$$

does not exist if $N\to\infty$ over all $N$. Nevertheless, we will see that the limit (2.6) exists in the $L^{2}$ sense if the abelian group $G$ is totally weak mixing, i.e. it consists of weakly mixing transformations with the only exception of the identity.

2.4 Theorem.

Suppose that $T_{1},...,T_{\ell}$ are distinct transformations, different from the identity (id), which belong to a totally weak mixing free finitely generated abelian group $G$ acting on $(X,{\mathcal{B}},\mu)$ by measure preserving transformations. Let $\hat{T}_{1},...,\hat{T}_{\ell}$ be invertible $\mu$-preserving transformations of $X$, which commute with each other and with $T_{1},...,T_{\ell}$. Then for any bounded measurable functions $f_{j},\,j=0,1,...,\ell$,

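$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N}\prod_{j=1}^{\ell}T_{j}^{n}\hat{T}_{j}^{N}f_{j}=\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu)$$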

where $T_{0}=\hat{T}_{0}=\mbox{id}$ .

Considering weak mixing and primitive extensions we will obtain the following generalization of Theorem 2.2.

2.5 Theorem.

Let $T_{j},\,\hat{T}_{j}\in G,\,j=1,...,\ell,$ where $T_{1},...,T_{\ell}$ are distinct and different from the identity id of $G$ while $\hat{T}_{1},...,\hat{T}_{\ell}$ are any transformations from $G$ . Then for any $A\in{\mathcal{B}}$ with $\mu(A)>0$ there exists an infinite subset of positive integers ${\mathcal{N}}_{A}\subset{\mathbb{N}}$ with uniformly bounded gaps such that

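$$\liminf_{N\to\infty,\,N\in{\mathcal{N}}_{A}}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{j=0}^{\ell}(T_{j}^{n}\hat{T}_{j}^{N})^{-1}A\Big)>0$$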

where we set $T_{0}=\hat{T}_{0}=\mbox{id}$ .

Clearly, if we set $T_{j}=T^{p_{j}}$ and $\hat{T}_{j}=T^{q_{j}}$ then we arrive back at the setup of Theorem 2.2. For $\hat{T}_{j},\,j=1,2,...,\ell,$ equal to the identity (2.9) was proved in [13] with ${\mathcal{N}}_{A}={\mathbb{N}}$ but our proof will follow more closely Chapter 7 of [11]. Similarly to the one transformation case Theorem 2.5 yields an extension of a multidimensional version of the Szemerédi theorem. Recall the notion of the upper (Banach) density of a set ${\Lambda}\subset{\mathbb{Z}}^{d}$. For any two vectors $\bar{a}=(a_{1},...,a_{d}),\,\bar{b}=(b_{1},...,b_{d})\in{\mathbb{Z}}^{d}$ such that $a_{i}<b_{i},\,i=1,...,d,$ denote by $B(\bar{a},\bar{b})$ the parallelepiped $\prod_{i=1}^{d}[a_{i},b_{i}]$. A set ${\Lambda}\subset{\mathbb{Z}}^{d}$ is said to have positive upper (Banach) density if there exists a sequence of parallelepipeds $B(\bar{a}(n),\bar{b}(n))$ with $\bar{a}(n)=(a_{1}(n),...,a_{d}(n)),\,\bar{b}(n)=(b_{1}(n),...,b_{d}(n))$ satisfying $\lim_{n\to\infty}\min_{1\leq i\leq d}(b_{i}(n)-a_{i}(n))=\infty$ and such that

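$$\lim_{n\to\infty}\frac{|{\Lambda}\cap B(\bar{a}(n),\bar{b}(n))|}{\prod_{1\leq i\leq d}(b_{i}(n)-a_{i}(n))}=d>0$$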

where, again, $|{\Gamma}|$ denotes the number of points in a set ${\Gamma}$ .

Since the group in Theorem 2.5 is isomorphic to ${\mathbb{Z}}^{d}$ we can identify the actions of $T_{j}$ and $\hat{T}_{j}$ with additions of some vectors $z_{i}\in{\mathbb{Z}}^{d}$ and $\hat{z}_{i}\in{\mathbb{Z}}^{d}$ . For any ordered finite set ${\Gamma}=\{z_{1},...,z_{\ell}\},\,z_{i}\in{\mathbb{Z}}^{d}$ , $n\in{\mathbb{Z}}$ and $a\in{\mathbb{Z}}^{d}$ we set $n{\Gamma}=\{nz_{1},...,nz_{\ell}\}$ and $a+{\Gamma}=\{a+z_{1},...,a+z_{\ell}\}$ . Next, if ${\Gamma}=\{z_{1},...,z_{\ell}\}$ and $\hat{\Gamma}=\{\hat{z}_{1},...,\hat{z}_{\ell}\},\,z_{i},\hat{z}_{i}\in{\mathbb{Z}}^{d}$ are two ordered finite sets then we write ${\Gamma}+\hat{\Gamma}=\{z_{1}+\hat{z}_{1},...,z_{\ell}+\hat{z}_{\ell}\}$ . Now Theorem 2.5 yields the following extension of the multidimensional Szemerédi theorem.

2.6 Corollary.

Let ${\Lambda}$ be a subset of ${\mathbb{Z}}^{d}$ with a positive upper (Banach) density and let ${\Gamma}=\{z_{1},...,z_{\ell}\},\,\hat{\Gamma}=\{\hat{z}_{1},...,\hat{z}_{\ell}\}$ be two ordered sets of vectors from ${\mathbb{Z}}^{d}$ such that $z_{1},z_{2},...,z_{\ell}$ are all distinct and non zero. Then there exist ${\varepsilon}>0$ and an infinite set of positive integers ${\mathcal{N}}_{\Lambda}$ with uniformly bounded gaps such that for any $N\in{\mathcal{N}}_{\Lambda}$ the interval $[0,N]$ contains not less than ${\varepsilon}N$ integers $n$ such that for some $a_{n}\in{\Lambda}$ ,

$$a_{n}+n{\Gamma}+N\hat{\Gamma}\subset{\Lambda}.$$

Corollary 2.6 follows from Theorem 2.5 similarly to the one transformation case. Namely, we consider the action of ${\mathbb{Z}}^{d}$ on $\{0,1\}^{{\mathbb{Z}}^{d}}=\{{\omega}=({\omega}_{v}),\,{\omega}_{v}\in\{0,1\},\,v\in{\mathbb{Z}}^{d}\}$ by $(z{\omega})_{v}={\omega}_{v+z}$ for any $z,v\in{\mathbb{Z}}^{d}$. Again, we take $X$ to be the closure in $\{0,1\}^{{\mathbb{Z}}^{d}}$ of the orbit ${\mathbb{Z}}^{d}\bar{\omega}$ of the special sequence $\bar{\omega}=(\bar{\omega}_{v},\,\bar{\omega}_{v}=1$ if and only if $v\in{\Lambda})$ and a ${\mathbb{Z}}^{d}$-invariant measure $\mu$ comes as a weak limit as $n\to\infty$ of the measures $\prod_{1\leq i\leq d}(b_{i}(n)-a_{i}(n))^{-1}\sum_{z\in B(\bar{a}(n),\bar{b}(n))}{\delta}_{z\bar{\omega}}$ where $B(\bar{a}(n),\bar{b}(n)),\,n=1,2,...,$ are the same as in (2.9).
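
The ordered-set operations $n{\Gamma}$, $a+{\Gamma}$ and ${\Gamma}+\hat{\Gamma}$ used in Corollary 2.6 translate directly into code. The following is a minimal sketch in which vectors of ${\mathbb{Z}}^{d}$ are tuples and ordered sets are lists; all function names, and the sample set ${\Lambda}$ (points with even coordinate sum), are assumptions introduced for illustration.

```python
def scale(n, gamma):
    """n * Gamma = {n z_1, ..., n z_l}, keeping the order of Gamma."""
    return [tuple(n * c for c in z) for z in gamma]


def add_ordered(gamma, gamma_hat):
    """Gamma + Gamma_hat = {z_1 + zhat_1, ..., z_l + zhat_l}."""
    if len(gamma) != len(gamma_hat):
        raise ValueError("ordered sets must have the same length")
    return [tuple(c + ch for c, ch in zip(z, zh)) for z, zh in zip(gamma, gamma_hat)]


def translate(a, gamma):
    """a + Gamma = {a + z_1, ..., a + z_l}."""
    return [tuple(ac + c for ac, c in zip(a, z)) for z in gamma]


def pattern(a, n, N, gamma, gamma_hat):
    """The ordered set a + n*Gamma + N*Gamma_hat."""
    return translate(a, add_ordered(scale(n, gamma), scale(N, gamma_hat)))


def contains_pattern(in_lambda, a, n, N, gamma, gamma_hat):
    """True if a and every point of a + n*Gamma + N*Gamma_hat lie in Lambda."""
    return in_lambda(a) and all(in_lambda(v) for v in pattern(a, n, N, gamma, gamma_hat))


if __name__ == "__main__":
    def even_sum(v):
        return sum(v) % 2 == 0

    gamma, gamma_hat = [(1, 1), (2, 0)], [(0, 0), (1, 1)]
    print(pattern((0, 0), 1, 1, gamma, gamma_hat))
    print(contains_pattern(even_sum, (0, 0), 1, 1, gamma, gamma_hat))
```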

The proofs of the above results proceed similarly to [14] and [11], and so we will be trying to make a compromise between keeping the paper relatively self-contained and still avoiding too many repetitions of arguments from [14] and [11]. Though, of course, Theorem 2.2 is a particular case of Theorem 2.5, in order to facilitate the reading, we will consider first the one transformation case and then pass to the case of commuting transformations.

As we mentioned in the Introduction, it is possible to give a shorter argument yielding Theorems 2.2 and 2.5, which will be presented in Section 5. This argument relies on quite general results from the recent paper [2]. In fact, this argument together with [6] yields Theorem 2.1 with linear terms $p_{i}n+q_{i}N$ replaced by arbitrary polynomials $p_{i}(n,N),\,i=1,...,\ell,$ taking on integer values for integer pairs $n,N$ and such that for any integer $k$ there exist $n$ and $N$ with $p_{i}(n,N)$ divisible by $k$ for each $i=1,...,\ell$. The results in [6] and [2] rely on advanced machinery developed with the purpose to derive convergence of nonconventional averages in various situations. The direct proof presented here, which proceeds along the lines of the original proof in [14] and [11], still seems to be useful, in particular, for focusing attention on limiting behavior of nonconventional arrays, which is a somewhat different point of view in comparison to other research on multiple recurrence problems and since Theorems 2.2 and 2.5 do not follow from [6] and [2].

2.7 Remark.

As we have seen, the limit in Theorem 2.1 does not exist, in general, without the weak mixing assumption but it is plausible that the limit may exist over syndetic subsequences of $N$’s. It would be interesting also to obtain some uniform versions of Theorems 2.2 and 2.5 in the spirit of [4]. It would be also natural to find the most general conditions which ensure almost everywhere convergence of averages of nonconventional arrays, though this question is not completely settled even for standard nonconventional averages (i.e. without dependence of summands on $N$). Finally, we observe that it may be interesting to obtain a result of the type of Corollary 2.3 for the set of primes in place of a set of positive upper density, extending to this situation the main result of [15]. In this case relevant sets of $N$’s will probably have gaps containing only a bounded number of primes.

Next, we consider averages of nonconventional arrays (1.3) with higher degree polynomials $P_{j}(n,N),\,j=1,...,\ell$. When we can separate dependencies on $n$ and $N$ then applying the “PET-induction” from [5] for polynomials in $n$ and, essentially, treating $N$ as a constant there we will obtain in Section 6 the following result.

2.8 Theorem.

Let $T_{1},...,T_{k}$ be transformations different from the identity belonging to a totally weak mixing finitely generated free abelian group $G$ acting on $(X,{\mathcal{B}},\mu)$ by measure preserving transformations and $\hat{T}_{1},...,\hat{T}_{k}$ be invertible $\mu$-preserving transformations of $X$ which commute with each other and with $T_{1},...,T_{k}$. Furthermore, let $P_{ij}(n),\,i=1,...,\ell,\,j=1,...,k,$ be polynomials taking on integer values on integers and suppose that the expressions

$$\varphi_{i}(n)=T_{1}^{P_{i1}(n)}\cdots T_{k}^{P_{ik}(n)},\quad i=1,...,\ell,$$

and the expressions

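$$\varphi_{i}(n)\varphi_{j}^{-1}(n)=T_{1}^{P_{i1}(n)-P_{j1}(n)}T_{2}^{P_{i2}(n)-P_{j2}(n)}\cdots T_{k}^{P_{ik}(n)-P_{jk}(n)},\quad i,j=1,...,\ell,\,\,i\neq j,$$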

depend nontrivially on $n$ (i.e. that they are nonconstant maps from ${\mathbb{Z}}$ to $G$ ). In addition, let $Q_{ij}(N),\,i=1,...,\ell,\,j=1,...,k,$ be arbitrary functions of $N$ taking on integer values on integers. Then, for any bounded measurable functions $f_{i},\,i=1,...,\ell,$

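$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{i=1}^{\ell}T_{1}^{P_{i1}(n)}\cdots T_{k}^{P_{ik}(n)}\hat{T}_{1}^{Q_{i1}(N)}\cdots\hat{T}_{k}^{Q_{ik}(N)}f_{i}=\prod_{i=1}^{\ell}\int f_{i}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu).\tag{2.11}$$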

If all $T_{i}$ ’s and $\hat{T}_{i}$ ’s coincide with one transformation $T$ then (2.11) becomes

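$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{\ell}T^{P_{j}(n,N)}f_{j}=\prod_{j=1}^{\ell}\int f_{j}\,d\mu\quad\mbox{in}\quad L^{2}(X,\mu)\tag{2.12}$$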

with $P_{j}(n,N)$’s taking the form $P_{j}(n,N)=P_{j}(n)+Q_{j}(N)$ where $P_{j}(n)$’s are nonconstant essentially distinct polynomials of $n$ and $Q_{j}(N)$’s are functions of $N$, both taking on integer values on integers. It turns out that for general polynomials of $n$ and $N$ weak mixing may not be enough for the $L^{2}$ convergence in (1.3). In Section 6 we will show, employing a version of a spectral argument suggested to us by Benji Weiss, that already the averages

$$\frac{1}{N}\sum_{n=1}^{N}T^{nN}f$$

do not converge in $L^{2}$ as $N\to\infty$, in general, if $T$ is only weakly mixing. Still, strong mixing of $T$ ensures convergence in $L^{2}$ for this example. More generally, we will prove the following result where we rely on the notion of strong $m$-mixing, which means that

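$$\lim_{|k_{i}-k_{j}|\to\infty,\,\forall i\neq j}\mu\Big(\bigcap_{i=1}^{m}T^{-k_{i}}{\Gamma}_{i}\Big)=\lim_{l_{1},...,l_{m-1}\to\infty}\mu\big({\Gamma}_{1}\cap T^{-l_{1}}{\Gamma}_{2}\cap...\cap T^{-(l_{1}+\cdots+l_{m-1})}{\Gamma}_{m}\big)=\prod_{i=1}^{m}\mu({\Gamma}_{i})$$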

for any measurable sets ${\Gamma}_{1},...,{\Gamma}_{m}$ .

2.9 Theorem.

Let $P_{j}(n,N),\,j=1,...,\ell,$ be nonconstant essentially distinct polynomials of $n$ and $N$ (i.e. $P_{i}(n,N)-P_{j}(n,N),\,i\neq j$ is not a constant identically) taking on integer values on integers and nontrivially depending on $n$ (i.e. $P_{i}(n,N)$ is not just a polynomial of $N$ ). If $T$ is a strongly $2\ell$ -mixing invertible transformation of $(X,{\mathcal{B}},\mu)$ then (2.12) holds true for any bounded measurable functions $f_{j},\,j=1,...,\ell$ .

Observe that both conditions, that the polynomials $P_{j}$ are essentially distinct and that they nontrivially depend on $n$ , are important for Theorem 2.9 to hold true. As to the first condition consider $\frac{1}{N}\sum_{n=1}^{N}T^{n}fT^{n+1}g=\frac{1}{N}\sum_{n=1}^{N}T^{n}(fTg)$ which by the $L^{2}$ ergodic theorem converges as $N\to\infty$ to $\int fTgd\mu$ which usually differs from the product of integrals of $f$ and $g$ . As to the second condition we can consider $\frac{1}{N}\sum_{n=1}^{N}T^{N}f=T^{N}f$ which does not converge at all as $N\to\infty$ unless $f$ is a constant $\mu$ -almost everywhere. It would be natural to try to show that for $\ell\geq 2$ strong mixing (i.e. 2-mixing) is not enough, in general, for Theorem 2.9 to hold true but this is not easy since then we would have to construct an example of a 2-mixing but not $2\ell$ -mixing transformation which is a version of the old open problem attributed to Rokhlin.
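The first counterexample above can be checked by hand on a Bernoulli shift. The following is a minimal sketch under our own assumptions, not part of the paper: we take the fair coin shift on $\{0,1\}^{\mathbb{Z}}$, the functions $f(x)=x_{1}$ and $g(x)=x_{0}$, and hypothetical helper names `integrate` and `shift`. It computes $\int fTgd\mu$ and $\int fd\mu\int gd\mu$ exactly and shows that they differ.

```python
from fractions import Fraction
from itertools import product

def integrate(func, depth, p=Fraction(1, 2)):
    """Integrate a function of the first `depth` coordinates against
    the Bernoulli(p) product measure on {0,1}-sequences (exact arithmetic)."""
    total = Fraction(0)
    for word in product((0, 1), repeat=depth):
        weight = Fraction(1)
        for symbol in word:
            weight *= p if symbol == 1 else 1 - p
        total += weight * func(word)
    return total

def shift(func, k=1):
    """Return T^k func, i.e. x -> func(T^k x), where T is the left shift."""
    return lambda word: func(word[k:])

f = lambda w: w[1]   # f(x) = x_1 (illustrative choice)
g = lambda w: w[0]   # g(x) = x_0 (illustrative choice)
Tg = shift(g)        # (Tg)(x) = x_1

limit_of_averages = integrate(lambda w: f(w) * Tg(w), depth=2)
product_of_integrals = integrate(f, depth=2) * integrate(g, depth=2)
print(limit_of_averages, product_of_integrals)
```

Here $fTg=x_{1}^{2}=x_{1}$, so the ergodic averages converge to $1/2$ while the product of integrals equals $1/4$, even though the Bernoulli shift is mixing of all orders.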

We observe that such dynamical systems as topologically mixing subshifts of finite type, Axiom A diffeomorphisms and expanding transformations considered with an invariant Gibbs measure constructed by a Hölder continuous function (potential) are strongly mixing of all orders so the above theorem is applicable to them. This is also true for the Gauss map $Tx=1/x$ mod 1, $x\in(0,1),\,T0=0$ considered with its Gauss invariant measure $\mu({\Gamma})=\frac{1}{\ln 2}\int_{\Gamma}\frac{dx}{1+x}$ , as well as some other maps of the interval. Actually, mixing of all orders follows from the property called in probability ${\alpha}$ -mixing (or strong mixing) and the above dynamical systems have this property (and even a stronger property called $\psi$ -mixing with exponential speed, see, for instance, [7], [18] and [8]).
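As a numerical sanity check of the invariance of the Gauss measure under the Gauss map, one can use that $T^{-1}[0,t]=\bigcup_{k\geq 1}[\frac{1}{k+t},\frac{1}{k}]$ and compare the measures of both sides. The sketch below is only an illustration: the function names and the truncation level `terms` are our own assumptions, and the check is approximate since the series is truncated.

```python
import math

def gauss_measure(a, b):
    """Gauss measure of the interval [a, b] inside [0, 1]."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)

def preimage_measure(t, terms=100_000):
    """Gauss measure of T^{-1}[0, t], truncated to the first `terms`
    intervals [1/(k+t), 1/k] of the preimage."""
    return sum(gauss_measure(1 / (k + t), 1 / k) for k in range(1, terms + 1))

for t in (0.1, 0.5, 0.9):
    print(t, gauss_measure(0.0, t), preimage_measure(t))
```

The two printed columns agree up to the truncation error, which is of order $t/\mathrm{terms}$ because the series telescopes.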

These notions are defined via two parameter families of ${\sigma}$ -algebras ${\mathcal{F}}_{mn}\subset{\mathcal{F}}$ , $-\infty<m\leq n<\infty$ on a probability space $(X,{\mathcal{F}},P)$ such that ${\mathcal{F}}_{mn}\subset{\mathcal{F}}_{m^{\prime}n^{\prime}}$ if $m^{\prime}\leq m\leq n\leq n^{\prime}$ . We define also ${\mathcal{F}}_{mn}$ for $m=-\infty$ and $n<\infty$ , for $m>-\infty$ and $n=\infty$ or for $m=-\infty$ and $n=\infty$ as minimal ${\sigma}$ -algebras containing ${\mathcal{F}}_{kn}$ for all $k>-\infty$ , containing ${\mathcal{F}}_{ml}$ for all $l<\infty$ or containing ${\mathcal{F}}_{kl}$ for all $-\infty<k\leq l<\infty$ , respectively. Such a family of ${\sigma}$ -algebras is called ${\alpha}$ -mixing if

[TABLE]

Now, we have the following result which is probably well known but for readers’ convenience we provide details here.

2.10 Proposition.

Suppose that $\{{\mathcal{F}}_{mn},\,-\infty\leq m\leq n\leq\infty\}$ is an ${\alpha}$ -mixing family of ${\sigma}$ -algebras on a probability space $(X,{\mathcal{F}},\mu)$ with ${\mathcal{F}}={\mathcal{F}}_{-\infty,\infty}$ . Let $T:X\to X$ be a measure $\mu$ -preserving transformation such that $T^{-1}{\mathcal{F}}_{m,n}\subset{\mathcal{F}}_{m+1,n+1}$ for all $m\leq n$ . Then for any ${\Gamma}_{1},...,{\Gamma}_{k}\in{\mathcal{F}},\,k\geq 2$ ,

[TABLE]

Proof.

First, observe that if $G_{i}\in{\mathcal{F}}_{m_{i}n_{i}},\,i=1,...,k$ with $m_{i}\leq n_{i}<m_{i+1},\,i=1,...,k-1,$ then applying the definition of the mixing coefficient ${\alpha}$ subsequently we obtain that

[TABLE]

Next, let ${\Gamma}_{i}\in{\mathcal{F}}_{mn}$ for some $-\infty<m\leq n<\infty$ and all $i=1,...,k$ . Since $T^{-1}{\mathcal{F}}_{mn}\subset{\mathcal{F}}_{m+1,n+1}$ we obtain from (2.15) that (2.14) holds true. Now, let ${\Gamma}_{i}\in{\mathcal{F}}={\mathcal{F}}_{-\infty,\infty},\,i=1,...,k,$ be arbitrary. Then for each ${\varepsilon}>0$ there exist $m=m({\varepsilon})\leq n=n({\varepsilon})$ and $\hat{\Gamma}_{i}\in{\mathcal{F}}_{mn}$ such that $\mu({\Gamma}_{i}\triangle\hat{\Gamma}_{i})<{\varepsilon},\,i=1,...,k,$ where $\triangle$ denotes the symmetric difference. Since (2.14) holds true for $\hat{\Gamma}_{i}$ in place of ${\Gamma}_{i}\,i=1,...,k,$ we obtain that

[TABLE]

and since ${\varepsilon}>0$ is arbitrary (2.14) follows. ∎

Observe that a typical application of the above setup is the symbolic one, where $X$ is a sequence space, $T$ is the left shift and the ${\sigma}$ -algebras ${\mathcal{F}}_{mn}$ are generated by the cylinder sets for which the sequence elements on places from $m$ to $n$ are fixed. This can be extended to dynamical systems having appropriate symbolic representations via, for instance, Markov partitions.
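In the simplest symbolic example, a Bernoulli shift, the decoupling of cylinder sets can be computed exactly. The sketch below is an illustration under our own assumptions (a Bernoulli measure with weights $1/3$ and $2/3$, cylinders stored as dictionaries from positions to symbols, and hypothetical helper names). It shows that $\mu({\Gamma}_{1}\cap T^{-n}{\Gamma}_{2})=\mu({\Gamma}_{1})\mu({\Gamma}_{2})$ as soon as the shifted cylinder no longer overlaps the first one.

```python
from fractions import Fraction

def shift_cylinder(cylinder, n):
    """T^{-n} of a cylinder for the left shift: the constraint at
    position i becomes a constraint at position i + n."""
    return {i + n: s for i, s in cylinder.items()}

def intersect(c1, c2):
    """Intersection of two cylinders, or None if the constraints conflict."""
    merged = dict(c1)
    for i, s in c2.items():
        if merged.get(i, s) != s:
            return None
        merged[i] = s
    return merged

def measure(cylinder, weights):
    """Bernoulli measure of a cylinder (None stands for the empty set)."""
    if cylinder is None:
        return Fraction(0)
    total = Fraction(1)
    for s in cylinder.values():
        total *= weights[s]
    return total

weights = {0: Fraction(1, 3), 1: Fraction(2, 3)}  # illustrative choice
gamma1 = {0: 1, 1: 0}   # x_0 = 1, x_1 = 0
gamma2 = {0: 1, 1: 1}   # x_0 = 1, x_1 = 1

def correlation(n):
    return measure(intersect(gamma1, shift_cylinder(gamma2, n)), weights)

product_of_measures = measure(gamma1, weights) * measure(gamma2, weights)
print([correlation(n) for n in range(5)], product_of_measures)
```

For $n=0,1$ the two cylinders conflict and the intersection is empty, while for $n\geq 2$ the correlation equals the product of the measures. For general Gibbs measures the equality is replaced by the asymptotic decoupling expressed by the mixing coefficient.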

3. One transformation case

In this section we will establish Theorems 2.1, 2.2 and Corollary 2.3.

3.1. Factors and extensions

The strategy of our proof is the same as in [14]. It is based on the notions of factors and extensions. Recall that if $T$ is a measure preserving transformation of a probability space $(X,{\mathcal{B}},\mu)$ and $T^{-1}{\mathcal{B}}_{1}\subset{\mathcal{B}}_{1}\subset{\mathcal{B}}$ then $(X,{\mathcal{B}}_{1},\mu,T)$ is called a factor of $(X,{\mathcal{B}},\mu,T)$ while the latter is called an extension of $(X,{\mathcal{B}}_{1},\mu,T)$ . This factor is said to be nontrivial if ${\mathcal{B}}_{1}$ contains sets of measure strictly between 0 and 1. It is often more convenient to view factors in the following equivalent way (see [14] for more details). Namely, the factor $(X,{\mathcal{B}}_{1},\mu,T)$ is identified with a system $(Y,{\mathcal{D}},\nu,S)$ such that for some measurable onto map $\pi:X\to Y$ we have $\pi\mu=\nu$ , $\pi T=S\pi$ and ${\mathcal{B}}_{1}=\pi^{-1}{\mathcal{D}}$ . Furthermore, $\mu$ disintegrates into $\mu_{y},\,y\in Y$ so that $\mu=\int\mu_{y}d\nu(y)$ and $T\mu_{y}=\mu_{Sy}$ $\nu$ -almost everywhere (a.e.).

Next, let $g\in L^{2}(X,{\mathcal{B}},\mu)$ and let ${\mathcal{Y}}=(Y,{\mathcal{D}},\nu,S)$ be a factor of $(X,{\mathcal{B}},\mu,T)$ . Following [14] we set

[TABLE]

This is essentially the conditional expectation $E(g|{\mathcal{B}}_{1})$ provided $(Y,{\mathcal{D}},\nu,S)$ is identified with $(X,{\mathcal{B}}_{1},\mu,T)$ . Since ${\mathcal{B}}_{1}=\pi^{-1}{\mathcal{D}}$ and $Y=\pi X$ then $E(g|{\mathcal{B}}_{1})$ is constant on $\pi^{-1}y$ for $\nu$ -almost all $y$ , and so this conditional expectation can be viewed as a function on $Y$ . Since we refer often to [14] we will keep the notations from there though they differ slightly from the way conditional expectations with respect to ${\sigma}$ -algebras are written in probability. We will use also the following well known formulas

[TABLE]

provided $f$ and $fg$ are integrable.
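On a finite space the disintegration $\mu=\int\mu_{y}d\nu(y)$ and the conditional expectation viewed as a function on $Y$ can be written out explicitly. The following sketch uses a hypothetical four point space with a two point factor (all names and numbers are our own illustrative assumptions). It computes $\nu=\pi\mu$, the fiber measures $\mu_{y}$ and $y\mapsto\int gd\mu_{y}$, and checks that integrating the latter against $\nu$ recovers $\int gd\mu$.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite example: X = {a, b, c, d}, Y = {0, 1}.
mu = {"a": Fraction(1, 6), "b": Fraction(1, 3),
      "c": Fraction(1, 4), "d": Fraction(1, 4)}
pi = {"a": 0, "b": 0, "c": 1, "d": 1}
g = {"a": 1, "b": 4, "c": 2, "d": 6}

def push_forward(mu, pi):
    """nu = pi(mu): total mass of each fiber pi^{-1}(y)."""
    nu = defaultdict(Fraction)
    for x, weight in mu.items():
        nu[pi[x]] += weight
    return dict(nu)

def fiber_measures(mu, pi):
    """Disintegration: mu_y is mu restricted to pi^{-1}(y) and normalized."""
    nu = push_forward(mu, pi)
    return {y: {x: mu[x] / nu[y] for x in mu if pi[x] == y} for y in nu}

def conditional_expectation(g, mu, pi):
    """The function y -> integral of g with respect to mu_y."""
    return {y: sum(w * g[x] for x, w in mu_y.items())
            for y, mu_y in fiber_measures(mu, pi).items()}

nu = push_forward(mu, pi)
cond = conditional_expectation(g, mu, pi)
integral_over_X = sum(mu[x] * g[x] for x in mu)
integral_over_Y = sum(nu[y] * cond[y] for y in nu)
print(cond, integral_over_X, integral_over_Y)
```

In this example the conditional expectation takes the values $3$ and $4$ on the two fibers and both integrals equal $7/2$.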

Fix a measure preserving system $(X,{\mathcal{B}},\mu,T)$ and let ${\mathcal{B}}_{1}\subset{\mathcal{B}}$ be a $T$ -invariant ${\sigma}$ -subalgebra. If (2.3) holds true for any $A\in{\mathcal{B}}_{1},\,\ell$ and $p_{j},q_{j},\,j=0,1,...,\ell,$ all satisfying the conditions of Theorem 2.2 then we say that the action of $T$ on the factor $(X,{\mathcal{B}}_{1},\mu)$ is generalized Szemerédi (GSZ). To make this shorter we will also say in this case that the action of $T$ on ${\mathcal{B}}_{1}$ is GSZ and if $(X,{\mathcal{B}}_{1},\mu,T)$ is identified with $(Y,{\mathcal{D}},\nu,S)$ then this is equivalent to saying that the action of $S$ on ${\mathcal{D}}$ is GSZ.

Similarly to [14] we can see that the set of factors for which $T$ is GSZ contains a maximal element and that no proper factor can be maximal. The proof of Theorem 2.2 is based on the notions of relative weak mixing and relative compact extensions of other factors, which will be defined below. We will show that if the action of $T$ is GSZ for a smaller factor then it is also GSZ for a larger factor which is either relative weak mixing or relative compact with respect to the smaller factor. Considered together with the following two facts this will yield our result. First, similarly to [14] we see that if $T$ is GSZ for a totally ordered (by inclusion) family of factors $\{{\mathcal{B}}_{\alpha}\}$ (i.e. factors $(X,{\mathcal{B}}_{\alpha},\mu)$ ) then $T$ is GSZ for $\sup_{\alpha}{\mathcal{B}}_{\alpha}$ (i.e. for $(X,\sup_{\alpha}{\mathcal{B}}_{\alpha},\mu))$ where the latter is the minimal ${\sigma}$ -algebra containing each ${\mathcal{B}}_{\alpha}$ . Secondly, we rely on the general result from [14] saying that if ${\mathcal{X}}=(X,{\mathcal{B}},\mu,T)$ is an extension of ${\mathcal{Y}}=(Y,{\mathcal{D}},\nu,S)$ , which is not relative weak mixing, then there exists an intermediate factor ${\mathcal{X}}^{*}$ between ${\mathcal{Y}}$ and ${\mathcal{X}}$ so that ${\mathcal{X}}^{*}$ is a (relative) compact extension of ${\mathcal{Y}}$ .

3.2. Relative weak mixing

Let $(Z,{\mathcal{E}},{\theta})$ be a probability space, $X=Y\times Z$ , $\mu=\nu\times{\theta}$ , ${\mathcal{B}}={\mathcal{D}}\times{\mathcal{E}}$ and $T(y,z)=(Sy,\,{\sigma}(y)z)$ where $S:Y\to Y$ preserves a probability measure $\nu$ , ${\sigma}(y)z$ is measurable in $(y,z)$ and all ${\sigma}(y),\,y\in Y$ preserve the measure ${\theta}$ . Then $T$ is measure preserving on $(X,{\mathcal{B}},\mu)$ and $(X,{\mathcal{B}},\mu,T)$ is called in [14] a skew product of $(Y,{\mathcal{D}},\nu,S)$ with $(Z,{\mathcal{E}},{\theta})$ (while usually $T$ itself is called a skew product transformation). Set $\tilde{X}=Y\times Z\times Z$ , $\tilde{\mathcal{B}}={\mathcal{D}}\times{\mathcal{E}}\times{\mathcal{E}}$ , $\tilde{\mu}=\nu\times{\theta}\times{\theta}$ and $\tilde{T}(y,z,z^{\prime})=(Sy,{\sigma}(y)z,{\sigma}(y)z^{\prime})$ . Then ${\mathcal{X}}=(X,{\mathcal{B}},\mu,T)$ is called a relative weak mixing extension of ${\mathcal{Y}}=(Y,{\mathcal{D}},\nu,S)$ if the action of $\tilde{T}$ on $(\tilde{X},\tilde{\mathcal{B}},\tilde{\mu})$ is ergodic.
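To see what the ergodicity requirement on $\tilde{T}$ rules out, consider a toy skew product by rotations. The sketch below is a hypothetical finite example of our own (base $Y={\mathbb{Z}}_{4}$ with $Sy=y+1$, fiber $Z={\mathbb{Z}}_{3}$ with ${\sigma}(y)z=z+c(y)$ for an arbitrarily chosen cocycle $c$, uniform measures). Both $T$ and $\tilde{T}$ are bijections and hence measure preserving, but $\tilde{T}$ leaves $z^{\prime}-z$ invariant, so it is not ergodic and this extension is not relative weak mixing.

```python
from itertools import product

Y, Z = range(4), range(3)
cocycle = [1, 0, 2, 1]   # hypothetical choice of c(y), y in Z_4

def S(y):
    return (y + 1) % 4

def sigma(y, z):
    return (z + cocycle[y]) % 3

def T(point):
    y, z = point
    return (S(y), sigma(y, z))

def T_tilde(point):
    y, z, z_prime = point
    return (S(y), sigma(y, z), sigma(y, z_prime))

def is_bijection(f, space):
    points = list(space)
    return sorted(map(f, points)) == sorted(points)

def count_orbits(f, space):
    """Number of cycles of a bijection f of a finite set; for the uniform
    measure, f is ergodic exactly when there is a single cycle."""
    seen, orbits = set(), 0
    for point in space:
        if point not in seen:
            orbits += 1
            while point not in seen:
                seen.add(point)
                point = f(point)
    return orbits

X = list(product(Y, Z))
X_tilde = list(product(Y, Z, Z))
print(is_bijection(T, X), is_bijection(T_tilde, X_tilde),
      count_orbits(T_tilde, X_tilde))
```

Fiber rotations of this kind are the model case of the compact extensions treated in Section 3.3.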

3.1 Proposition.

Let $(X,{\mathcal{B}},\mu,T)$ be a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,S)$ and $f_{j}\in L^{\infty}(X,{\mathcal{B}},\mu),\,j=0,1,...,\ell$ . Then for any $m=1,2,...,\ell$ ,

[TABLE]

and

[TABLE]

where $p_{j},q_{j},\,j=1,...,\ell,$ satisfy conditions of Theorem 2.2.

Proof.

The proof proceeds similarly to Theorem 8.3 in [14]. Recall, that the conditional expectations $E(f_{j}|{\mathcal{Y}})$ can be viewed as functions in both $L^{\infty}(X,{\mathcal{B}},\mu)$ and in $L^{\infty}(Y,{\mathcal{D}},\nu)$ , which is identified with $L^{\infty}(X,{\mathcal{B}}_{1},\mu)$ , and so this conditional expectation is ${\mathcal{B}}_{1}$ -measurable. Denote the assertions (3.2) and (3.3) by $A_{m}$ and $B_{m}$ , respectively, where both mean that they hold true for all relatively weak mixing extensions of $(Y,{\mathcal{D}},\nu,S)$ and all $L^{\infty}$ functions on corresponding spaces.

First, observe that $A_{0}$ is obvious and $B_{0}$ will not play a role here so we can denote by $B_{0}$ any correct assertion. Next, we proceed by induction on $m$ showing that (cf. [14]),

(i) $A_{m-1}$ implies $B_{m}$ and

(ii) $B_{m}$ for $(\tilde{X},\tilde{\mathcal{B}},\tilde{\mu},\tilde{T})$ (which is also a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,S)$ ) implies $A_{m}$ for $(X,{\mathcal{B}},\mu,T)$ .

We start with (ii) which is easier. If $f_{0}$ is measurable with respect to ${\mathcal{B}}_{1}=\pi^{-1}({\mathcal{D}})$ , the integrals in (3.2) have the form

[TABLE]

where $\tilde{p}_{j}=p_{j+1}-p_{1},\,\tilde{q}_{j}=q_{j+1}-q_{1}$ still satisfy conditions of Theorem 2.2 and we use (3.1) here and that $S$ is $\nu$ -preserving. Thus $A_{m}$ follows from $A_{m-1}$ if $f_{0}$ is ${\mathcal{B}}_{1}$ -measurable (assuming the induction hypothesis for all $p_{j},q_{j}$ satisfying the conditions of Theorem 2.2).

It follows that writing $f_{0}=(f_{0}-E(f_{0}|{\mathcal{Y}}))+E(f_{0}|{\mathcal{Y}})$ and using that $(a+b)^{2}\leq 2a^{2}+2b^{2}$ we can assume that $E(f_{0}|{\mathcal{Y}})=0$ . With this the left hand side of (3.2) takes the form

[TABLE]

where $g\otimes g(y,z,z^{\prime})=g(y,z)g(y,z^{\prime})$ is a function on $\tilde{X}$ whenever $g$ is a function on $X$ (see (6.6) in [14]). By $B_{m}$ for $(\tilde{X},\tilde{\mathcal{B}},\tilde{\mu},\tilde{T})$ the above limit equals

[TABLE]

Since the sum here is ${\mathcal{B}}_{1}$ -measurable we can insert the conditional expectation inside of the integral concluding that the latter limit is zero since

[TABLE]

completing the proof of (ii).

In order to prove (i) we observe that

[TABLE]

It follows that it suffices to prove $B_{m}$ under the additional condition that for some $j_{0},\,1\leq j_{0}\leq m$ we have $E(f_{j_{0}}|{\mathcal{Y}})=0$ (replacing $f_{j_{0}}$ by $f_{j_{0}}-E(f_{j_{0}}|{\mathcal{Y}})$ ).

We now have to show that $\lim_{N\to\infty}\|\psi_{N}\|_{L^{2}}=0$ for $\psi_{N}=\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{m}T^{p_{j}n+q_{j}N}f_{j}$ provided $E(f_{j_{0}}|{\mathcal{Y}})=0$ . Rewrite

[TABLE]

where $H$ will be chosen large but much smaller than $N$ . By the convexity of the function ${\varphi}(x)=x^{2}$ we have (up to $O(H/N)$ ),

[TABLE]

By integration and the fact that $T$ is measure preserving,

[TABLE]

where $\hat{p}_{i}=p_{i+1}-p_{1}$ and $\hat{q}_{i}=q_{i+1}-q_{1}$ satisfy conditions of Theorem 2.2.

Set $r=k-n$ and observe that a pair $(n,k)$ appears in the above sums only if $|r|=|k-n|<H$ , and then it appears for $H-|r|$ values of $j$ . Hence, we can rewrite the above estimate as

[TABLE]
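The two elementary facts used in this step can be checked on a bounded numerical sequence. The sketch below is an illustration with our own choice of sequence and hypothetical function names: replacing the average over $n\leq N$ by an average of block averages of length $H$ changes it by at most $(H-1)\sup_{n}|a_{n}|/N$, and the number of pairs $(j,k)$ in $\{0,...,H-1\}^{2}$ with $k-j=r$ equals $H-|r|$.

```python
from fractions import Fraction

def plain_average(a, N):
    """(1/N) * sum_{n=1}^{N} a(n)."""
    return Fraction(sum(a(n) for n in range(1, N + 1)), N)

def smoothed_average(a, N, H):
    """(1/N) * sum_{n=1}^{N} (1/H) * sum_{j=0}^{H-1} a(n + j)."""
    total = sum(a(n + j) for n in range(1, N + 1) for j in range(H))
    return Fraction(total, N * H)

def pairs_with_difference(H, r):
    """Number of pairs (j, k) with 0 <= j, k < H and k - j = r."""
    return sum(1 for j in range(H) for k in range(H) if k - j == r)

a = lambda n: (-1) ** (n * (n + 1) // 2)   # a bounded sequence, |a(n)| = 1
N, H = 200, 7
gap = abs(plain_average(a, N) - smoothed_average(a, N, H))
print(gap, Fraction(H - 1, N))
print([pairs_with_difference(H, r) for r in range(-H + 1, H)])
```

The first line printed confirms the $O(H/N)$ error and the second one lists the weights $H-|r|$ for $|r|<H$.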

Inserting conditional expectation inside the integral and using $A_{m-1}$ for a fixed $H$ , every $r$ such that $|r|<H$ and $N$ large enough we can replace the integral term in the above inequality by

[TABLE]

Hence, we obtain

[TABLE]

Next, we estimate the integrals appearing in (3.4) by

[TABLE]

Since $E(f_{j_{0}}|{\mathcal{Y}})=0$ we obtain from $A_{1}$ for the case when $q_{1}=0$ , which is proved as Lemma 8.1 in [14] (where the ergodicity of $\tilde{T}$ by the definition of weak mixing extensions is used), that

[TABLE]

Hence, most of the terms in the right hand side of (3.4) are small provided that $H$ is large enough. Since all terms in the right hand side of (3.4) are bounded by $\prod_{j=1}^{m}\|f_{j}\|^{2}_{\infty}$ and most of them are small, their average in (3.4) becomes arbitrarily small when $H$ and $N$ are large enough, completing the proof of Proposition 3.1. ∎

Now Theorem 2.1 is a particular case of (3.3) considering a trivial factor ${\mathcal{Y}}$ , i.e. such that the corresponding ${\sigma}$ -algebra ${\mathcal{B}}_{1}$ contains only sets of zero or full measure. As to Theorem 2.2 we will need the following corollary of Proposition 3.1.

3.2 Corollary.

Let $(X,{\mathcal{B}},\mu,T)$ be a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,S)$ . If the action of $S$ on ${\mathcal{D}}$ is GSZ, then so is the action of $T$ on ${\mathcal{B}}$ .

Proof.

The result follows immediately from (3.3) in the same way as in Theorem 8.4 from [14]. ∎

We observe that Proposition 3.1 implies also that if $(X,{\mathcal{B}},\mu,T)$ is a relative weak mixing extension of $(X,{\mathcal{B}}_{1},\mu,T)$ and (2.3) holds true for any $A\in{\mathcal{B}}_{1},\,\mu(A)>0$ with $\liminf$ taken over all $N\to\infty$ then the same is true for any $A\in{\mathcal{B}},\,\mu(A)>0$ , and so the restriction of $\liminf$ to $N\in{\mathcal{N}}_{A}$ comes not from relative weak mixing extensions but from relative compact extensions which will be studied below.

3.3. Relative compact extensions

For brevity and following [14] we will drop here the word "relative" and will speak about compact extensions. Recall that $(X,{\mathcal{B}},\mu,T)$ is said to be a compact extension of $(Y,{\mathcal{D}},\nu,S)$ if there exists a set ${\mathcal{R}}\subset L^{2}(X,{\mathcal{B}},\mu)$ dense in $L^{2}(X,{\mathcal{B}},\mu)$ and such that for every ${\delta}>0$ there exist functions $g_{1},...,g_{m}\in L^{2}(X,{\mathcal{B}},\mu)$ satisfying

[TABLE]

where, again, $\mu=\int\mu_{y}d\nu(y)$ .

As explained in Section 3.1 above the proof of Theorem 2.2 will be complete after we establish the following result.

3.3 Proposition.

Let $(X,{\mathcal{B}},\mu,T)$ be a compact extension of $(Y,{\mathcal{D}},\nu,S)$ . If the action of $S$ on $(Y,{\mathcal{D}},\nu)$ is GSZ then so is the action of $T$ on $(X,{\mathcal{B}},\mu)$ .

Proof.

We will follow the proof of Theorem 9.1 from [14] with a modification at the end. For an arbitrary $A\in{\mathcal{B}}$ with $\mu(A)>0$ we have to show that (2.3) holds true. First, similarly to [14] we conclude that without loss of generality the indicator function $f={\mathbb{I}}_{A}$ of $A$ can be assumed to belong to the set ${\mathcal{R}}$ appearing in the above definition of compact extensions. We will assume for convenience that $T$ is ergodic, otherwise pass to an ergodic decomposition. Then $S$ is also ergodic. The condition $f\in{\mathcal{R}}$ is equivalent to saying that the sequence $\{T^{k}f\}_{k\in{\mathbb{Z}}}$ is totally bounded, or relatively compact, in $L^{2}(\mu_{y})$ for almost all $y$ . Since $T\mu_{y}=\mu_{Sy}$ we conclude that the total boundedness of $\{T^{k}f\}_{k\in{\mathbb{Z}}}$ in $L^{2}(\mu_{y})$ for $y$ in a set of positive measure already implies for an ergodic $S$ that $\{T^{k}f\}_{k\in{\mathbb{Z}}}$ is totally bounded in a uniform manner in $L^{2}(\mu_{y})$ for almost all $y$ .

Denote by $\oplus_{j=0}^{\ell}L^{2}(\mu_{y})$ the direct sum of $\ell+1$ copies of $L^{2}(\mu_{y})$ endowed with the norm $\|(f_{0},f_{1},...,f_{\ell})\|_{y}=\max_{j}\|f_{j}\|_{L^{2}(\mu_{y})}$ . It is clear that if $f\in{\mathcal{R}}$ then the set

[TABLE]

is totally bounded in $\oplus_{j=0}^{\ell}L^{2}(\mu_{y})$ for $\nu$ -almost all $y\in Y$ , in fact, uniformly in $y\in Y$ . We write

[TABLE]

where $(\cdot,...,\cdot)_{y}$ means that the vector function is considered on a fiber above $y\in Y$ and, recall, $f={\mathbb{I}}_{A}\in{\mathcal{R}}$ . Throwing away $\nu$ -measure zero set of $y$ ’s we can assume that uniform estimates hold true on the whole $Y$ .

Set $A_{1}=\{y:\,\mu_{y}(A)>\mu(A)/2\}=\{y:\,\mu_{y}(A)>0\}$ . Then $\nu(A_{1})>\frac{1}{2}\mu(A)$ . Indeed, this is clear if $\nu(A_{1})=1$ while if $\nu(A_{1})<1$ then

[TABLE]

and so $\frac{1}{2}\mu(A)<\nu(A_{1})(1-\frac{1}{2}\mu(A))$ . Thus, we can assume without loss of generality that $\mu_{y}(A)=0$ for all $y\not\in A_{1}$ . We consider only $y\in A_{1}$ for which the corresponding elements of ${\mathcal{L}}(\ell,f,y)$ have all nonzero components, and so these elements have norm $\geq\sqrt{\frac{1}{2}\mu(A)}$ in $L^{2}(\mu_{y})$ . The corresponding subset of ${\mathcal{L}}(\ell,f,y)$ is denoted by ${\mathcal{L}}^{*}(\ell,f,y)$ and it is still uniformly totally bounded. For each $y\in A_{1}$ and ${\varepsilon}>0$ let $M({\varepsilon},y)$ denote the maximum cardinality of ${\varepsilon}$ -separated sets in ${\mathcal{L}}^{*}(\ell,f,y)$ , which is a finite monotone decreasing piecewise constant function of ${\varepsilon}$ with at most countably many jumps. Since $M({\varepsilon},y)$ is measurable as a function of $y$ there exist ${\varepsilon}_{0}<\mu(A)/10\ell$ , $\eta>0$ and $A_{2}\subset A_{1}$ with $\nu(A_{2})>0$ so that $M({\varepsilon},y)$ equals a constant $M$ for ${\varepsilon}_{0}-\eta\leq{\varepsilon}\leq{\varepsilon}_{0}$ and $y\in A_{2}$ .
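The proof below uses the general fact that a separated set which cannot be enlarged is automatically dense at the same scale. The following sketch illustrates this on a finite set of points of the plane with the maximum norm, a finite dimensional stand-in for the norm $\|\cdot\|_{y}$ . The point set, the conventions (separated means pairwise distance at least ${\varepsilon}$ , dense means every point is at distance less than ${\varepsilon}$ from the chosen set) and the function names are our own assumptions.

```python
def sup_distance(u, v):
    """Maximum-norm distance between two tuples of numbers."""
    return max(abs(a - b) for a, b in zip(u, v))

def greedy_separated(points, eps):
    """Build an eps-separated subset that is maximal with respect to
    inclusion: a point is added whenever it is at distance >= eps
    from everything chosen so far."""
    chosen = []
    for p in points:
        if all(sup_distance(p, q) >= eps for q in chosen):
            chosen.append(p)
    return chosen

def is_separated(chosen, eps):
    return all(sup_distance(p, q) >= eps
               for i, p in enumerate(chosen) for q in chosen[i + 1:])

def is_dense(chosen, points, eps):
    """Every point lies at distance < eps from some chosen point."""
    return all(any(sup_distance(p, q) < eps for q in chosen) for p in points)

points = [(i % 5, (i * i) % 7) for i in range(35)]   # illustrative data
eps = 2
chosen = greedy_separated(points, eps)
print(len(chosen), is_separated(chosen, eps), is_dense(chosen, points, eps))
```

Note that the greedy set is maximal with respect to inclusion but need not have the maximum cardinality $M({\varepsilon},y)$ . In the argument of the text, a set of $M$ separated vectors has maximum cardinality and is therefore, in particular, maximal.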

Take $y_{0}\in A_{2}$ and find integers $n_{1},...,n_{M}$ and $N_{1},...,N_{M}$ so that $\{(f,T^{p_{1}n_{j}+q_{1}N_{j}}f,...,T^{p_{\ell}n_{j}+q_{\ell}N_{j}}f)\},\,j=1,2,...,M,$ is a maximal ${\varepsilon}_{0}$ -separated set in ${\mathcal{L}}^{*}(\ell,f,y_{0})$ . Next, $\|T^{p_{l}n_{i}+q_{l}N_{i}}f-T^{p_{l}n_{j}+q_{l}N_{j}}f\|_{L^{2}(\mu_{y})}$ , $1\leq i<j\leq M$ , $l=0,1,...,\ell,$ as functions on $Y$ are measurable and $y_{0}$ can be chosen so that each neighborhood of values of these functions at $y_{0}$ occurs with positive measure in the set $A_{2}$ . Let now $A_{3}$ be the subset of $A_{2}$ of points $y$ such that

[TABLE]

for any $i,j,l$ with $1\leq i<j\leq M$ and $0\leq l\leq\ell$ . Then $\nu(A_{3})>0$ by the choice of $y_{0}$ .

Now we use the assumption that the action of $S$ on $(Y,{\mathcal{D}},\nu)$ is GSZ, applying it to $A_{3}$ . Let $n,N\in{\mathbb{Z}}$ , $n\leq N$ be such that

[TABLE]

and let $y\in\bigcap_{l=0}^{\ell}S^{-(p_{l}n+q_{l}N)}A_{3}$ . Since $S^{p_{l}n+q_{l}N}y\in A_{3}$ for $l=0,1,...,\ell,$ and $A_{3}\subset\bigcap_{l=0}^{\ell}S^{-(p_{l}n_{j}+q_{l}N_{j})}A_{1}$ for $j=1,...,M$ by the definition of ${\mathcal{L}}^{*}(\ell,f,y)$ (together with (3.6)) then $S^{p_{l}(n_{j}+n)+q_{l}(N_{j}+N)}y\in A_{1}$ for $l=0,1,...,\ell$ and $j=1,...,M$ .

Similarly to [14] we conclude that the vectors $\{(f,T^{p_{1}(n+n_{j})+q_{1}(N+N_{j})}f,T^{p_{2}(n+n_{j})+q_{2}(N+N_{j})}f,...,T^{p_{\ell}(n+n_{j})+q_{\ell}(N+N_{j})}f),\,j=1,...,M,\}$ are ${\varepsilon}_{0}-\eta$ separated in ${\mathcal{L}}^{*}(\ell,f,y)$ for $y\in\bigcap_{l=0}^{\ell}S^{-(p_{l}n+q_{l}N)}A_{3}$ , and so these vectors form a maximal such set which must be then ${\varepsilon}_{0}-\eta$ dense in ${\mathcal{L}}^{*}(\ell,f,y)$ . Since $(f,f,...,f)\in{\mathcal{L}}^{*}(\ell,f,y)$ there exists $j$ such that $\{(f,T^{p_{1}(n+n_{j})+q_{1}(N+N_{j})}f,...,T^{p_{\ell}(n+n_{j})+q_{\ell}(N+N_{j})}f)\}$ is ${\varepsilon}_{0}$ -close to it. By the choice of ${\varepsilon}_{0}$ this implies

[TABLE]

The index $j$ depends on $y$ , so now we sum over $j$ to obtain that for each $y\in\bigcap_{l=0}^{\ell}S^{-(p_{l}n+q_{l}N)}A_{3}$ ,

[TABLE]

Integrating over $\bigcap_{l=0}^{\ell}S^{-(p_{l}n+q_{l}N)}A_{3}$ we derive

[TABLE]

Now we sum in $n$ , $1\leq n\leq N$ and multiply by $\frac{1}{N}$ ,

[TABLE]

Next, set $K_{j}(N)=N+N_{j}$ . Then

[TABLE]

Now we use the assumption that the action of $S$ on $(Y,{\mathcal{D}},\nu)$ is GSZ which implies that

[TABLE]

where ${\mathcal{N}}$ is an infinite set of positive integers with bounded gaps. Define ${\mathcal{N}}_{j}={\mathcal{N}}+N_{j}=\{N+N_{j}:\,N\in{\mathcal{N}}\}$ , $j=1,...,M,$ which are also sets with bounded gaps. Clearly, (3.9) implies that there exists ${\varepsilon}>0$ such that for any $N\in{\mathcal{N}}$ large enough

[TABLE]

Then by (3.7) and (3.8) we obtain that for any $N\in{\mathcal{N}}$ large enough

[TABLE]

Let

[TABLE]

Then by (3.10) for any $N\in{\mathcal{N}}$ large enough there exists $j$ such that $N+N_{j}\in{\mathcal{N}}_{A}$ . Hence, the gaps in ${\mathcal{N}}_{A}$ are bounded by the bound on gaps of ${\mathcal{N}}$ plus $2\max_{1\leq j\leq M}N_{j}$ and, clearly,

[TABLE]

This completes the proof of Proposition 3.3, as well as of Theorem 2.2. ∎
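The bookkeeping behind the bounded gaps of ${\mathcal{N}}_{A}$ can be illustrated numerically. In the sketch below all data are hypothetical: `base` plays the role of a set ${\mathcal{N}}$ with bounded gaps, `shifts` plays the role of $N_{1},...,N_{M}$ , and for each $N$ an arbitrary rule selects one $j$ with $N+N_{j}$ placed in the new set. The gaps of the resulting set do not exceed the gap bound of ${\mathcal{N}}$ plus $2\max_{j}N_{j}$ .

```python
def max_gap(values):
    """Largest difference between consecutive elements of a finite set."""
    ordered = sorted(set(values))
    return max(b - a for a, b in zip(ordered, ordered[1:]))

# Hypothetical set with bounded gaps (gaps alternate between 4 and 3).
base = [n for n in range(1, 400) if n % 7 in (0, 3)]

# Hypothetical shifts N_1, ..., N_M and an arbitrary selection rule j(N).
shifts = [2, 5, 11]
selected = [N + shifts[(N * N) % len(shifts)] for N in base]

print(max_gap(base), max_gap(selected), max_gap(base) + 2 * max(shifts))
```

The selected set may contain additional elements in the actual proof, which can only decrease the gaps.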

4. Commuting transformations

In this section we will obtain Theorems 2.4, 2.5 and Corollary 2.6.

4.1. Factors and extensions with respect to an abelian group of transformations

Let $G$ be a commutative group of transformations acting on $(X,{\mathcal{B}})$ so that all $T\in G$ preserve a probability measure $\mu$ on $(X,{\mathcal{B}})$ . A probability space $(Y,{\mathcal{D}},\nu)$ is called a factor of $(X,{\mathcal{B}},\mu)$ if there exists an onto map $\pi:X\to Y$ such that $\pi\mu=\nu$ and $\pi^{-1}{\mathcal{D}}={\mathcal{B}}$ . Define the action of $G$ on $(Y,{\mathcal{D}},\nu)$ by $T\pi x=\pi Tx$ for each $T\in G$ and $x\in X$ . This action preserves the measure $\nu$ and we say that the system $(X,{\mathcal{B}},\mu,G)$ is an extension of $(Y,{\mathcal{D}},\nu,G)$ and the latter is called a factor of the former. Clearly, this definition is compatible with the one given for one transformation in Section 3.1.

Next, $(X,{\mathcal{B}},\mu,G)$ is called a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,G)$ if $(X,{\mathcal{B}},\mu,T)$ is a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,T)$ for each $T\in G,\,T\neq\mbox{id}$ as defined in Section 3.2. Furthermore, $(X,{\mathcal{B}},\mu,G)$ is called a (relative) compact extension of $(Y,{\mathcal{D}},\nu,G)$ if (3.5) holds true simultaneously for all $T\in G$ (with the same ${\mathcal{R}},{\delta}$ and $g_{1},...,g_{m}$ ) for $\nu$ -almost all $y\in Y$ . Finally, following [11] we call an extension ${\alpha}:\,(X,{\mathcal{B}},\mu,G)\to(Y,{\mathcal{D}},\nu,G)$ primitive if $G$ is the direct product of two subgroups $G=G_{c}\times G_{w}$ where $(X,{\mathcal{B}},\mu,G_{c})$ is a compact extension of $(Y,{\mathcal{D}},\nu,G_{c})$ and $(X,{\mathcal{B}},\mu,G_{w})$ is a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,G_{w})$ .

Next, ${\mathcal{X}}=(X,{\mathcal{B}},\mu,G)$ as above will be called GSZ if (2.8) holds true for any $A\in{\mathcal{B}}$ with $\mu(A)>0$ and all $T_{j},\hat{T}_{j}\in G$ , $j=0,1,...,\ell,$ where the set ${\mathcal{N}}_{A}$ depends on $A$ and $T_{j},\hat{T}_{j}$ ’s, $T_{0}=\hat{T}_{0}=\mbox{id}$ and $T_{1},...,T_{\ell}$ are distinct and different from the identity. Next, we rely on Theorem 6.17 in [11] describing the structure of extensions and show similarly to Proposition 7.1 in [11] that if each $(X,{\mathcal{B}}_{\beta},\mu,G)$ is GSZ for a totally ordered (by inclusion) family of ${\sigma}$ -algebras then $(X,\sup_{\beta}{\mathcal{B}}_{\beta},\mu,G)$ is also GSZ. It follows that in order to establish Theorem 2.5 it suffices to show that any primitive extension $(X,{\mathcal{B}},\mu,G)$ of $(Y,{\mathcal{D}},\nu,G)$ is GSZ provided $(Y,{\mathcal{D}},\nu,G)$ is GSZ itself.

4.2. Weak mixing extensions

The following result generalizes Proposition 3.1 to the case of several commuting transformations.

4.1 Proposition.

Suppose that $(X,{\mathcal{B}},\mu,G)$ is a relative weak mixing extension of $(Y,{\mathcal{D}},\nu,G)$ where $G$ is a commutative group of (both $\mu$ and $\nu$ ) measure preserving transformations as above. Let $T_{1},...,T_{\ell}\in G$ be distinct and different from identity while $\hat{T}_{1},...,\hat{T}_{\ell}$ be invertible (both $\mu$ and $\nu$ ) measure preserving transformations of $(X,{\mathcal{B}},\mu)$ leaving ${\mathcal{Y}}=(Y,{\mathcal{D}},\nu)$ invariant and commuting with each other and with $T_{1},...,T_{\ell}$ . Then for each $m\geq 1$ ,

[TABLE]

where $T_{0}=\hat{T}_{0}=\mbox{id}$ , and

[TABLE]

Proof.

First, observe that considering a weak mixing extension of a trivial factor we conclude that (4.2) implies Theorem 2.4. Denote the assertions (4.1) and (4.2) by $A_{m}$ and $B_{m}$ , respectively, and prove them by induction showing that

(i) $A_{m-1}$ implies $B_{m}$ and

(ii) $B_{m}$ (for $(\tilde{X},\tilde{\mathcal{B}},\tilde{\mu})$ and $\tilde{T}_{j},\tilde{\hat{T}}_{j},\,j=1,...,\ell$ ) implies $A_{m}$ (for $(X,{\mathcal{B}},\mu)$ and $T_{j},\hat{T}_{j},\,j=1,...,\ell$ ) where $\tilde{X},\tilde{\mathcal{B}},\tilde{\mu}$ and $\tilde{T}$ were defined in Section 3.2.

First, observe that $A_{0}$ is obvious and $B_{0}$ does not play a role here so we can denote by it any valid assertion. The proof proceeds essentially in the same way as for one transformation. We start with (ii) which is easier. As in the one transformation case we assume first that $f_{0}$ is ${\mathcal{B}}_{1}=\pi^{-1}({\mathcal{D}})$ -measurable. Then the integrals in (4.1) have the form

[TABLE]

where $\bar{T}_{j}=T_{j+1}T_{1}^{-1}$ and $\bar{\hat{T}}_{j}=\hat{T}_{j+1}\hat{T}_{1}^{-1}$ . Thus $A_{m}$ follows from $A_{m-1}$ if $f_{0}$ is ${\mathcal{B}}_{1}$ -measurable. Hence, as in Section 3.2 we can assume that $E(f_{0}|{\mathcal{Y}})=0$ . Then the left hand side of (4.1) takes the form

[TABLE]

By $B_{m}$ for $(\tilde{X},\tilde{\mathcal{B}},\tilde{\mu})$ and $\tilde{T}_{j},\tilde{\hat{T}}_{j},\,j=1,...,m,$ the above limit equals

[TABLE]

Since the sum here is ${\mathcal{B}}_{1}$ -measurable we can insert the conditional expectation inside of the integral concluding as in Section 3.2 that the latter limit is zero completing the proof of (ii).

In order to prove (i) we observe that

[TABLE]

This enables us to prove $B_{m}$ under the additional condition that for some $j_{0},\,1\leq j_{0}\leq m$ we have $E(f_{j_{0}}|{\mathcal{Y}})=0$ (replacing $f_{j_{0}}$ by $f_{j_{0}}-E(f_{j_{0}}|{\mathcal{Y}})$ ).

It remains to show that $\lim_{N\to\infty}\|\psi_{N}\|_{L^{2}}=0$ for $\psi_{N}=\frac{1}{N}\sum_{n=1}^{N}\prod_{j=1}^{m}T_{j}^{n}\hat{T}_{j}^{N}f_{j}$ provided $E(f_{j_{0}}|{\mathcal{Y}})=0$ . Rewrite

[TABLE]

where $H$ will be chosen large but much smaller than $N$ . By convexity of the function ${\varphi}(x)=x^{2}$ we have (up to $O(H/N)$ ),

[TABLE]

Integrating the above inequality we obtain

[TABLE]

where $\bar{T}_{i}=T_{i+1}T_{1}^{-1}$ , $\bar{\hat{T}}_{i}=\hat{T}_{i+1}\hat{T}_{1}^{-1}$ and we observe that $\bar{T}_{i},\,i=1,...,m-1,$ remain distinct and different from the identity. Writing $r=k-n$ we conclude similarly to Section 3.2 that this inequality implies that

[TABLE]

Inserting conditional expectation inside the integral in the right hand side of (4.3) and using $A_{m-1}$ for a fixed $H$ , every $r$ such that $|r|<H$ and $N$ large enough we can replace the integral term in the above inequality by

[TABLE]

which gives

[TABLE]

Next, we estimate the integrals appearing in (4.4) by

[TABLE]

Since we assume that $E(f_{j_{0}}|{\mathcal{Y}})=0$ then by $A_{1}$ for the case when $\hat{T}_{1}=\mbox{id}$ which is proved as Lemma 8.1 in [14] (where ergodicity of $\tilde{T}_{j_{0}}$ is used which we know from the definition of relative weak mixing),

[TABLE]

The concluding argument is the same as in Proposition 3.1 which yields $B_{m}$ and completes the proof of Proposition 4.1. ∎

4.3. Primitive extensions

Let ${\alpha}:\,{\mathcal{X}}=(X,{\mathcal{B}},\mu,G)\to{\mathcal{Y}}=(Y,{\mathcal{D}},\nu,G)$ be a primitive extension, so that $G=G_{c}\times G_{w}$ where ${\alpha}:\,(X,{\mathcal{B}},\mu,G_{c})\to(Y,{\mathcal{D}},\nu,G_{c})$ and ${\alpha}:\,(X,{\mathcal{B}},\mu,G_{w})\to(Y,{\mathcal{D}},\nu,G_{w})$ are relative compact and relative weak mixing extensions, respectively. Here $G$ is supposed to be a finitely generated free abelian group and $\mu=\int\mu_{y}d\nu(y)$ . It follows from Proposition 4.1 that

4.2 Lemma.

Let $S_{1},...,S_{m}\in G_{w}$ be distinct and different from the identity, $\hat{S}_{1},...,\hat{S}_{m}\in G$ be arbitrary and $f\in L^{\infty}(X)$ . Define $\psi(y)=\int fd\mu_{y}$ . Then for each ${\varepsilon},{\delta}>0$ the number $\#{\mathcal{N}}_{{\varepsilon},{\delta},N}$ of elements of the set

[TABLE]

satisfies

[TABLE]

denoting by $\#{\Gamma}$ the cardinality of a set ${\Gamma}$ .

Proof.

Since $\psi=E(f|{\mathcal{Y}})$ then by Proposition 4.1,

[TABLE]

and (4.5) follows. ∎

The implications of compactness which will be needed below are summarized in the following lemma (see Lemma 7.10 in [11]).

4.3 Lemma.

Let $A\in{\mathcal{B}}$ with $\mu(A)>0$ . Then we can find a measurable set $A^{\prime}\subset A$ with $\mu(A^{\prime})$ as close to $\mu(A)$ as we like and such that for any ${\varepsilon}>0$ there exist a finite set of functions $g_{1},...,g_{K}\in{\mathcal{H}}=L^{2}(X,{\mathcal{B}},\mu)$ and a measurable function $k:\,Y\times G_{c}\to\{1,...,K\}$ with the property that $\|R{\mathbb{I}}_{A^{\prime}}-g_{k(y,R)}\|_{y}<{\varepsilon}$ for $\nu$ -almost all $y\in Y$ and every $R\in G_{c}$ .

We will need also the following consequence of the multidimensional van der Waerden theorem.

4.4 Lemma.

(i) Let the number $K$ be given and let $T_{1},T_{2},...,T_{H}\in G$ . There is a finite subset $\Psi\subset G$ and a number $M<\infty$ such that for any map $k:\,G\to\{1,2,...,K\}$ there exist $T^{\prime}\in\Psi$ and $m\in{\mathbb{N}},\,1\leq m\leq M$ such that

[TABLE]

(ii) Let the number $K$ be given and $T_{j},\hat{T}_{j}\in G,\,j=1,...,H$ . There is a finite set $\Psi\subset G$ and a number $M<\infty$ such that for any map $k:\,G\times G\to\{1,2,...,K\}$ satisfying $k(T,S)=\hat{k}(TS)$ for some $\hat{k}:\,G\to\{1,2,...,K\}$ there exist $T^{\prime}\in\Psi$ and $m\in{\mathbb{N}},\,1\leq m\leq M$ such that

[TABLE]

Proof.

The assertion (i) is Lemma 7.11 in [11]. In order to prove (ii) we apply (i) with $\hat{k}$ and $S_{i}=T_{i}\hat{T}_{i},\,i=1,...,H,$ in place of $k$ and $T_{1},...,T_{H}$ , respectively, there. With $\Psi$ and $T^{\prime}$ given by (i) for such $\hat{k}$ and $S_{i}$ ’s we obtain

[TABLE]

for $i=1,...,H$ . ∎
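For orientation, the simplest one-dimensional instance of the van der Waerden theorem can be verified by brute force. The sketch below is only an illustration and not the multidimensional statement used in Lemma 4.4: it checks that every coloring of $\{1,...,9\}$ by two colors contains a monochromatic arithmetic progression of length 3, while this fails for $\{1,...,8\}$ . The function names are hypothetical.

```python
from itertools import product

def has_monochromatic_ap(coloring, length=3):
    """True if some arithmetic progression of the given length, with
    positive step, has all of its positions colored identically."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, n):
            indices = [start + i * step for i in range(length)]
            if indices[-1] >= n:
                break
            if len({coloring[i] for i in indices}) == 1:
                return True
    return False

def every_coloring_has_ap(n, colors=2, length=3):
    """Exhaustively test all colorings of an interval of n integers."""
    return all(has_monochromatic_ap(c, length)
               for c in product(range(colors), repeat=n))

print(every_coloring_has_ap(8), every_coloring_has_ap(9))
```

The coloring of $\{1,...,8\}$ with the pattern $0,0,1,1,0,0,1,1$ is one witness that 8 points do not suffice.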

The following is the main result of this section which, as explained in Section 4.1, yields Theorem 2.5.

4.5 Proposition.

Let ${\alpha}:\,{\mathcal{X}}=(X,{\mathcal{B}},\mu,G)\to{\mathcal{Y}}=(Y,{\mathcal{D}},\nu,G)$ be a primitive extension and ${\mathcal{Y}}$ be a GSZ system. Then ${\mathcal{X}}$ is also a GSZ system.

Proof.

We proceed similarly to Proposition 7.12 in [11] adapting the proof there to our situation. Let $A\in{\mathcal{B}}$ with $\mu(A)>0$ and let $T_{1},...,T_{\ell},\hat{T}_{1},...,\hat{T}_{\ell}\in G$ . Replacing $A$ by a slightly smaller set, we can assume that ${\mathbb{I}}_{A}$ has the compactness property described in Lemma 4.3. Writing $\mu(A)=\int\mu_{y}(A)d\nu(y)$ , we see that there exists a measurable subset $B\subset Y,\,\nu(B)>0$ with $\mu_{y}(A)>a=\mu(A)/2$ for all $y\in B$ . We express $T_{j},\hat{T}_{j}$ as products of elements in $G_{c}$ and in $G_{w}$ and assume without loss of generality that for all $n\leq N$ ,

[TABLE]

where $R_{1}=\hat{R}_{1}=\mbox{id}$ , $R_{i},\hat{R}_{i}\in G_{c},\,i=1,...,r$ , $S_{j},\hat{S}_{j}\in G_{w},\,j=1,...,s,$ and $S_{1},...,S_{s}$ are distinct. Since the set of transformations in the right hand side of (4.6) is at least as large as the one in the left hand side of (4.6) then (2.8) will follow if we prove that for an infinite syndetic set ${\mathcal{N}}_{A}\subset{\mathbb{N}}$ ,

[TABLE]

Let $a_{1}<a^{s}$ . We will show that there exist an infinite syndetic set ${\mathcal{N}}_{A}\subset{\mathbb{N}}$ and ${\varepsilon}>0$ such that for each $N\in{\mathcal{N}}_{A}$ there exist a subset $P_{N}\subset\{1,2,...,N\}$ with $\#P_{N}\geq{\varepsilon}N$ and $\eta>0$ such that for every $n\in P_{N}$ we can find a set $B_{n,N}\subset Y$ , $B_{n,N}\in{\mathcal{D}}$ with $\nu(B_{n,N})>\eta$ satisfying

[TABLE]

Integrating the inequality (4.8) over $B_{n,N}$ and taking into account (4.6) we obtain that for any $N\in{\mathcal{N}}_{A}$ ,

[TABLE]

and both (4.7) and (2.8) will follow.

The set $B_{n,N}$ will be determined by two requirements. For $a_{1}<a_{2}<a^{s}$ we will require that

[TABLE]

whenever $n\in P_{N}$ and $y\in B_{n,N}$ . Choose ${\varepsilon}_{1}>0$ such that if

[TABLE]

(where $\triangle$ denotes the symmetric difference) then (4.9) implies (4.8). Then we require that (4.10) holds true for any $n\in P_{N}$ and $y\in B_{n,N}$ .

Suppose now that $P_{N}$ and $\{B_{n,N},\,n\in P_{N},\,N\in{\mathcal{N}}_{A}\}$ have been found so that (4.10) is satisfied for all $n\in P_{N}$ , $y\in B_{n,N}$ and, in addition,

[TABLE]

Now, applying Lemma 4.2 with $f={\mathbb{I}}_{A},\,{\varepsilon}<a^{s}-a_{2}$ and ${\delta}<\frac{1}{2}\eta$ we obtain

[TABLE]

with $\psi$ defined in Lemma 4.2, for all $y\in B_{n,N}$ except for a set $\hat{B}_{n,N}$ of $y$'s of $\nu$-measure less than $\frac{1}{2}\eta$ and for $n\not\in{\mathcal{N}}_{{\varepsilon},{\delta},N}$. Set $\tilde{P}_{N}=P_{N}\setminus{\mathcal{N}}_{{\varepsilon},{\delta},N}$ and $\tilde{B}_{n,N}=B_{n,N}\setminus\hat{B}_{n,N}$. Then, taking the new $P_{N}=\tilde{P}_{N}$ and $B_{n,N}=\tilde{B}_{n,N}$, we obtain (4.9). Thus the problem is reduced to finding $P_{N}$ and $B_{n,N}$ such that (4.10) and (4.11) are satisfied.

Next, we replace (4.10) by the requirement that there exists $g\in{\mathcal{H}}_{y}=L^{2}(X,{\mathcal{B}},\mu_{y})$ such that

[TABLE]

(where $\|\cdot\|_{y}=\|\cdot\|_{L^{2}(X,\mu_{y})}$ ) with ${\varepsilon}_{2}<\frac{1}{2}\sqrt{{\varepsilon}_{1}}$ . Since $R_{1}=\hat{R}_{1}=\mbox{id}$ we will have

[TABLE]

which gives (4.10) since

[TABLE]

Now recall that $A$ was chosen to comply with conditions of Lemma 4.3. We can therefore find $g_{1},g_{2},...,g_{K}\in L^{2}(X,\mu)$ and a function $k:\,Y\times G_{c}\to\{1,2,...,K\}$ so that $\|R{\mathbb{I}}_{A}-g_{k(y,R)}\|_{y}<{\varepsilon}_{2}$ for every $R\in G_{c}$ and $\nu$ -almost all $y$ . We define now a sequence of functions $k_{q,Q}:\,Y\times G\times G\to\{1,2,...,K\}$ by

[TABLE]

for integers $1\leq q\leq Q$ and transformations $R,\hat{R}\in G_{c},\,S,\hat{S}\in G_{w}$ . This is well defined since $G=G_{c}\times G_{w}$ is a direct product. Then for $\nu$ -almost all $y$ ,

[TABLE]

Fix $q\leq Q$ and $y$ for which (4.13) holds true and apply Lemma 4.4(ii) to the function $k(\cdot,\cdot)=k_{q,Q}(y,\cdot,\cdot)$ on $G\times G$ . Independently of $q,Q$ and $y$ there is a finite set $\Psi\subset G$ and a number $M$ such that $k_{q,Q}(y,T^{\prime}R_{i}^{m}S_{j}^{m},\hat{R}^{m}_{i}\hat{S}^{m}_{i})$ takes on the same value $k$ for $1\leq i\leq r$ , $1\leq j\leq s$ , for some $T^{\prime}\in\Psi$ and some $m$ with $1\leq m\leq M$ . Then if $T^{\prime}=R^{\prime}S^{\prime}$ and $g_{(q,y)}$ is the corresponding $g_{k}$ we obtain from (4.13) for $1\leq i\leq r$ , $1\leq j\leq s$ that

[TABLE]

where we took into account that $g_{(q,y)}=g_{k}=g_{k_{q,Q}(y,(R^{\prime}R_{i}^{m})(S^{\prime}S_{j}^{m}),\hat{R}^{m}_{i}\hat{S}_{i}^{m})}$ . We have shown that for every $q=1,...,Q$ , $Q\in{\mathbb{N}}$ and $\nu$ -almost all $y\in Y$ there exist $m$ and $T^{\prime}$ , both having a finite range of possibilities, such that (4.12) is satisfied with $n=qm$ and $N=Qm$ for $(T^{\prime})^{q}y$ in place of $y$ .

Next, we will produce the set $P_{N}$ and the sets $B_{n,N},\,n\in P_{N}$ such that both (4.11) and (4.12) are satisfied for $(y,n),\,y\in B_{n,N}$ . For each $q$ form the set

[TABLE]

where the intersection is taken over $j,m,T^{\prime}$ with $1\leq j\leq s,\,1\leq m\leq M,\,T^{\prime}\in\Psi$. Using the fact that $(Y,{\mathcal{D}},\nu)$ is a GSZ system we conclude that for each $Q$ from an infinite syndetic set ${\mathcal{N}}^{\prime}\subset{\mathbb{N}}$ there exists $P^{\prime}_{Q}\subset\{1,...,Q\}$ with $\#P^{\prime}_{Q}\geq{\varepsilon}Q$ for some ${\varepsilon}>0$ independent of $Q$ and such that $\nu(C_{q})>\eta^{\prime}$ for some $\eta^{\prime}>0$ and all $q\in P^{\prime}_{Q}$.

Now let $y\in C_{q}$ for $q\in P^{\prime}_{Q}$ . There exist $m=m(q,y)$ and $T^{\prime}=T^{\prime}(q,y)$ such that $(T^{\prime})^{q}y$ (in place of $y$ ) satisfies (4.12) for $n=qm,\,N=Qm$ and $q\leq Q$ . In addition, $(T^{\prime})^{q}y$ also satisfies (4.11) for these $T^{\prime}$ and $m$ taking into account that by the definition of $C_{q}$ this condition is satisfied with $n=mq,\,N=mQ$ by all $(T^{\prime})^{q}y$ and all $m$ such that $T^{\prime}\in\Psi,\,1\leq m\leq M$ since $(T^{\prime})^{q}y\in\bigcap_{j,m}S_{j}^{-mq}\hat{S}_{j}^{-mQ}B$ whenever $y\in C_{q}$ , and so $S_{j}^{mq}\hat{S}_{j}^{mQ}(T^{\prime})^{q}y\in B$ .

Let $J$ be the total number of possibilities for $(m,T^{\prime})$. Then for a subset $D_{q}\subset C_{q}$ with $\nu(D_{q})>\frac{\eta^{\prime}}{J}$, $m(q,y)$ and $T^{\prime}(q,y)$ take a constant value, say, $m(q)$ and $T^{\prime}(q)$, respectively. We now define $n(q)=qm(q)$ and set $P_{Q}=\{n(q)\leq Q:\,q\in P^{\prime}_{Q}\}$ and $B_{n(q)}=(T^{\prime}(q))^{q}D_{q}$. Then $\nu(B_{n(q)})=\nu(D_{q})>\eta^{\prime}/J$, $S_{j}^{n(q)}\hat{S}_{j}^{m(q)Q}B_{n(q)}\subset B,\,j=1,...,s,$ and

[TABLE]

for $y\in B_{n(q)},\,1\leq i\leq r,\,1\leq j\leq s$ for an appropriately defined $g^{\prime}_{(g,y)}$. Finally, $\#\{n(q),\,q\leq Q\}\geq{\varepsilon}/M$ and the gaps of the set $\{m(q)Q,\,Q\in{\mathcal{N}}^{\prime}\}$ are bounded by $M$ times the maximal gap of ${\mathcal{N}}^{\prime}$. This completes the proof of Proposition 4.5, as well as of Theorem 2.5. ∎

5. Short proofs of Theorems 2.2 and 2.5

Recall that $F_{k}\subset{\mathbb{Z}}^{d},\,k=1,2,...$ is called a Følner sequence if the cardinality of the symmetric difference $(\bar{n}+F_{k})\triangle F_{k}$ is $o(|F_{k}|)$ as $k\to\infty$ for any $\bar{n}\in{\mathbb{Z}}^{d}$. Now, suppose that for any Følner sequence $F_{k}\subset{\mathbb{Z}}^{2}$, $k=1,2,...$,

[TABLE]

(in fact, we will need this only when the $F_{k}$'s are squares). Then there exist ${\varepsilon}>0$ and an integer $M\geq 1$ such that in any square $R\subset{\mathbb{Z}}^{2}$ with the side of length $M$ we can find $(n,m)\in R$ such that $a_{n,m}>{\varepsilon}$. Indeed, if this were not true then we could find a sequence of squares $R_{j}\subset{\mathbb{Z}}^{2}$ with sides of length $M_{j}\to\infty$ as $j\to\infty$ and a sequence ${\varepsilon}_{j}\to 0$ as $j\to\infty$ such that $a_{n,m}\leq{\varepsilon}_{j}$ for all $(n,m)\in R_{j}$. Then, of course,

[TABLE]

which contradicts our assumption since $\{R_{j}\}_{j=1}^{\infty}$ is a Følner sequence. Clearly, this argument remains true for any ${\mathbb{Z}}^{d}$ replacing squares by $d$ -dimensional boxes but we will not need this here.

Now, let $M,{\varepsilon}>0$ be numbers whose existence was established above and assume that $a_{n,m}\geq 0$ for all integers $n$ and $m$. Set $Q_{j}=\{(n,m):\,j(M+1)\leq m<(j+1)(M+1)\ \mbox{and}\ 0<n\leq j(M+1)\}$. Then $Q_{j}$ contains $j$ disjoint squares with the side of length $M$, and so

[TABLE]

Hence, there exists $j(M+1)\leq N_{j}<(j+1)(M+1)$ such that

[TABLE]

Clearly, ${\mathcal{N}}=\{N_{j},\,j=1,2,...\}$ is a set of integers with gaps bounded by $2M$ and

[TABLE]
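The selection of $N_{j}$ above is a pigeonhole argument: the strip $Q_{j}$ consists of $M+1$ rows and carries total mass at least $j{\varepsilon}$, so some row carries at least $j{\varepsilon}/(M+1)$. The following is a minimal numerical sketch of this step under stated assumptions. The array built by `make_array` is a hypothetical example (one entry equal to $2{\varepsilon}$ in each aligned block of $(M+1)\times(M+1)$ lattice points, zeros elsewhere), the function names are not from the paper, and the lower bounds checked are one natural choice of constants, not necessarily those of the displayed formulas.

```python
import random

def make_array(M, eps, blocks, seed=0):
    """Hypothetical nonnegative array a(n, m): every aligned block of
    (M+1) x (M+1) lattice points contains exactly one entry 2*eps."""
    rng = random.Random(seed)
    side = M + 1
    marked = set()
    for bn in range(blocks):
        for bm in range(blocks + 1):
            n = bn * side + 1 + rng.randrange(side)   # n runs over 1, 2, ...
            m = bm * side + rng.randrange(side)       # m runs over 0, 1, ...
            marked.add((n, m))
    return lambda n, m: 2 * eps if (n, m) in marked else 0.0

def pick_rows(a, M, J):
    """For j = 1..J choose N_j in [j(M+1), (j+1)(M+1)) maximizing the row
    sum over 0 < n <= j(M+1); return the list of pairs (N_j, row sum)."""
    side = M + 1
    picks = []
    for j in range(1, J + 1):
        def row_sum(m, width=j * side):
            return sum(a(n, m) for n in range(1, width + 1))
        N_j = max(range(j * side, (j + 1) * side), key=row_sum)
        picks.append((N_j, row_sum(N_j)))
    return picks
```

Since $a_{n,m}\geq 0$ and $j(M+1)\leq N_{j}<(j+1)(M+1)$, a row sum of at least $j{\varepsilon}/(M+1)$ gives $\frac{1}{N_{j}}\sum_{n=1}^{N_{j}}a_{n,N_{j}}\geq{\varepsilon}/(2(M+1)^{2})$ in this sketch, and consecutive $N_{j}$'s have at most $2M$ integers strictly between them.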

Next, we will apply the above arguments to the situation of Theorem 2.5. Let $T_{j},\hat{T}_{j},\,j=1,...,\ell,$ be, as in Theorem 2.5, commuting measure preserving transformations of a measure space $(X,{\mathcal{B}},\mu)$ and set $S_{j}^{(n,m)}=(T_{j-1}^{n}\hat{T}_{j-1}^{m})^{-1}T_{j}^{n}\hat{T}_{j}^{m},\,j=1,...,\ell,$ with $S_{0}^{(n,m)}$ being the identity transformation. Then $S_{j}^{(n,m)},\,j=0,1,...,\ell,$ are commuting measure preserving transformations of $(X,{\mathcal{B}},\mu)$ and $T_{j}^{n}\hat{T}_{j}^{m}=S_{0}^{(n,m)}S_{1}^{(n,m)}\cdots S_{j}^{(n,m)},\,j=0,1,...,\ell$. Now, it follows from Theorem B of [2] that for any set $A\in{\mathcal{B}}$ with $\mu(A)>0$ and any Følner sequence $F_{k}\subset{\mathbb{Z}}^{2}$,

[TABLE]

i.e. the limit exists and it is positive. Taking $a_{n,m}=\mu\big{(}\bigcap_{j=0}^{\ell}(T_{j}^{n}\hat{T}_{j}^{m})^{-1}A\big{)}$ we obtain by the above arguments that there exists an infinite set with bounded gaps ${\mathcal{N}}_{A}$ such that (2.8) holds true, completing the proof of Theorem 2.5. ∎

Next, we derive a polynomial version of Theorem 2.2. Replace in (2.1) the linear terms $p_{i}n+q_{i}N$ by general polynomials $p_{i}(n,N),\,i=1,...,\ell,$ taking on integer values on integer pairs $n,N$ and such that for each $k\in{\mathbb{N}}$ there exists a pair $n,N$ with $p_{i}(n,N),\,i=1,...,\ell,$ all divisible by $k$. Then by Theorem 1.4 in [6],

[TABLE]

for every $A$ with $\mu(A)>0$ and any Følner sequence $F_{k}\subset{\mathbb{Z}}^{2}$ . Set $a_{n,m}=\mu(\bigcap_{i=0}^{\ell}T^{-p_{i}(n,m)}A)$ . Then by the above argument there exists an infinite set of positive integers ${\mathcal{N}}$ with uniformly bounded gaps such that

[TABLE]

providing a polynomial version of (2.3). ∎

6. Nonconventional polynomial arrays

6.1. Proof of Theorem 2.8

We start with the proof of Theorem 2.8, which proceeds similarly to the proof of Theorem D in [5]. First, by changing functions $f_{j}$ we can always assume without loss of generality that $P_{ij}(0)=0,\,Q_{ij}(0)=0,\,i=1,...,\ell,\,j=1,...,k$. If $\ell=1$ and $P_{11}(n,N)=pn$ where $p$ is an integer and $P_{1j}(n)\equiv 0$ when $j>1$ while $Q_{1j}(N)$'s are functions of $N$ taking on integer values on integers then for any measurable $L^{2}$ function $f$,

[TABLE]

since $T$ is weakly mixing, and so $T^{p}$ is weakly mixing and, in particular, ergodic, and so the result follows from the $L^{2}$ ergodic theorem.

In order to deal with the general case of Theorem 2.8 we will need the following version of the van der Corput theorem whose proof is the same as that of Theorem 1.4 in [3] (see also Theorem 1.5 there), and so we refer the reader there. This follows also from uniform versions of the van der Corput theorem (see, for instance, [21]).

6.1 Lemma.

Let $\{x_{n,N}\}_{n=1}^{N},\,N=1,2,...$ be a bounded sequence of vectors in a Hilbert space such that

[TABLE]

where $\langle\cdot,\cdot\rangle$ is the inner product and $D-\lim_{h}$ denotes the limit as $h\to\infty$ outside a set of integers having zero upper density. Then

[TABLE]

where $\|\cdot\|$ is the Hilbert space norm.

Next, we will describe the "PET induction" in our circumstances where we closely follow [5] and refer the reader there for more details. Let $P_{j},\,j=1,...,k,$ be any polynomials and $Q_{j},\,j=1,...,k$ be any functions taking on integer values on integers and such that $P_{j}(0)=Q_{j}(0)=0$. Similarly to [5] we will call

[TABLE]

$P$-polynomial expressions, where $P$ indicates the fact that the $Q_{i}$'s are not necessarily polynomials. Products of $P$-polynomial expressions and their inverses are $P$-polynomial expressions, and so they form a group $PE$. Clearly, if $\Phi(n,N)={\varphi}(n)\psi(N)\in PE$ then $\Phi^{-1}(n_{0},N)\Phi(n+n_{0},N)={\varphi}^{-1}(n_{0}){\varphi}(n+n_{0})\in PE$. The degree $\deg({\varphi}(n))$ of ${\varphi}(n)=T_{1}^{P_{1}(n)}\cdots T_{k}^{P_{k}(n)}$ is the maximal degree of the polynomials $P_{j},\,j=1,...,k$, and the degree $\deg(\Phi(n,N))$ of a $P$-polynomial expression $\Phi(n,N)={\varphi}(n)\psi(N)$ is defined as the degree of ${\varphi}$. Again, following [5] we define the weight of a $P$-polynomial expression $\Phi(n,N)={\varphi}(n)\psi(N)$ with ${\varphi}(n)=T_{1}^{P_{1}(n)}\cdots T_{k}^{P_{k}(n)}$ as the pair $(r,d)$ such that $\deg P_{r+1}(n)=...=\deg P_{k}(n)=0$, $\deg P_{r}(n)=d\geq 1$. The weight $(r,c)$ is greater than $(s,d)$ if $r>s$ or if $r=s$ and $c>d$.
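The definition of the weight can be made concrete with a short sketch. It assumes, as above, that every $P_{j}$ satisfies $P_{j}(0)=0$, so a polynomial is stored as its coefficient list `[c_1, c_2, ...]` for $c_{1}n+c_{2}n^{2}+\cdots$ and degree zero means the zero polynomial. The representation and the function names are illustrative choices, not notation from [5].

```python
def degree(coeffs):
    """Degree of c_1*n + c_2*n^2 + ... given as [c_1, c_2, ...];
    returns 0 for the zero polynomial."""
    for d in range(len(coeffs), 0, -1):
        if coeffs[d - 1] != 0:
            return d
    return 0

def weight(polys):
    """Weight (r, d) of T_1^{P_1(n)} ... T_k^{P_k(n)}: r is the largest
    index with deg P_r >= 1 and d = deg P_r. Returns None if all P_j = 0."""
    for r in range(len(polys), 0, -1):
        d = degree(polys[r - 1])
        if d >= 1:
            return (r, d)
    return None

def is_greater(w1, w2):
    """(r, c) > (s, d) iff r > s, or r == s and c > d."""
    return w1 > w2  # Python compares tuples lexicographically
```

For instance, with $P_{1}(n)=n+3n^{3}$, $P_{2}(n)=2n$ and $P_{3}\equiv 0$ the weight is $(2,1)$: the index $r$ is determined by the last nontrivial exponent and not by the largest degree.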

Two $P$-polynomial expressions $\Phi_{1}(n,N)={\varphi}_{1}(n)\psi_{1}(N)$ and $\Phi_{2}(n,N)={\varphi}_{2}(n)\psi_{2}(N)$ with ${\varphi}_{1}(n)=T_{1}^{P^{(1)}_{1}(n)}\cdots T_{k}^{P^{(1)}_{k}(n)}$ and ${\varphi}_{2}(n)=T_{1}^{P^{(2)}_{1}(n)}\cdots T_{k}^{P^{(2)}_{k}(n)}$ are called equivalent if they have the same weight $(r,d)$ and the leading coefficients of the polynomials $P_{r}^{(1)}$ and $P_{r}^{(2)}$ coincide as well. Any finite subset of $PE$ is called a system and the degree of a system is the maximal degree of its elements. To every system a weight matrix $(N_{rd},\,1\leq r\leq k,\,1\leq d\leq D)$ is associated where $N_{rd}$ is the number of equivalence classes formed by the elements of the system whose weights are $(r,d)$ and $D$ is the maximal degree of the polynomials $P_{ij}$ appearing in Theorem 2.8. As in [5] we say that the weight matrix $M^{\prime}=(N^{\prime}_{rd},\,1\leq r\leq k,\,1\leq d\leq D)$ precedes the weight matrix $M=(N_{rd},\,1\leq r\leq k,\,1\leq d\leq D)$ if for some $(r_{0},d_{0})$, $N^{\prime}_{r_{0}d_{0}}=N_{r_{0}d_{0}}-1,\,N^{\prime}_{rd}=N_{rd}$ when $r\geq r_{0}$ and $d\geq d_{0}$ except for $r=r_{0}$ and $d=d_{0}$, $N_{rd}=0$ and $N^{\prime}_{rd}$ are arbitrary nonnegative integers when $r\leq r_{0}$ and $d\leq d_{0}$ except for $r=r_{0}$ and $d=d_{0}$ (for a pictorial explanation see [5]).
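As an illustration of the weight matrix, the following sketch counts equivalence classes for a small system. It uses the same hypothetical representation as a coefficient list `[c_1, c_2, ...]` for each $P_{j}$ with $P_{j}(0)=0$, and it only computes $N_{rd}$; the ordering of weight matrices is not implemented here.

```python
from collections import defaultdict

def signature(polys):
    """Return (r, d, leading coefficient of P_r) for the expression
    T_1^{P_1(n)} ... T_k^{P_k(n)}, or None if all P_j vanish."""
    for r in range(len(polys), 0, -1):
        coeffs = polys[r - 1]
        for d in range(len(coeffs), 0, -1):
            if coeffs[d - 1] != 0:
                return (r, d, coeffs[d - 1])
    return None

def weight_matrix(system, k, D):
    """matrix[r-1][d-1] = number of equivalence classes of weight (r, d);
    two elements are equivalent if their signatures coincide."""
    classes = defaultdict(set)
    for polys in system:
        sig = signature(polys)
        if sig is not None:
            r, d, lead = sig
            classes[(r, d)].add(lead)
    return [[len(classes[(r, d)]) for d in range(1, D + 1)]
            for r in range(1, k + 1)]
```

With $k=D=2$, the system consisting of $T_{1}^{n}$, $T_{1}^{n^{2}}T_{2}^{2n}$, $T_{2}^{2n}$ and $T_{2}^{3n+n^{2}}$ has $N_{11}=1$, $N_{21}=1$ (the second and third elements are equivalent) and $N_{22}=1$.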

Now observe that the system appearing in (6.1) has the weight matrix $M_{0}=(N_{rd})$ where $N_{11}=1$ and $N_{rd}=0$ if $(r,d)\neq(1,1)$. Thus, (6.1) proves Theorem 2.8 for any system with the weight matrix $M_{0}$. Next, we proceed step by step considering systems with weight matrices $M_{0},M_{1},M_{2},...,M_{K}$ such that each $M_{i}$ precedes $M_{i+1},\,i=0,1,...,K-1$, arriving finally at the matrix $M_{K}$ with arbitrary predefined weights $N_{rd},\,1\leq r\leq k,\,1\leq d\leq D$ (for a graphical explanation of this see [5]). Our goal is to show that if Theorem 2.8 is valid for any system with the weight matrix $M_{i}$ then it is valid for any system with the weight matrix $M_{i+1}$, which by induction will yield Theorem 2.8.

Next, we remark that without loss of generality we can assume that $\int f_{i}d\mu=0$ for any $i=1,...,\ell$ which is the result of the equality

[TABLE]

Indeed, taking $a_{i}=T_{1}^{P_{i1}(n)}\cdots T_{k}^{P_{ik}(n)}\hat{T}_{1}^{Q_{i1}(N)}\cdots\hat{T}_{k}^{Q_{ik}(N)}f_{i}$ and $b_{i}=\int f_{i}d\mu$ we transform the left hand side of (2.11) into a sum of similar product expressions where all functions have zero integrals and the result to be proved now is that all corresponding limits are zero. Thus, writing

[TABLE]

we have to prove that

[TABLE]

As in [5] we can assume without loss of generality that $T_{1},...,T_{k}$ are linearly independent elements of the basis of the finitely generated free abelian group $G$. Then ${\varphi}(n)=T_{1}^{P_{1}(n)}\cdots T_{k}^{P_{k}(n)}=\mbox{id}$ for some polynomials $P_{1},...,P_{k}$ implies $P_{1}=\cdots=P_{k}=0$. By Lemma 6.1, (6.4) would follow if

[TABLE]

where

[TABLE]

Next, we will need the following result.

6.2 Lemma.

Let nonconstant polynomials $P_{1}(n,N),\,P_{2}(n,N),\,...,P_{k}(n,N)$ of $n$ and $N$ be essentially distinct and nontrivially depend on $n$. Then for each sufficiently large $h$ the polynomials $P_{1}(n,N),\,P_{2}(n,N),\,...,P_{k}(n,N),\,P_{1}(n+h,N),\,...,P_{k}(n+h,N)$ are pairwise essentially distinct (where $h$ is viewed as a constant) except for pairs $P_{i}(n,N),\,P_{i}(n+h,N)$ where $P_{i}(n,N)=p_{i}n+Q_{i}(N)$ and then $P_{i}(n+h,N)-P_{i}(n,N)=p_{i}h$.

Proof.

Clearly, $P_{1}(n+h,N),\,...,P_{k}(n+h,N)$ are essentially distinct since this was true for $P_{1}(n,N),\,P_{2}(n,N),\,...,P_{k}(n,N)$ . It remains to show that $P_{i}(n,N)$ and $P_{j}(n+h,N)$ are essentially distinct for any $i,j=1,...,k$ provided $h$ is large enough and either $i\neq j$ or $i=j$ and $P_{i}(n,N)$ does not have the form $P_{i}(n,N)=p_{i}n+Q_{i}(N)$ . Clearly, this is true if $P_{i}$ and $P_{j}$ have different degrees in $n$ , and so we can assume that they have the same degree $d$ in $n$ . Then we can write $P_{i}(n,N)=n^{d}V_{i}(N)+n^{d-1}W_{i}(N)+r_{i}(n,N)$ and $P_{j}(n,N)=n^{d}V_{j}(N)+n^{d-1}W_{j}(N)+r_{j}(n,N)$ where $V_{i}(N)$ , $V_{j}(N)$ are nonzero while $W_{i}(N)$ , $W_{j}(N)$ are arbitrary polynomials in $N$ only and $r_{i}(n,N)$ , $r_{j}(n,N)$ are polynomials of degree less than $d-1$ in $n$ . Then $P_{j}(n+h,N)=n^{d}V_{j}(N)+n^{d-1}(W_{j}(N)+dhV_{j}(N))+\tilde{r}_{j,h}(n,N)$ where $\tilde{r}_{j,h}(n,N)$ is a polynomial whose degree in $n$ is less than $d-1$ having coefficients depending on $h$ . Since $V_{i}(N)$ is a nonzero polynomial then for any $h$ large enough $W_{i}(N)+dhV_{i}(N)\neq W_{j}(N)$ and if $d>1$ then $P_{j}(n+h,N)$ and $P_{i}(n,N)$ are essentially distinct provided $h$ is large enough. The case $d=0$ is ruled out by our assumptions. If $d=1$ and $i\neq j$ then either $V_{i}\neq V_{j}$ or $W_{i}\neq W_{j}$ and either $W_{i}$ or $W_{j}$ is nonconstant. In both of these cases $P_{j}(n+h,N)$ and $P_{i}(n,N)$ are essentially distinct. Next, if $d=1$ and $i=j$ then $P_{i}(n+h,N)-P_{i}(n,N)=hV_{i}(N)$ , and so $P_{i}(n+h,N)$ and $P_{i}(n,N)$ are essentially distinct if and only if $V_{i}$ is nonconstant concluding the proof of the lemma (where, in fact, we did not use that $P_{i}$ ’s depend polynomially on $N$ ). ∎
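The dichotomy in Lemma 6.2 can be checked on small examples with exact integer arithmetic. The sketch below assumes that two polynomials are essentially distinct when their difference is not a constant, which is how the term is used in the proof above. A polynomial $P(n,N)=\sum c_{ij}n^{i}N^{j}$ is stored as a dictionary `{(i, j): c_ij}`; this encoding and the function names are illustrative only.

```python
from math import comb

def shift_n(poly, h):
    """Return P(n + h, N) for P given as {(i, j): c}, i.e. sum c*n^i*N^j."""
    out = {}
    for (i, j), c in poly.items():
        for t in range(i + 1):  # binomial expansion of (n + h)^i
            key = (t, j)
            out[key] = out.get(key, 0) + c * comb(i, t) * h ** (i - t)
    return {key: c for key, c in out.items() if c != 0}

def difference(p, q):
    """Return p - q with zero coefficients dropped."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) - c
    return {key: c for key, c in out.items() if c != 0}

def essentially_distinct(p, q):
    """Assumed meaning: p - q is not a constant polynomial."""
    return any(key != (0, 0) for key in difference(p, q))
```

For $P(n,N)=5n+N^{2}$ and $h=3$ the difference $P(n+h,N)-P(n,N)$ is the constant $15=p_{i}h$, so the pair is exceptional, whereas for $P(n,N)=n^{2}N$ the difference $6nN+9N$ depends on $n$, and for $P(n,N)=nN$ it equals $hN$, which is nonconstant because $V_{i}(N)=N$ is.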

Observe that if $\deg({\varphi}_{i}(n))\geq 2$, ${\varphi}_{i}(n)=T_{1}^{P_{i1}(n)}\cdots T_{k}^{P_{ik}(n)}$, then $\max_{1\leq j\leq k}\deg(P_{ij}(n))\geq 2$, and it follows from Lemma 6.2 that ${\varphi}_{i}(n+h){\varphi}_{i}^{-1}(h)$ depends nontrivially on $n$ provided $h$ is large enough. Rearranging $P$-polynomial expressions if needed, we can assume that $\deg({\varphi}_{i}(n))=1$ for $i=1,...,q$ and $\deg({\varphi}_{i}(n))\geq 2$ for $i=q+1,...,k$. The condition $\deg({\varphi}_{i}(n))=1$ means that $P_{ij}(n)=p_{ij}n$ for some integers $p_{ij},\,j=1,...,k.$ Hence, in this case ${\varphi}_{i}(n+h)={\varphi}_{i}(n){\varphi}_{i}(h)$. Thus, if $\Phi_{i}(n,N)={\varphi}_{i}(n)\psi_{i}(N)$ we can write

[TABLE]

where $k^{\prime}=2k-q,\,\tilde{f}_{i}$ is either $f_{l},\,{\varphi}_{l}(h)f_{l}$ or it is $f_{l}\cdot{\varphi}_{l}(h)f_{l}$ for some $l$ between 1 and $k$ and $\tilde{\Phi}_{l}(n,N)$ is either $\Phi_{l}(n,N)$ for some $l$ between 1 and $k$ or it is $\Phi_{l}(n+h,N){\varphi}_{i}^{-1}(h)$ for some $l$ between $q+1$ and $k$ .

Consider the new system $\tilde{A}_{h}=\{\Phi_{i}(n,N),\,\Phi_{i}(n+h,N){\varphi}_{i}^{-1}(h),\,i=1,...,k\}$ and suppose, without loss of generality, that $\tilde{\Phi}_{1}(n,N)$ has the minimal weight in $\tilde{A}_{h}$. All ${\varphi}_{i}(n)\neq\mbox{id}$, and since $\tilde{\Phi}_{1}(n,N)$ is measure preserving we can write

[TABLE]

where $\hat{\Phi}_{i}(n,N)=\tilde{\Phi}_{i}(n,N)\tilde{\Phi}_{1}^{-1}(n,N)$ . It follows from the assumptions of Theorem 2.8 that ${\varphi}_{i}(n)\not\equiv{\varphi}_{l}(n)$ and ${\varphi}_{i}(n+h)\not\equiv{\varphi}_{l}(n+h)$ for $i,l=1,...,k,\,i\neq l$ . Writing $\tilde{\Phi}_{i}(n,N)=\tilde{\varphi}_{i}(n)\tilde{\psi}_{i}(N)$ we see from here and Lemma 6.2 that $\tilde{\varphi}_{i}(n)\not\equiv\tilde{\varphi}_{l}(n)$ for $i\neq l$ and large enough $h$ . Writing $\hat{\Phi}_{i}(n,N)=\hat{\varphi}_{i}(n)\hat{\psi}_{i}(N)$ we conclude from here that $\hat{\varphi}_{i}(n)\not\equiv\mbox{id}$ and $\hat{\varphi}_{i}(n)\not\equiv\hat{\varphi}_{l}(n)$ for $i,l=2,...,k^{\prime},\,i\neq l$ for all $h$ large enough.

Introduce the new system $A_{h}=\{\hat{\Phi}_{i}(n,N),\,i=2,...,k^{\prime}\}$. In the same way as in [5] (referring the reader there for more explanations) we conclude that the weight matrix of $A_{h}$ precedes that of $A$. In order to invoke PET-induction we assume that Theorem 2.8 holds true for all systems whose weight matrices precede that of $A$. Hence, we have for $A_{h}$,

[TABLE]

as $N\to\infty$ . Then by the Cauchy inequality

[TABLE]

Hence, by (6.6)–(6.8),

[TABLE]

If one of $P_{ij}(n),\,j=1,...,k$ is not linear then $\deg(\Phi_{i}(n,N))=\deg({\varphi}_{i}(n))\geq 2$ and $\tilde{f}_{l}=f_{k}$ for some $l\leq k^{\prime}$, and so the last product in (6.6) equals zero yielding

[TABLE]

Otherwise, $\deg(\Phi_{i}(n,N))=\deg({\varphi}_{i}(n))=1$ for all $i$ and then $k^{\prime}=k$, $\tilde{f}_{i}=f_{i}\cdot{\varphi}_{i}(h)f_{i}$ and ${\varphi}_{i}(n)=S_{i}^{n}$ for some $S_{i}\in G$, $S_{i}\neq\mbox{id}$. Then by weak mixing

[TABLE]

which together with (6.9) yields again (6.10) concluding the proof of Theorem 2.8 since the initial step of the induction is given by (6.1). ∎

6.2. Nonconvergence under weak mixing

Next, we will show that, in general, weak mixing of $T$ is not enough to ensure $L^{2}$-convergence in (1.3) for general polynomials $P_{j}(n,N),\,j=1,...,\ell$ taking on integer values on integers even in the "conventional" case $\ell=1$. Consider the sum

[TABLE]

where $T$ is a measure preserving transformation of a separable probability space $(X,{\mathcal{B}},\mu)$ and $f$ is a bounded measurable function. Recall that the Koopman operator $U_{T}f(x)=f(Tx)$ is unitary and it has a spectral representation in the form

[TABLE]

where $\{e^{2\pi iu},\,u\in{\Gamma}\}$ is the spectrum of $U_{T}$ and $E$ is the corresponding projection operator valued spectral measure (see, for instance, [17] or [22]). Then

[TABLE]

and so

[TABLE]

Fix a small ${\varepsilon}>0$ and for each $M\in{\mathbb{N}}$ set

[TABLE]

Observe that if $u\in{\Gamma}_{{\varepsilon},N}$ then $nu\in{\Gamma}_{n{\varepsilon},N}$ and ${\Gamma}_{{\varepsilon},N}\subset{\Gamma}_{n{\varepsilon},nN}$ . Define inductively $N_{0}=1$ and $N_{k+1}=[\frac{5N^{2}_{k}}{{\varepsilon}}],\,k=0,1,2,...$ where $[a]$ is the integral part of $a$ . Set also ${\varepsilon}_{k}=\frac{{\varepsilon}}{N_{k}},\,k=0,1,...$ . Then

[TABLE]

and ${\Gamma}_{\varepsilon}=\bigcap_{k=1}^{\infty}{\Gamma}_{{\varepsilon}_{k},N_{k}}$ is a Cantor-like set; in particular, it is a perfect set and for any $k$,

[TABLE]

Let $\nu_{\varepsilon}$ be a continuous (non-atomic) probability measure on ${\Gamma}_{\varepsilon}$ , say, constructed in the same way as the Cantor distribution on the standard Cantor set. Next, we introduce a spectral measure $E^{({\varepsilon})}$ concentrated on ${\Gamma}_{\varepsilon}$ by the standard formula $E^{({\varepsilon})}_{U}g={\mathbb{I}}_{U}g$ for each measurable function $g$ on ${\Gamma}_{\varepsilon}$ and a measurable set $U\subset{\Gamma}_{\varepsilon}$ where ${\mathbb{I}}_{U}$ is the indicator of $U$ . The spectral measure $E^{({\varepsilon})}$ is continuous considering it on the probability space $({\Gamma}_{\varepsilon},\nu_{\varepsilon})$ since for each $u\in{\Gamma}_{\varepsilon}$ any function ${\mathbb{I}}_{\{u\}}g$ is zero $\nu_{\varepsilon}$ -almost everywhere. Next, we can find a transformation $T$ such that its Koopman operator $U_{T}{\varphi}=T{\varphi}$ has the spectral representation

[TABLE]

(see, for instance, Ch. 4 in [9]) and since $E^{({\varepsilon})}$ is a continuous spectral measure then $T$ is weakly mixing (see, for instance, [16] or [23]).

By (6.15),

[TABLE]

for any $n=1,2,...,N_{k}$ , and all $k=1,2,...$ . Hence

[TABLE]

Now, choose a function $f$ such that $\int fd\mu=0$ and $\int|f|d\mu>0$. If the $L^{2}$ ergodic theorem holds true for the averages $\frac{1}{N}S_{N}$ then $\|\frac{1}{N_{k}}S_{N_{k}}\|_{L^{2}}\to 0$ as $k\to\infty$, which contradicts the above inequality if ${\varepsilon}<\frac{1}{2\pi}$. ∎

6.3. Proof of Theorem 2.9

For the proof of Theorem 2.9 we will need the following result.

6.3 Lemma.

Let $P(n,N)$ be a nonconstant polynomial of $n$ and $N$ taking on integer values on integers. Set

[TABLE]

where $|\{\cdot\}|$ denotes the cardinality of the set in brackets and if $P(n,N)=P(N)$ does not depend on $n$ then we set $M_{K}(N)=N$ if $|P(N)|\leq K$ and $M_{K}(N)=0$ otherwise. If $P(n,N)$ nontrivially depends on $n$ then

[TABLE]

where $\deg_{n}$ is the degree of the polynomial in $n$ considering $N$ as a constant. If $P(n,N)=P(N)$ depends only on $N$ then there exists $N_{0}$ such that $|P(N)|>K$ for all $N\geq N_{0}$, and so $M_{K}(N)=0$ for such $N$. In both cases $\lim_{N\to\infty}\frac{1}{N}M_{K}(N)=0$.

Proof.

For any $k=0,\pm 1,\pm 2,...,\pm K$ there exist at most $\deg_{n}P$ solutions in $n$ of the equation $P(n,N)=k$, and so (6.17) follows. If $P(n,N)=P(N)$ is nonconstant then $|P(N)|\to\infty$ as $N\to\infty$ and the second assertion follows as well. ∎
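The counting argument of Lemma 6.3 is easy to test numerically. The sketch below assumes that $M_{K}(N)$ counts the integers $1\leq n\leq N$ with $|P(n,N)|\leq K$ and uses the hypothetical example $P(n,N)=n^{2}-N$, for which $\deg_{n}P=2$. It then checks the bound $(2K+1)\deg_{n}P$ that results from having $2K+1$ admissible values with at most $\deg_{n}P$ solutions each.

```python
def count_small_values(P, N, K):
    """Number of n in {1, ..., N} with |P(n, N)| <= K."""
    return sum(1 for n in range(1, N + 1) if abs(P(n, N)) <= K)

def example_polynomial(n, N):
    """Hypothetical example P(n, N) = n^2 - N, of degree 2 in n."""
    return n * n - N

def bound(K, degree_in_n):
    """The bound (2K + 1) * deg_n P obtained from the counting argument."""
    return (2 * K + 1) * degree_in_n
```

For $K=2$ and $N=100$ only $n=10$ satisfies $|n^{2}-100|\leq 2$, so the count is $1$, well below the bound $10$. Since the bound does not depend on $N$, the ratio $M_{K}(N)/N$ tends to zero.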

Next, we can prove Theorem 2.9. As before, without loss of generality we can assume that at least one of the functions $f_{j}$ has zero integral with respect to $\mu$. Set

[TABLE]

and in order to prove Theorem 2.9 we have to show that

[TABLE]

which according to Lemma 6.1 will follow if (6.2) holds true.

Without loss of generality assume that $1,2,...,k,\,k\leq\ell$ are all indexes $j$ such that $P_{j}(n,N)=p_{j}n+Q_{j}(N)$ for some nonzero integers $p_{j}$ and polynomials $Q_{j}$ in $N$ taking on integer values on integers. Then

[TABLE]

By Lemma 6.2, $P_{1}(n,N),...,P_{\ell}(n,N);P_{k+1}(n+h,N),...,P_{\ell}(n+h,N)$ are essentially distinct polynomials, and so their pairwise differences $p_{ij}^{(1)}(n,N)=P_{i}(n,N)-P_{j}(n,N),\,p_{ij}^{(2)}(n+h,N)=P_{i}(n+h,N)-P_{j}(n+h,N),\,i,j=1,...,\ell,\,i\neq j$ and $p_{ij}^{(3)}(n,N)=P_{i}(n,N)-P_{j}(n+h,N),\,i=1,...,\ell,\,j=k+1,...,\ell$ are nonconstant polynomials of $n$ and $N$ . Since $T$ is strongly $2\ell$ -mixing then for any ${\varepsilon}>0$ and any bounded measurable functions $g_{1},...,g_{L}$ with $L\leq 2\ell$ there exists $K_{\varepsilon}>0$ such that

[TABLE]

By Lemma 6.3,

[TABLE]

where $i,j$ run over indexes appearing in the above definitions of $p_{ij}^{(l)}$ ’s. Hence, for $h$ large enough choosing $K_{\varepsilon}$ for functions $g_{j}$ equal either to some $f_{l}$ or to $f_{l}T^{p_{l}h}f_{l}$ we obtain,

[TABLE]

and since ${\varepsilon}>0$ is arbitrary we obtain that

[TABLE]

Finally, relying on strong mixing we let $h\to\infty$ and obtain

[TABLE]

since one of the integrals $\int f_{j}d\mu$ is zero, completing the proof of Theorem 2.9. ∎

Bibliography


[1]
[2] T. Austin, Non-conventional ergodic averages for several commuting actions of an amenable group, J. D'Analyse Math. 130 (2016), 243–274.
[3] V. Bergelson, Weakly mixing PET, Ergod. Th. & Dyn. Sys. 7 (1987), 337–349.
[4] V. Bergelson, B. Host, R. McCutcheon and F. Parreau, Aspects of uniformity in recurrence, Colloq. Math. 84/85 (2000), 549–576.
[5] V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden's and Szemerédi's theorems, J. Amer. Math. Soc. 9 (1996), 725–753.
[6] V. Bergelson, A. Leibman and E. Lesigne, Intersective polynomials and polynomial Szemerédi theorem, Adv. Math. 219 (2008), 369–388.
[7] R. Bowen, Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms, Lecture Notes in Math. 470, Springer-Verlag, Berlin, 1975.
[8] R.C. Bradley, Introduction to Strong Mixing Conditions, Kendrick Press, Heber City, 2007.
