Nonparametric Inference under B-bits Quantization

Kexuan Li; Ruiqi Liu; Ganggang Xu; Zuofeng Shang

arXiv:1901.08571·math.ST·August 14, 2023

TL;DR

This paper introduces a nonparametric testing method for quantized data, demonstrating its asymptotic properties and effectiveness through simulations and real data, especially when the number of bits exceeds a certain threshold.

Contribution

It proposes a computationally efficient nonparametric testing procedure for B-bit quantized samples with theoretical guarantees and extensions to linearity and adaptive tests.

Findings

1. The test statistic achieves the classical minimax rate when $B$ exceeds a threshold.

2. The method is effective for spline models and nonparametric linearity testing.

3. Simulation and real-data studies confirm validity and effectiveness.

Tables

Table 1: Notation used frequently throughout the paper.

Symbol	Description
$c$	number of groups.
$\tilde{n}$	number of observations in each group, defined as $\tilde{n} = n / c$.
$(\mu_{1}, \dots, \mu_{k})^{T}$	quantized values.
$(t_{1}, \dots, t_{k-1})^{T}$	cut-off points of the quantization intervals.
$y = (y_{1}, \dots, y_{n})^{T}$	vector of responses.
$\tilde{y} = (\tilde{y}_{1}, \dots, \tilde{y}_{c})^{T}$	vector of group averages of the responses, defined as $\tilde{y}_{i} = \frac{1}{\tilde{n}} \sum_{j = (i-1)\tilde{n} + 1}^{i \tilde{n}} y_{j}$.
$z = (z_{1}, \dots, z_{c})^{T}$	vector of quantized samples.
$z^{0} = (z_{1}^{0}, \dots, z_{c}^{0})^{T}$	vector of quantized samples under $H_{0} : g_{0} = 0$.
$\tilde{z} = (\tilde{z}_{1}, \dots, \tilde{z}_{c})^{T}$	vector of truncated quantized samples, where $\tilde{z}_{i} = z_{i} \mathbb{1}\big(c_{s}\rho + \sigma |\epsilon_{j}| \leq \sqrt{\mathcal{T}_{n}} \text{ for all } j = (i-1)\tilde{n} + 1, \dots, i\tilde{n}\big)$.
$\tilde{z}^{0} = (\tilde{z}_{1}^{0}, \dots, \tilde{z}_{c}^{0})^{T}$	vector of truncated quantized responses under $H_{0} : g_{0} = 0$, where $\tilde{z}_{i}^{0} = z_{i}^{0} \mathbb{1}\big(c_{s}\rho + \sigma |\epsilon_{j}| \leq \sqrt{\mathcal{T}_{n}} \text{ for all } j = (i-1)\tilde{n} + 1, \dots, i\tilde{n}\big)$.
$y^{linear} = (y_{1}^{linear}, \dots, y_{n}^{linear})^{T}$	newly defined data for testing the linearity of $g_{0}$, defined as $y_{i}^{linear} = Q(y_{i}) - \hat{g}(i/n)$, where $\hat{g}(i/n)$ is the least-squares estimator of $g$.
$z^{linear} = (z_{1}^{linear}, \dots, z_{c}^{linear})^{T}$	vector of quantized values of $y_{i}^{linear}$.
$z_{0}^{linear} = (z_{i, 0}^{linear}, \dots, z_{i, c}^{linear})^{T}$	quantized values of $y_{i}^{linear}$ under $H_{0}^{linear} : g_{0} \text{ is linear}$.
$\lambda$	smoothing parameter.
$\{\varphi_{i}(x)\}_{i=1}^{\infty}$	trigonometric basis functions.
$K(\cdot, \cdot)$	kernel function.
$\Sigma_{c}$	kernel matrix defined as $\Sigma_{c} = [K(i/c, i'/c)/c]_{1 \leq i, i' \leq c}$.
$\Omega_{c}$	“tensor” of $K(\cdot, \cdot)$ defined as $\Omega_{c} = [K^{\otimes 2}(i/c, i'/c)/c]_{1 \leq i, i' \leq c}$.
$A$	$A = (\Sigma_{c} + \lambda I_{c})^{-1} \Omega_{c} (\Sigma_{c} + \lambda I_{c})^{-1}$.
$\zeta$	approximation error between the Riemann sum and the integral.
$c_{s}$	Sobolev constant defined as $c_{s} = \sup_{f \in S^{m}(\mathbb{I})} \frac{\|f\|_{\sup}}{\sqrt{J(f)}}$.
$C_{k}(t)$	maximum length of a quantization interval.

Equations

$$y_{i} = g_{0}(i/n) + \sigma\epsilon_{i}, \quad i = 1, \dots, n,$$

$$H_{0} : g_{0}(x) = g_{*}(x) \text{ for all } x \in [0, 1],$$

$$S^{m}(\mathbb{I}) = \Big\{\sum_{\nu=1}^{\infty} \beta_{\nu}\varphi_{\nu}(\cdot) : \sum_{\nu=1}^{\infty} \beta_{\nu}^{2}\gamma_{\nu} < \infty\Big\},$$

$$K(x, y) = \frac{(-1)^{m-1}}{(2m)!} B_{2m}(|x - y|),$$

$$\widehat{g}^{\textrm{ss}}$$

$$\widehat{g}^{\textrm{ss}} = \sum_{i=1}^{n} \theta_{i} K_{i/n},$$

$$T^{\textrm{ss}} = \|\widehat{g}^{\textrm{ss}} - g_{*}\|^{2}.$$

$$Q(y) = \sum_{j=1}^{k} \mu_{j} I[y \in R_{j}(t)], \text{ with } \mu_{j} = (l_{1} + j - 1) C_{k}(t),$$

$$\text{Second-stage quantization: } \Big|z_{i} - \frac{1}{\widetilde{n}} \sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}} Q(y_{j})\Big| \leq C_{k}(t), \text{ for } i = 1, \dots, c.$$

$$\widehat{g}_{\mu,t,c}^{\textrm{B}}$$

$$\widehat{g}_{\mu,t,c}^{\textrm{B}} = \sum_{i=1}^{c} \widehat{\theta}_{i} K_{i/c},$$

$$\widehat{\lambda} = \mathop{\mathrm{arg\,min}}_{\lambda > 0} \frac{c\,\|[I_{c} - \Sigma_{c}(\Sigma_{c} + \lambda I_{c})^{-1}] z\|_{2}^{2}}{[c - \mathrm{trace}(\Sigma_{c}(\Sigma_{c} + \lambda I_{c})^{-1})]^{2}},$$

$$T_{\mu,t,c} = \|\widehat{g}_{\mu,t,c}^{\textrm{B}}\|^{2}.$$

$$\frac{c T_{\mu,t,c} - \mathrm{trace}(A)\tau_{k}^{2}}{s_{c}\tau_{k}^{2}} \overset{d}{\longrightarrow} N(0, 1) \text{ as } n, c \to \infty,$$

$$\widehat{\tau}_{k}^{2} = \frac{\widetilde{\tau}_{n}^{2}}{2\widetilde{n}(n - 1)},$$

$$\phi_{c,k} = I\big(|c T_{\mu,t,c} - \mathrm{trace}(A)\widehat{\tau}_{k}^{2}| \geq z_{1-\alpha/2}\, s_{c}\widehat{\tau}_{k}^{2}\big),$$

$$c^{\dagger} = \max\Big\{c \in \mathbb{Z} : c \log_{2}\Big(\sqrt{n\mathcal{T}_{n}/\sigma^{2}}\Big) \leq B\Big\}, \quad k^{\dagger} = \max\{k \in \mathbb{Z} : c^{\dagger}\lceil\log_{2}(k)\rceil \leq B\}.$$

$$c_{s} \equiv \sup_{g \in S^{m}(\mathbb{I})} \frac{\|g\|_{\sup}}{\sqrt{J(g, g)}}.$$

Condition (B):

$$\mu_{j}^{2}\, P\big(|\sigma\epsilon_{1}| + c_{s}\rho > \sqrt{\mathcal{T}_{n}}\big) = o\Big(\frac{1}{n}\Big) \text{ for } j = 1, k.$$

$$E\|\widehat{g}_{\mu,t,c}^{\textrm{B}} - g_{0}\|^{2} \lesssim (nh)^{-1} + \lambda + c^{-\min\{2m, 3\}} + G_{c,k}(t),$$

$$G_{c,k,1}(t)$$

$$G_{c,k,2}(t)$$

$$\frac{c T_{\mu,t,c} - \mathrm{trace}(A)\tau_{k}^{2}}{s_{c}\tau_{k}^{2}} \overset{d}{\longrightarrow} N(0, 1), \text{ as } n, c \to \infty,$$

$$H_{1} : g_{0} \in S_{\rho}^{m}(\mathbb{I}) \setminus \{0\}.$$

$$f(x) = \frac{1}{2\Delta} \int_{\max(x - \Delta, 0)}^{\min(x + \Delta, 1)} g_{0}(s)\, ds, \text{ where } x \in \mathbb{I} \text{ and } \Delta = \frac{1}{c}.$$

$$\inf_{g \in S_{\rho}^{m}(\mathbb{I}),\ \|g\|_{c} \geq C_{\eta}\delta_{n,c,\lambda}} P(\text{reject } H_{0} \mid H_{1} \text{ is true}) \geq 1 - \eta,$$

$$\inf_{\lambda \gg c^{-2m}} \delta_{n,c,\lambda} = \begin{cases} n^{-\frac{2m}{4m+1}}, & \text{if } c \gtrsim n^{\frac{3}{4m+1}} \text{ with } \lambda \asymp n^{-\frac{4m}{4m+1}}; \\ (nc)^{-\frac{m}{2(m+1)}}, & \text{if } n^{\frac{1}{2m+1}} \ll c \lesssim n^{\frac{3}{4m+1}} \text{ with } \lambda \asymp (nc)^{-\frac{m}{m+1}}; \\ \lambda^{1/2}, & \text{if } c \lesssim n^{\frac{1}{2m+1}} \text{ with } \lambda \gg c^{-2m}. \end{cases}$$

$$\inf_{\lambda \gg c^{-2m}} \delta_{n,c,\lambda} = \begin{cases} n^{-\frac{2m}{4m+1}}, & \text{if } c \gtrsim n^{\frac{3}{4m+1}},\ b \gg \log_{2}\big(n^{\frac{2m}{4m+1}} \sqrt{\mathcal{T}_{n}}\big); \\ (nc)^{-\frac{m}{2(m+1)}}, & \text{if } n^{\frac{1}{2m+1}} \ll c \lesssim n^{\frac{3}{4m+1}},\ b \gg \log_{2}\big(n^{\frac{2m+3}{4(m+1)}} c^{-\frac{2m+1}{4(m+1)}} \sqrt{\mathcal{T}_{n}}\big); \\ \lambda^{1/2}, & \text{if } c \lesssim n^{\frac{1}{2m+1}},\ b \gg \log_{2}\big(\sqrt{n\mathcal{T}_{n}}\big) \text{ with } \lambda \gg c^{-2m}. \end{cases}$$

$$H_{0}^{linear} : g_{0} \in L(\mathbb{I}) \quad \text{vs.} \quad H_{1}^{linear} : g_{0} \in S_{\rho}^{m}(\mathbb{I}) \setminus \{L(\mathbb{I})\},$$

$$\frac{c T_{linear,\mu,t,c} - \mathrm{trace}(A)\tau_{k}^{2}}{s_{c}\tau_{k}^{2}} \overset{d}{\longrightarrow} N(0, 1), \text{ as } n, c \to \infty,$$

Taxonomy

Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Distributed Sensor Networks and Detection Algorithms

Full text

Nonparametric Inference under B-bits Quantization

Kexuan Li
Global Analytics and Data Sciences, Biogen Inc, Cambridge, MA 02142, USA

Ruiqi Liu
Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA

Ganggang Xu
Department of Management Science, University of Miami, Coral Gables, FL 33146, USA

Zuofeng Shang
Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA

Abstract

Statistical inference based on lossy or incomplete samples is often needed in research areas such as signal/image processing, medical image storage, remote sensing, and signal transmission. In this paper, we propose a nonparametric testing procedure based on samples quantized to $B$ bits through a computationally efficient algorithm. Under mild technical conditions, we establish the asymptotic properties of the proposed test statistic and investigate how the testing power changes as $B$ increases. In particular, we show that if $B$ exceeds a certain threshold, the proposed nonparametric testing procedure achieves the classical minimax rate of testing (Shang and Cheng, 2015) for spline models. We further extend our theoretical investigations to a nonparametric linearity test and an adaptive nonparametric test, expanding the applicability of the proposed methods. Extensive simulation studies together with a real-data analysis are used to demonstrate the validity and effectiveness of the proposed tests.

Keywords: B-bits Quantization, Minimax Rates of Testing, Nonparametric Inference, Smoothing Splines

1 Introduction

Lossy or incomplete data are commonly encountered in research areas such as machine learning, information theory, and signal processing. To store and process signals in digital devices, quantization is a popular procedure that maps the original measurements from a large (often uncountably infinite) set to a smaller set of possible values. The resulting values are referred to as the quantized samples. With the increasing availability of data, it is of great interest to quantify how data analysis is affected when the data are quantized because of storage or communication budget constraints, and how to design quantization schemes that minimize the efficiency loss. Statistical inference based on quantized samples is challenging because, in addition to the measurement errors, one also needs to account for the information loss due to the quantization errors. In particular, standard statistical procedures may not be valid when applied to quantized samples if the quantization errors are ignored.

Research on lossy data has attracted increasing attention recently. One line of work focuses on $b$-bit compressive sensing, which aims at reconstructing a sparse signal from a sequence of $b$-bit quantized outcomes. A $1$-bit compressive sensing model was proposed by Boufounos and Baraniuk (2008), and several efficient and provable algorithms have been developed; see, e.g., Gupta et al. (2010); Gopi et al. (2013); Plan and Vershynin (2013); Zhang et al. (2014); Zhu and Gu (2015). A signal recovery algorithm was proposed in Slawski and Li (2015), which extended the $1$-bit compressive sensing model to a $b$-bit compressive sensing model. A second line of research develops statistical methods based on quantized observations. For example, Lee and Vardeman (2001) studied the interval estimation of a normal mean process from rounded data, which was further extended to more general likelihood-based statistical estimation problems (Vardeman and Lee, 2005) and nonparametric regression problems (Benhenni and Rachdi, 2006). Recently, an increasing number of works aim to quantify the impact of quantization on the statistical properties of the resulting estimators. For example, Zhang et al. (2013) established lower bounds on the minimax risks for distributed estimation of parametric models under a communication budget constraint. Suresh et al. (2017) proposed communication-efficient algorithms for distributed mean estimation without probabilistic assumptions on the data. A version of Pinsker’s theorem under storage or communication constraints was developed in Zhu and Lafferty (2014), and it was further applied to analyze the convergence rate of nonparametric estimation with a limited bits budget by Zhu and Lafferty (2017). More recently, a series of works has investigated high-dimensional and/or nonparametric regression model estimation in the distributed learning framework with bits constraints; see, e.g., Zhu and Lafferty (2018); Han et al. (2018); Szabo and van Zanten (2020); Cai and Wei (2021).

Despite the abundant literature on statistical modeling of quantized data, research on nonparametric inference based on quantized data is still lacking. This paper aims to fill this gap by proposing a new quantization scheme with a $B$-bits storage or communication budget such that nonparametric estimation and testing based on quantized samples remain valid. Specifically, we consider the following regression model

$$y_{i}=g_{0}(i/n)+\sigma\epsilon_{i},\quad i=1,\ldots,n, \tag{1}$$

where $g_{0}(\cdot)$ is a smooth function, the $\epsilon_{i}$’s are iid zero-mean errors with unit variance, and $\sigma>0$ is an unknown constant. The goal is to (a) estimate $g_{0}(\cdot)$, and (b) test the following hypothesis

$$H_{0}:g_{0}(x)=g_{*}(x)\ \text{ for all } x\in[0,1], \tag{2}$$

where $g_{*}(\cdot)$ is a pre-specified deterministic function.

The above model has been extensively studied in the literature, see, e.g., Shang and Cheng (2017), and is closely related to the well-known Gaussian sequence model and Gaussian white noise model (Tsybakov, 2008). However, unlike the existing literature, we consider the case in which the original data, denoted by $y_{1},\cdots,y_{n}$, are generated in a machine M and are quantized as soon as they are generated. The quantized data are then stored in machine M or transmitted to another machine M∗ for future statistical inference. We assume that only a budget of $B$ bits is available for data storage or communication, which makes data quantization necessary and may invalidate existing estimation and inference methods. Such a research problem is important for applications where data generation and analysis are carried out at different locations. For example, testing $H_{0}:g_{0}(x)=0$ reveals whether the quantized signals transmitted through a satellite are pure noise. If $g_{*}(\cdot)$ is the signal process from a normally functioning machine, testing (2) using only quantized samples enables us to remotely monitor in real time whether the machine is working properly.

To meet the $B$ -bits requirement, we propose a two-stage quantization procedure: in the first stage we quantize an individual $y_{i}$ as $Q(y_{i})$ with $Q(\cdot)$ being a quantizer, and in the second stage we overwrite these quantized observations by their local averages. See Figure 1 and Algorithm 1 for details. As a result, we obtain a quantized sample of size $c$ for some $c<n$ to be stored or transmitted. We demonstrate that with a carefully chosen $c$ and a well-designed quantizer, the proposed nonparametric estimation and testing procedures are asymptotically valid and efficient even based only on the quantized data.

Our contributions can be summarized as follows. Firstly, we propose a computationally efficient data quantization algorithm that reduces the size of the raw data to meet the $B$-bits constraint and, at the same time, reduces the computational complexity from $O(n^{3})$ to $O(c^{3})$. Secondly, we establish sufficient conditions on the bits constraint $B$ that guarantee the minimax convergence rate for the resulting spline estimators and the minimax rates of testing for the proposed testing procedure. In particular, our results show how the asymptotic power of the proposed testing procedure changes as the bits constraint $B$ increases. Thirdly, we further extend our theoretical investigations to (a) a nonparametric linearity test of the underlying function; (b) an adaptive nonparametric test when the smoothness of the underlying function is unknown. To the best of our knowledge, our work is the first to provide a theoretical investigation of nonparametric inference based on quantized samples.

The rest of the paper is organized as follows. Section 2 describes the general methodologies we propose for data quantization, nonparametric estimation, and nonparametric testing using splines. In Section 3, we investigate the theoretical properties of the spline estimator and the nonparametric test statistic based on quantized samples. In Section 4, we study asymptotic properties of the nonparametric linearity test statistic and the adaptive nonparametric test statistic under the $B$-bits constraint. Section 5 gives several simulation studies to evaluate the finite-sample performance of the proposed methods, and Section 6 illustrates an application of the proposed methods to the Combined Cycle Power Plant Data.

Notation: Let $\|\cdot\|$ represent the $L^{2}$-norm, i.e., $\|f\|^{2}=\int_{0}^{1}f^{2}(t)dt$, and define $\|\cdot\|_{2}$ as the Euclidean norm of vectors. Let $\|\cdot\|_{\sup}$ denote the supremum norm of a function, i.e., $\|f\|_{\sup}=\sup_{t\in[0,1]}|f(t)|$. For two positive sequences $a_{n}$ and $b_{n}$, we denote $a_{n}\gtrsim b_{n}$ ($a_{n}\lesssim b_{n}$) if there exists a constant $C>0$ such that $a_{n}\geq Cb_{n}$ ($a_{n}\leq Cb_{n}$) for all $n\geq 1$; denote $a_{n}\asymp b_{n}$ if $a_{n}\gtrsim b_{n}$ and $a_{n}\lesssim b_{n}$; denote $a_{n}\ll b_{n}$ if $a_{n}/b_{n}\to 0$ as $n\to\infty$ and $a_{n}\gg b_{n}$ if $a_{n}/b_{n}\to\infty$ as $n\to\infty$.

2 Methodology

In this section, we first review some background of the classical smoothing spline regression and then give details on the proposed quantization scheme, nonparametric estimation and testing procedures.

2.1 Review of Classical Smoothing Spline Regression

Throughout this paper, we assume that the underlying true function $g_{0}(\cdot)$ belongs to the $m$ -order ( $m\geq 1$ ) periodic Sobolev space on $\mathbb{I}:=[0,1]$ defined as

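$$S^{m}(\mathbb{I})=\Big\{\sum_{\nu=1}^{\infty}\beta_{\nu}\varphi_{\nu}(\cdot):\sum_{\nu=1}^{\infty}\beta_{\nu}^{2}\gamma_{\nu}<\infty\Big\},$$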

where $\varphi_{2k-1}(x)=\sqrt{2}\cos(2\pi kx),\,\,\,\,\varphi_{2k}(x)=\sqrt{2}\sin(2\pi kx)$ are the trigonometric basis functions, and $\gamma_{2k-1}=\gamma_{2k}=(2\pi k)^{2m}$ for $x\in\mathbb{I}$ and $k\geq 1$ . It follows from Wahba (1990) and Gu (2013) that $S^{m}(\mathbb{I})$ is a reproducing kernel Hilbert space (RKHS) endowed with an inner product $J(f,g)=\int_{0}^{1}f^{(m)}(x)g^{(m)}(x)dx$ and a reproducing kernel

$$K(x,y)=\frac{(-1)^{m-1}}{(2m)!}B_{2m}(|x-y|),$$

where $B_{2m}$ is the Bernoulli polynomial of order $2m$ .

Based on the above assumptions on $g_{0}(\cdot)$ , the classic smoothing spline (ss) estimator of $g_{0}(\cdot)$ is obtained through the following optimization problem:

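$$\widehat{g}^{\textrm{ss}}=\mathop{\mathrm{arg\,min}}_{g\in S^{m}(\mathbb{I})}\Big\{\frac{1}{n}\sum_{i=1}^{n}\big(y_{i}-g(i/n)\big)^{2}+\lambda J(g,g)\Big\}. \tag{3}$$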

For $x\in\mathbb{I}$, we can define a function $K_{x}(\cdot)=K(x,\cdot)$, which belongs to $S^{m}(\mathbb{I})$. By the representer theorem (Gu, 2013), the solution to (3) has the following closed form

$$\widehat{g}^{\textrm{ss}}=\sum_{i=1}^{n}\theta_{i}K_{i/n}, \tag{4}$$

where $\theta=(\theta_{1},\ldots,\theta_{n})^{T}=n^{-1}(\Sigma_{n}+\lambda I_{n})^{-1}y$ with $\Sigma_{n}=[K(i/n,i^{\prime}/n)/n]_{1\leq i,i^{\prime}\leq n}\in\mathbb{R}^{n\times n}$, $y=(y_{1},\ldots,y_{n})^{T}\in\mathbb{R}^{n}$ and $I_{n}$ being the $n\times n$ identity matrix.

To test hypothesis (2), a straightforward idea is to construct a test statistic based on the distance between $\widehat{g}^{\textrm{ss}}(\cdot)$ and $g_{*}(\cdot)$. Specifically, we use the $L_{2}$ norm distance defined as

$$T^{\textrm{ss}}=\|\widehat{g}^{\textrm{ss}}-g_{*}\|^{2}.$$

With an appropriate normalization, it can be shown that $T^{\textrm{ss}}$ is asymptotically normally distributed (Shang and Cheng, 2017; Yang et al., 2020; Liu et al., 2021).

2.2 Two-Stage Quantization

The original observations $y_{i}$’s in (1) are real-valued random variables, each of which in principle requires infinitely many bits to store or transmit exactly. When only $B$ bits are available, the original observations $y_{i}$’s may not be directly accessible for estimation or testing, and hence the classical smoothing spline estimator given in (4) is not applicable. This section introduces a two-stage quantization scheme that transforms the $y_{i}$’s into samples whose storage or transmission meets the $B$-bits constraint. The resulting samples will be used for optimal inference in the subsequent sections. The two-stage quantization process is illustrated in Figure 1.

The first-stage quantization is to quantize the data $y_{i}$ ’s as soon as they are generated with at most $k$ distinct values. For convenience, we use a uniform quantization scheme as follows. We first choose an interval $[t_{1},t_{k-1}]$ and choose $t_{2}<\ldots<t_{k-2}$ as the equally spaced grid points within $[t_{1},t_{k-1}]$ . Denote $t=(t_{1},\ldots,t_{k-1})^{T}\in\mathbb{R}^{k-1}$ and the sub-interval length $C_{k}(t):=(t_{k-1}-t_{1})/(k-2)$ . For ease of presentation, we assume that $l_{1}=t_{1}/C_{k}(t)$ is an integer. Define a quantizer $Q(\cdot)$ as follows:

$$Q(y)=\sum_{j=1}^{k}\mu_{j}I[y\in R_{j}(t)],\quad\text{with } \mu_{j}=(l_{1}+j-1)C_{k}(t), \tag{5}$$

where $\mu=(\mu_{1},\ldots,\mu_{k})^{T}\in\mathbb{R}^{k}$ consists of the quantized values and $R_{1}(t)=(-\infty,t_{1}],R_{2}(t)=(t_{1},t_{2}],\ldots,R_{k-1}(t)=(t_{k-2},t_{k-1}]$ , $R_{k}(t)=(t_{k-1},\infty)$ are the corresponding quantized intervals. Clearly, the $R_{j}(t)$ ’s form a partition of the real line with assigned marks $\mu_{j}$ ’s and $Q$ maps each $y\in\mathbb{R}$ to one of the $k$ marks. Applying $Q$ to $y_{i}$ ’s, we generate $n$ quantized samples $Q({y}_{1}),\cdots,Q({y}_{n})$ , each of which takes at most $k$ distinct values. Storage or transmission of $Q(y_{i})$ ’s thus requires $n\log_{2}{k}$ bits which might still go beyond the $B$ -bits budget when $B=o(n)$ . For this reason, we propose the following second-stage quantization to further reduce the storage or transmission bits through locally averaging the $Q(y_{i})$ ’s.
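The first-stage quantizer can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `make_uniform_quantizer` and the toy values of $t_{1}$, $t_{k-1}$ and $k$ are assumptions chosen for the example, and the integrality of $l_{1}=t_{1}/C_{k}(t)$ is checked numerically.

```python
import numpy as np

def make_uniform_quantizer(t1, t_last, k):
    """Build Q for equally spaced cut-offs t_1 < ... < t_{k-1} on [t1, t_last].

    Hypothetical helper for illustration; returns (Q, C_k(t), marks).
    """
    if k < 3:
        raise ValueError("need k >= 3 so that at least two cut-offs exist")
    width = (t_last - t1) / (k - 2)            # sub-interval length C_k(t)
    l1 = round(t1 / width)                     # assumed to be an integer
    if not np.isclose(l1 * width, t1):
        raise ValueError("t1 / C_k(t) must be an integer")
    cutoffs = t1 + width * np.arange(k - 1)    # t_1, ..., t_{k-1}
    marks = (l1 + np.arange(k)) * width        # mu_j = (l1 + j - 1) C_k(t), j = 1..k

    def Q(y):
        # side="left" returns the number of cut-offs strictly below y, so
        # y <= t_1 maps to mu_1, y in (t_{j-1}, t_j] to mu_j, y > t_{k-1} to mu_k.
        idx = np.searchsorted(cutoffs, np.asarray(y, dtype=float), side="left")
        return marks[idx]

    return Q, width, marks

# Toy example: k = 6 marks with cut-offs -2, -1, 0, 1, 2.
Q, width, marks = make_uniform_quantizer(-2.0, 2.0, 6)
quantized = Q([-5.0, -2.0, -1.5, 0.3, 2.0, 7.0])
```

With these toy cut-offs the marks coincide with the right end points of the bounded intervals, and every value above $t_{k-1}$ is sent to the single mark $\mu_{k}$.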

The second-stage quantization is to further reduce the number of storage or transmission bits via local average. Specifically, we divide the interval $\mathbb{I}=[0,1]$ into $c$ equally-spaced sub-intervals for some $c\leq B/\lceil\log_{2}(k+2)\rceil$ . For simplicity, we assume that $\widetilde{n}\coloneqq n/c$ is an integer and each sub-interval contains $\widetilde{n}$ observations. The quantized data from the first-stage quantization, i.e., $Q(y_{i})$ ’s, are further quantized as $z_{i}=\widetilde{l}_{i}C_{k}(t)$ such that

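$$\Big|z_{i}-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}Q(y_{j})\Big|\leq C_{k}(t),\quad\text{for } i=1,\ldots,c. \tag{6}$$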

Details of the two-stage quantization algorithm are provided in the following Algorithm 1.

Based on the definition of the quantizer $Q(\cdot)$ in (5), $z_{i}$ in (6) must belong to the interval $[\mu_{1}-C_{k}(t),\mu_{k}+C_{k}(t)]$ and must be of the form $lC_{k}(t)$ for some integer $l$. Therefore, there are at most $k+2$ distinct values of the $z_{i}$’s, namely, $lC_{k}(t)$ for $l=l_{1}-1,l_{1},l_{1}+1,\cdots,l_{1}+k-1,l_{1}+k$. As a result, each $z_{i}$ requires $b=\lceil\log_{2}(k+2)\rceil$ bits to store or transmit, and hence all the $z_{i}$’s together require $c\lceil\log_{2}(k+2)\rceil\leq B$ bits, where $\lceil x\rceil$ is the smallest integer greater than or equal to $x$. In the subsequent sections, we will show that, with $c,k$ properly selected, optimal inference based on the $z_{i}$’s is possible even under $B=o(n)$, in contrast to other regression literature, which typically needs $B\gtrsim n$ (see Slawski and Li (2018)). For optimal inference in non-regression settings such as the Gaussian sequence model or the Gaussian white-noise model, similar findings were made by Cai and Wei (2021).
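The second stage and the resulting bit count can be illustrated as follows. This is a sketch under stated assumptions: snapping each local average to the nearest multiple of $C_{k}(t)$ is one rule that keeps $z_{i}$ within $C_{k}(t)$ of the local average, and the rule used in Algorithm 1 may differ; the names `second_stage` and `bits_needed` and the toy inputs are hypothetical.

```python
import math
import numpy as np

def second_stage(q, c, width):
    """Average first-stage values in c consecutive blocks and snap to the grid.

    q     : first-stage quantized values Q(y_1), ..., Q(y_n)
    c     : number of groups (n must be divisible by c)
    width : grid spacing C_k(t)
    Returns the integer levels l_i and the quantized sample z_i = l_i * width.
    """
    q = np.asarray(q, dtype=float)
    n = q.size
    if n % c != 0:
        raise ValueError("n must be a multiple of c")
    block_means = q.reshape(c, n // c).mean(axis=1)
    # Nearest-grid rounding (an assumption): |z_i - mean_i| <= width / 2.
    levels = np.rint(block_means / width).astype(int)
    return levels, levels * width

def bits_needed(c, k):
    """Total bits for c values, each taking one of at most k + 2 levels."""
    return c * math.ceil(math.log2(k + 2))

# Toy example: n = 8 first-stage values, c = 2 groups, grid spacing 1.
levels, z = second_stage([0, 1, 1, 1, 2, 3, 3, 3], c=2, width=1.0)
total_bits = bits_needed(c=2, k=6)
```

Only the integer levels need to be stored or transmitted, since the grid spacing is fixed in advance.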

2.3 $B$ -bits Nonparametric Spline Estimation

Given $B$ , let us choose $k,c$ such that $c\lceil\log_{2}(k+2)\rceil=B$ , i.e., our two-stage quantization maximizes the use of the available bits. Based on the quantized samples $z_{1},\ldots,z_{c}$ from Algorithm 1, a $B$ -bits constrained spline estimator is proposed as follows

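$$\widehat{g}^{\textrm{B}}_{\mu,t,c}=\mathop{\mathrm{arg\,min}}_{g\in S^{m}(\mathbb{I})}\Big\{\frac{1}{c}\sum_{i=1}^{c}\big(z_{i}-g(i/c)\big)^{2}+\lambda J(g,g)\Big\}. \tag{7}$$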

Similar to (4), the resulting spline estimator $\widehat{g}^{\textrm{B}}_{\mu,t,c}(\cdot)$ has an explicit expression

$$\widehat{g}^{\textrm{B}}_{\mu,t,c}=\sum_{i=1}^{c}\widehat{\theta}_{i}K_{i/c},$$

where $(\widehat{\theta}_{1},\ldots,\widehat{\theta}_{c})^{T}=c^{-1}(\Sigma_{c}+\lambda I_{c})^{-1}z$ with $\Sigma_{c}=[K(i/c,i^{\prime}/c)/c]_{1\leq i,i^{\prime}\leq c}\in\mathbb{R}^{c\times c}$ , $z=(z_{1},\ldots,z_{c})^{T}\in\mathbb{R}^{c}$ , and $I_{c}$ being the $c\times c$ identity matrix.

Notice that the optimization in (7) requires only the $c$ quantized observations, and the solution involves only the inverse of the $c\times c$ matrix $\Sigma_{c}+\lambda I_{c}$, which is much less computationally intensive than the classical smoothing spline estimator (4).
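The closed form above translates directly into a small linear solve. The sketch below is illustrative only: it fixes $m=2$, for which $B_{4}(u)=u^{4}-2u^{3}+u^{2}-1/30$, and the function names `kernel_m2` and `fit_bbits_spline`, the noiseless toy input and the value of $\lambda$ are assumptions made for the example.

```python
import numpy as np

def kernel_m2(x, y):
    """K(x, y) = (-1)^{m-1} / (2m)! * B_{2m}(|x - y|) for m = 2."""
    u = np.abs(np.subtract.outer(np.asarray(x, dtype=float),
                                 np.asarray(y, dtype=float)))
    b4 = u**4 - 2.0 * u**3 + u**2 - 1.0 / 30.0   # Bernoulli polynomial B_4
    return -b4 / 24.0

def fit_bbits_spline(z, lam):
    """Return the fitted function and theta = c^{-1} (Sigma_c + lam I)^{-1} z."""
    z = np.asarray(z, dtype=float)
    c = z.size
    grid = np.arange(1, c + 1) / c               # design points i / c
    sigma_c = kernel_m2(grid, grid) / c          # Sigma_c
    theta = np.linalg.solve(sigma_c + lam * np.eye(c), z) / c

    def g_hat(x):
        # g_hat(x) = sum_i theta_i K(i / c, x)
        return kernel_m2(np.atleast_1d(x), grid) @ theta

    return g_hat, theta

# Toy check: a smooth periodic input is reproduced when lam is tiny.
c = 50
grid = np.arange(1, c + 1) / c
z = np.sin(2.0 * np.pi * grid)
g_hat, theta = fit_bbits_spline(z, lam=1e-8)
fitted = g_hat(grid)
```

The solve costs $O(c^{3})$ operations, which is the source of the computational saving relative to working with all $n$ observations.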

Finally, the selection of the tuning parameter $\lambda$ is crucial, and can be obtained by minimizing the generalized cross validation (GCV) score as follows

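$$\widehat{\lambda}=\mathop{\mathrm{arg\,min}}_{\lambda>0}\frac{c\,\|[I_{c}-\Sigma_{c}(\Sigma_{c}+\lambda I_{c})^{-1}]z\|_{2}^{2}}{[c-\textrm{trace}(\Sigma_{c}(\Sigma_{c}+\lambda I_{c})^{-1})]^{2}}.$$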

The GCV has been widely used in the literature and enjoys appealing theoretical properties in various settings, see, e.g., Wahba (1990); Xu and Huang (2012); Gu (2013); Xu et al. (2018, 2019).
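A grid search over the GCV score might look like the following. This is a sketch only: the candidate grid, the simulated input and the names `gcv_score` and `select_lambda` are assumptions for illustration, and the kernel is again specialized to $m=2$.

```python
import numpy as np

def kernel_m2(x, y):
    """K(x, y) = -B_4(|x - y|) / 4!, the m = 2 periodic Sobolev kernel."""
    u = np.abs(np.subtract.outer(np.asarray(x, dtype=float),
                                 np.asarray(y, dtype=float)))
    return -(u**4 - 2.0 * u**3 + u**2 - 1.0 / 30.0) / 24.0

def gcv_score(z, sigma_c, lam):
    """c * ||(I - S) z||^2 / (c - trace(S))^2 with S = Sigma_c (Sigma_c + lam I)^{-1}."""
    c = z.size
    # Sigma_c and (Sigma_c + lam I)^{-1} commute, so solving from the left gives S.
    smoother = np.linalg.solve(sigma_c + lam * np.eye(c), sigma_c)
    resid = z - smoother @ z
    return c * float(resid @ resid) / (c - np.trace(smoother)) ** 2

def select_lambda(z, sigma_c, candidates):
    """Return the candidate with the smallest GCV score."""
    scores = [gcv_score(z, sigma_c, lam) for lam in candidates]
    return candidates[int(np.argmin(scores))]

# Toy data standing in for the quantized sample z_1, ..., z_c.
rng = np.random.default_rng(0)
c = 40
grid = np.arange(1, c + 1) / c
sigma_c = kernel_m2(grid, grid) / c
z = np.sin(2.0 * np.pi * grid) + 0.3 * rng.standard_normal(c)
candidates = [10.0 ** p for p in range(-10, 0)]
scores = [gcv_score(z, sigma_c, lam) for lam in candidates]
lam_hat = select_lambda(z, sigma_c, candidates)
```

A continuous optimizer could replace the grid; the grid is used here only to keep the example short.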

2.4 $B$ -bits Nonparametric Testing

In this section, we propose a test statistic for the null hypothesis (2) based on the $B$-bits spline estimator $\widehat{g}_{\mu,t,c}^{\textrm{B}}(\cdot)$. Without loss of generality, we assume $g_{*}(\cdot)\equiv 0$ in the null hypothesis (2). For a nonzero $g_{*}(\cdot)$, the observed response variables $y_{i}$’s from model (1) can be centered as $y_{i}^{*}=y_{i}-g_{*}(i/n)$, and the same testing procedure can be applied using the $y_{i}^{*}$’s instead. To test $H_{0}:g_{0}(x)=0$, we consider a test statistic based on the $L_{2}$ norm distance between $\widehat{g}_{\mu,t,c}^{\textrm{B}}(\cdot)$ and $g_{*}(\cdot)\equiv 0$ as follows

$$T_{\mu,t,c}=\|\widehat{g}_{\mu,t,c}^{\textrm{B}}\|^{2}.$$

Intuitively, a large value of $T_{\mu,t,c}$ should lead to the rejection of $H_{0}$ . In Theorem 3, we shall show that under $H_{0}$ and mild conditions, it holds that

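$$\frac{cT_{\mu,t,c}-\textrm{trace}(A)\tau_{k}^{2}}{s_{c}\tau_{k}^{2}}\overset{d}{\longrightarrow}N(0,1)\quad\text{as } n,c\to\infty,$$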

where $\tau_{k}^{2}=\textrm{Var}(z_{1}|H_{0})$, $A=(\Sigma_{c}+\lambda I_{c})^{-1}\Omega_{c}(\Sigma_{c}+\lambda I_{c})^{-1}$, $\Omega_{c}=[K^{\otimes 2}(i/c,i^{\prime}/c)/c]_{1\leq i,i^{\prime}\leq c}$ with $K^{\otimes 2}(x,x^{\prime}):=\int_{0}^{1}K(x,y)K(y,x^{\prime})dy$ and $s_{c}^{2}=2\sum_{1\leq i\neq i^{\prime}\leq c}a_{i,i^{\prime}}^{2}$ with $a_{i,i^{\prime}}$ being the $(i,i^{\prime})$th entry of $A$. In practice, $\tau_{k}^{2}$ needs to be estimated based on the quantized data as well. We propose the following estimator

$$\widehat{\tau}_{k}^{2}=\frac{\widetilde{\tau}_{n}^{2}}{2\widetilde{n}(n-1)},$$

where $\widetilde{\tau}_{n}$ is given in Algorithm 2 through quantization. Intuitively, the above estimator is a re-scaled (by a factor of $\widetilde{n}^{-1}$) version of the quantized sample error variance $\frac{1}{2(n-1)}\sum_{j=2}^{n}\left\{Q(y_{j})-Q(y_{j-1})\right\}^{2}$. It is straightforward to show that $\widehat{\tau}_{k}^{2}=\tau_{k}^{2}[1+o_{p}(1)]$ under mild conditions; see Lemma 11 of Appendix C for details. Consequently, the decision rule for testing (2) at significance level $\alpha$ can be defined as follows

$$\phi_{c,k}=I\big(|cT_{\mu,t,c}-\textrm{trace}(A)\widehat{\tau}_{k}^{2}|\geq z_{1-\alpha/2}\,s_{c}\widehat{\tau}_{k}^{2}\big),$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. We reject the null hypothesis (2) if and only if $\phi_{c,k}=1$.
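The centering term $\textrm{trace}(A)\tau_{k}^{2}$ and the scaling $s_{c}\tau_{k}^{2}$ can be motivated by a short calculation. The following is only a heuristic sketch: it treats the $z_{i}$’s under $H_{0}$ as independent with mean zero and common variance $\tau_{k}^{2}$, a simplifying assumption made here for illustration and not a statement of the conditions of Theorem 3. Writing $\widehat{g}_{\mu,t,c}^{\textrm{B}}=\sum_{i=1}^{c}\widehat{\theta}_{i}K_{i/c}$ with $\widehat{\theta}=c^{-1}(\Sigma_{c}+\lambda I_{c})^{-1}z$ and using the definition of $K^{\otimes 2}$,

$$cT_{\mu,t,c}=c\sum_{i,i^{\prime}=1}^{c}\widehat{\theta}_{i}\widehat{\theta}_{i^{\prime}}K^{\otimes 2}(i/c,i^{\prime}/c)=c^{2}\,\widehat{\theta}^{T}\Omega_{c}\widehat{\theta}=z^{T}Az=\sum_{i=1}^{c}a_{i,i}z_{i}^{2}+\sum_{1\leq i\neq i^{\prime}\leq c}a_{i,i^{\prime}}z_{i}z_{i^{\prime}}.$$

Under the simplifying assumption, the first sum has expectation $\textrm{trace}(A)\tau_{k}^{2}$, while the second sum has mean zero and variance

$$\textrm{Var}\Big(\sum_{1\leq i\neq i^{\prime}\leq c}a_{i,i^{\prime}}z_{i}z_{i^{\prime}}\Big)=2\sum_{1\leq i\neq i^{\prime}\leq c}a_{i,i^{\prime}}^{2}\tau_{k}^{4}=s_{c}^{2}\tau_{k}^{4},$$

which matches the normalization by $s_{c}\tau_{k}^{2}$ in the limit.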

By the design of the quantizer $Q(\cdot)$ in (5), there are at most $k+1$ distinct possible values for each $\left\{Q(y_{j+1})-Q(y_{j})\right\}^{2}$, ranging from $0C_{k}(t)^{2}$ to $k^{2}C_{k}(t)^{2}$, $1\leq j\leq n-1$, yielding the range for $\widetilde{\tau}^{2}$ as $[0,(n-1)k^{2}C_{k}(t)^{2}]$. Since $\widetilde{\tau}^{2}$ can only take values of the form $lC_{k}(t)^{2}$ for some integer $l$, there are at most $(n-1)k^{2}+1$ distinct values for $\widetilde{\tau}^{2}$, which would cost $\lceil\log_{2}\left\{(n-1)k^{2}+1\right\}\rceil$ bits to store or transmit. Compared with the bit cost of the two-stage quantization, $c\lceil\log_{2}(k)\rceil\approx B$, the cost of storing or transmitting $\widetilde{\tau}^{2}$ is negligible when $c\to\infty$, and hence is ignored in the calculation of the total bit cost for ease of presentation.
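The variance estimator, the decision rule and the bit cost of $\widetilde{\tau}^{2}$ can be sketched as below. This is an illustration under assumptions: $\widetilde{\tau}^{2}$ is taken to be the sum of squared successive differences of the $Q(y_{j})$’s, in line with the description of $\widehat{\tau}_{k}^{2}$ above (Algorithm 2 itself is not reproduced here), and the function names and toy inputs are hypothetical.

```python
import math
from statistics import NormalDist
import numpy as np

def tau_hat_sq(q, c):
    """tau_hat_k^2 = tau_tilde^2 / (2 * n_tilde * (n - 1)), with n_tilde = n / c."""
    q = np.asarray(q, dtype=float)
    n = q.size
    n_tilde = n // c
    tau_tilde_sq = float(np.sum(np.diff(q) ** 2))   # sum of {Q(y_j) - Q(y_{j-1})}^2
    return tau_tilde_sq / (2.0 * n_tilde * (n - 1))

def reject_h0(T, c, A, tau_sq, alpha=0.05):
    """Decision rule: reject when |c T - trace(A) tau^2| >= z_{1-alpha/2} s_c tau^2."""
    A = np.asarray(A, dtype=float)
    off_diag = A - np.diag(np.diag(A))
    s_c = math.sqrt(2.0 * float(np.sum(off_diag ** 2)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(c * T - np.trace(A) * tau_sq) >= z_crit * s_c * tau_sq

def tau_bits(n, k):
    """Bits needed to encode one of at most (n - 1) k^2 + 1 values."""
    return math.ceil(math.log2((n - 1) * k * k + 1))

# Toy inputs, chosen only to exercise the functions.
tau_example = tau_hat_sq([0.0, 1.0, 0.0, 1.0], c=2)
A_example = [[1.0, 0.5], [0.5, 1.0]]
keep = reject_h0(T=1.0, c=2, A=A_example, tau_sq=1.0)
reject = reject_h0(T=2.5, c=2, A=A_example, tau_sq=1.0)
extra_bits = tau_bits(n=1000, k=256)
```

In the toy call with $n=1000$ and $k=256$, the extra cost is a few dozen bits at most, which is small next to a budget that grows with $c$.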

2.5 Practical Choice of $c$ and $k$ Given $B$

The implementations of Algorithms 1 and 2 require a practical choice of $c$ and $k$ for a given bits budget $B$ . Based on the discussion in Section 2.2, Algorithm 1 requires $B=cb$ with $b=\lceil\log_{2}\left(k\right)\rceil$ . Our theoretical investigations in Section 3.3 require that $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ for some $h\to 0$ , and $ch\to\infty$ as $c,n\to\infty$ , where ${\mathcal{T}}_{n}$ is defined in Condition (B). Furthermore, equations (16) and (17) in Section 3.3 reveal that the optimal choice of $b$ depends on the smoothness of the periodic Sobolev space (i.e., $m$ ) and the tuning parameter $\lambda$ . While the former is typically unknown in practice, the latter needs to be chosen by some data-driven criterion such as GCV based on the quantized data, which is not available until the quantization process is carried out. To simplify the calculation and make the quantization algorithm more practical, we propose to use $b=\lceil\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}/\sigma^{2}}\right)\rceil$ , which is a valid choice for any $m$ and $h$ , and therefore is easy to use in practice. Specifically, given $B$ , we find $c$ and $k$ as follows

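$$c^{\dagger}=\max\Big\{c\in\mathbb{Z}:c\log_{2}\Big(\sqrt{n{\mathcal{T}}_{n}/\sigma^{2}}\Big)\leq B\Big\},\quad k^{\dagger}=\max\big\{k\in\mathbb{Z}:c^{\dagger}\lceil\log_{2}(k)\rceil\leq B\big\}. \tag{11}$$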

By the definition in Condition (B), $\sqrt{{\mathcal{T}}_{n}}$ is the quantization range, and $\sigma^{2}$ is used in the choice of $b$ so that ${\mathcal{T}}_{n}/\sigma^{2}$ is invariant if $y_{i}$ ’s are multiplied by a constant. Under Condition (B), the actual choice of ${\mathcal{T}}_{n}/\sigma^{2}$ depends on the distribution of $\epsilon_{i}$ ’s in model (1). If $\epsilon_{i}$ ’s follow a standard Gaussian distribution, it suffices to take ${\mathcal{T}}_{n}/\sigma^{2}=2.5\log(n)$ . Therefore, $\sigma^{2}$ in (11) does not need to be estimated. See more discussion under Condition (B) regarding the choice of ${\mathcal{T}}_{n}/\sigma^{2}$ .
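The practical choice of $c$ and $k$ can be worked through numerically. The sketch below follows the proposal $b=\lceil\log_{2}(\sqrt{n{\mathcal{T}}_{n}/\sigma^{2}})\rceil$ with ${\mathcal{T}}_{n}/\sigma^{2}=2.5\log(n)$ for Gaussian errors; reading $\log$ as the natural logarithm and the function name `choose_c_and_k` are assumptions made for this example.

```python
import math

def choose_c_and_k(n, B, ratio=None):
    """Pick c and k from the budget B.

    ratio stands for T_n / sigma^2; by default 2.5 * log(n) (natural log assumed),
    the value suggested for standard Gaussian errors.
    """
    if ratio is None:
        ratio = 2.5 * math.log(n)
    per_value = math.log2(math.sqrt(n * ratio))   # log2 of sqrt(n T_n / sigma^2)
    c = math.floor(B / per_value)                 # largest c with c * per_value <= B
    if c < 1:
        raise ValueError("budget B is too small for even one group")
    b = B // c                                    # largest integer b with c * b <= B
    k = 2 ** b                                    # largest k with ceil(log2 k) <= b
    return c, k

# Example: n = 10,000 observations and a budget of B = 1,000 bits.
c_dagger, k_dagger = choose_c_and_k(n=10_000, B=1_000)
```

For these inputs $\log_{2}(\sqrt{n{\mathcal{T}}_{n}/\sigma^{2}})\approx 8.91$, so the budget supports $c^{\dagger}=112$ groups at $8$ bits each, that is, $k^{\dagger}=256$ levels.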

3 Asymptotic Theory

We now proceed to study asymptotic properties of the $B$ -bits spline estimator and the nonparametric test statistic. In this section, we restrict our investigation to the simple case scenario when the order $m$ of the periodic Sobolev space is known and fixed, and the exact form of function $g_{*}(\cdot)$ in the null hypothesis (2) is also known. We shall defer theoretical results on more general cases to Section 4.

3.1 Estimation Convergence Rate

We first quantify the convergence rate of $\|\widehat{g}_{\mu,t,c}^{\textrm{B}}-g_{0}\|^{2}$ . Even though the main focus of this paper is conducting statistical inference based on quantized samples, it is still of interest to study the asymptotic properties of the spline estimator $\widehat{g}_{\mu,t,c}^{\textrm{B}}(\cdot)$ . Define the Sobolev constant

[TABLE]

It is known that $c_{s}$ is positive and finite; see Adams and Fournier (2003).

For all our theoretical investigations, we assume that $\mu_{j}$ ’s and $t_{j}$ ’s satisfy the following boundedness condition

[TABLE]

Condition (B) asserts that the values of $\mu_{1},\mu_{k}$ cannot be too large, and that $t_{1},t_{k}$ should be sufficiently large. Recall that in this paper, we adopt the uniform quantization scheme for which Condition (B) is rather mild. Since $J(g_{0},g_{0})\leq\rho^{2}$ , by the definition of $c_{s}$ in (12), we have that $\|g_{0}\|_{\sup}\leq c_{s}\rho$ , and we shall assume that $\rho$ is finite for our theoretical investigation. Condition (B) essentially assumes that ${\mathcal{T}}_{n}$ is sufficiently large so that all observed $y_{i}$ ’s fall within the quantization range with a high probability. When $\epsilon_{i}$ ’s follow a sub-Gaussian distribution, it suffices to take ${\mathcal{T}}_{n}\asymp\log{(n)}$ for Condition (B) to hold. For distributions with heavier tails, the required order for $\min\{t_{1}^{2},t_{k-1}^{2}\}$ will be larger, e.g., ${\mathcal{T}}_{n}\asymp[\log(n)]^{2}$ for sub-Exponential distributions. In particular, when $\epsilon_{i}$ ’s follow a normal distribution, it suffices to use ${\mathcal{T}}_{n}=2.5\sigma^{2}\log(n)$ .
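The role of the constant $2.5$ in the normal case can be checked numerically. The sketch below is a simplified illustration: it considers only the noise term (ignoring $g_{0}$), and uses a union bound over $n$ independent $N(0,\sigma^{2})$ errors to bound the probability that any of them falls outside $\pm\sqrt{2.5\sigma^{2}\log(n)}$. The function name is hypothetical.

```python
import math

def prob_any_outside(n, sigma=1.0):
    """Union bound on P(some |eps_i| > sqrt(2.5 sigma^2 log n)), eps_i ~ N(0, sigma^2)."""
    half_width = math.sqrt(2.5 * sigma**2 * math.log(n))
    # Two-sided Gaussian tail: P(|eps| > t) = erfc(t / (sigma * sqrt(2))).
    p_one = math.erfc(half_width / (sigma * math.sqrt(2.0)))
    return min(1.0, n * p_one)

for n in (1000, 10000, 100000):
    print(n, prob_any_outside(n))
```

Since $P(|\epsilon|>t)\leq 2\exp\{-t^{2}/(2\sigma^{2})\}$, the bound is at most $2n^{-1/4}$, which tends to zero as $n$ grows.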

Based on Condition (B), the following theorem establishes an asymptotic upper bound for the estimation error $E\|\widehat{g}_{\mu,t,c}^{\textrm{B}}-g_{0}\|^{2}$ .

Theorem 1

If Condition (B) holds, then it follows that

[TABLE]

where $h=\lambda^{\frac{1}{2m}}$ , and $G_{c,k}(t)=4C_{k}(t)^{2}+G_{c,k,1}(t)+G_{c,k,2}(t)$ , with

[TABLE]

with $p(\cdot)$ being the distribution of $\epsilon$ .

The asymptotic error bound for $\widehat{g}_{\mu,t,c}^{\textrm{B}}(\cdot)$ given in Theorem 1 can be roughly categorized into three parts: (1) the estimation error of the smoothing spline estimator based on fully observed original data, i.e., $(nh)^{-1}+\lambda$ (Wahba, 1990); (2) the estimation error attributed to first-stage quantization, i.e., $G_{c,k}(t)$ ; and (3) the estimation bias introduced by second-stage quantization, i.e., $c^{-\min\{2m,3\}}$ . An extreme case is when $t_{1}\to-\infty$ , $t_{k-1}\to\infty$ and $C_{k}(t)\to 0$ , i.e., the first-stage quantizer becomes dense enough, in which case $G_{c,k}$ tends to zero, reducing to the classical nonparametric estimation setting.

Intuitively, if a sufficiently large bits budget $B$ , and consequently sufficiently large values of $c$ and $k$ , can be used, the term $(nh)^{-1}+\lambda$ will dominate the upper bound of $E\|\widehat{g}_{\mu,t,c}^{\textrm{B}}-g_{0}\|^{2}$ . As a result, the convergence rate of $\|\widehat{g}_{\mu,t,c}^{\textrm{B}}-g_{0}\|^{2}$ coincides with that of the classical smoothing spline estimator based on original observations without quantization (Wahba, 1990). A sufficient condition is given in the following corollary.

Corollary 2

Assume that Condition (B) holds, and that (1) $C_{k}(t)^{2}\lesssim n^{-2m/(2m+1)}$ ; (2) as $T\rightarrow\infty$ , $p(z)$ satisfies $\int_{|z|\geq T}z^{2}p(z)dz=O(\exp(-T^{d}))$ where $d\geq\frac{4m}{2m+1}$ ; (3) ${\mathcal{T}}_{n}\asymp\log(n)$ ; and that (4) $c\asymp n^{\frac{\max\{1,2m/3\}}{2m+1}}$ , $\lambda\asymp n^{-\frac{2m}{2m+1}}$ . Then it follows that $E\|\widehat{g}_{\mu,t,c}^{\textrm{B}}-g_{0}\|^{2}=O(n^{-\frac{2m}{2m+1}})$ , which achieves the optimal convergence rate of smoothing splines without quantization.

Recalling the definition $C_{k}(t)=(t_{k-1}-t_{1})/(k-2)$ , under the conditions of Corollary 2, the minimum order of $k$ to achieve the optimal convergence rate is $n^{m/(2m+1)}\log(n)$ , leading to a required $b=\lceil\log_{2}(k)\rceil\asymp\frac{m}{2m+1}\log_{2}(n)$ . Therefore, the total bits budget is $B=cb\asymp n^{\frac{\max\{1,2m/3\}}{2m+1}}\log(n)$ . Recently, Zhu and Lafferty (2018) proposed a quantization scheme for the Gaussian sequence model that achieves the same optimal estimation rate with a bits budget $B\asymp n^{\frac{1}{2m+1}}$ . Although their bits budget is lower than that of our proposed method, Zhu and Lafferty (2018) achieve this goal by essentially only quantizing the first $n^{\frac{1}{2m+1}}$ Fourier coefficients of the function $g_{0}(\cdot)$ and discarding the remaining Fourier coefficients as $0$ ’s. It is unclear how this approach can be extended to making valid nonparametric inferences for $g_{0}(\cdot)$ , which is the main focus of our work. The proposed quantization scheme in Section 2.2 is in spirit closer to the quantization algorithms proposed in Slawski and Li (2018) and references therein, although these works are mainly focused on the estimation of the parametric linear regression model. In the following subsections, we shall investigate the impacts of the bits budget on the asymptotic properties of the proposed nonparametric testing procedure.
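For a quick comparison of the two budgets, the sketch below tabulates the exponents of $n$ in $c\asymp n^{\max\{1,2m/3\}/(2m+1)}$ and in the budget $n^{1/(2m+1)}$ of Zhu and Lafferty (2018), with logarithmic factors and constants dropped. The function name is hypothetical and exact fractions are used only for readability.

```python
from fractions import Fraction

def budget_exponents(m: int):
    """Exponents of n (log factors dropped) for the two bit budgets.

    Returns (exponent of c under Corollary 2, exponent in Zhu and Lafferty (2018)).
    """
    c_exp = max(Fraction(1), Fraction(2 * m, 3)) / (2 * m + 1)
    zl_exp = Fraction(1, 2 * m + 1)
    return c_exp, zl_exp

for m in (1, 2, 3):
    print(m, budget_exponents(m))
```

For $m=1$ the two exponents coincide, so the budgets differ only by the logarithmic factor; for $m\geq 2$ the polynomial order of $c$ is strictly larger.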

3.2 Asymptotic Distribution of the Test Statistic under $H_{0}$

In this section, we proceed to derive the asymptotic distribution of the test statistic $T_{\mu,t,c}$ under $H_{0}$ . From now on, we will use $h=\lambda^{1/(2m)}$ without repeating its definition.

Theorem 3

Suppose that Condition (B) holds, and it holds that $h\to 0$ , $ch\to\infty$ , $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ and $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ as $n,c\to\infty$ . Then under $H_{0}$ , it follows that

[TABLE]

where $T_{\mu,t,c}$ , $A$ , $\widehat{\tau}_{k}^{2}$ and $s_{c}$ are as defined in Section 2.4.

Theorem 3 states that under some regularity conditions, the null distribution of the nonparametric test statistic $T_{\mu,t,c}$ for $H_{0}$ in (2) is asymptotically normal. The proof relies on Stein’s exchangeable pair method and is given in the Appendix.

We remark that the conditions in Theorem 3 are rather mild. Specifically, the first condition $h\to 0$ requires the tuning parameter $\lambda$ to shrink to zero and the second condition $ch\to\infty$ implies that the number of quantized data points, i.e., $c$ , should be sufficiently large. The only condition that needs more discussion is the last condition $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , which involves jointly controlling the moment of $\epsilon_{i}$ ’s and the first-stage quantizer $Q(\cdot)$ . Proposition 4 below provides a sufficient condition for this assumption.

Proposition 4

Suppose that Condition (B) holds. If $E(\epsilon_{1}^{4})=O(nc^{-1})$ , $C_{k}(t)=o(1)$ and $\mu_{j}^{4}P(\sigma\epsilon_{1}\in R_{j}(t))=O(nc^{-1})$ for $j=1$ and $k$ , then it follows that $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ .

Using Theorem 3 and Proposition 4, the validity of the proposed nonparametric testing procedure requires the quantized sample size $c$ to be sufficiently large, in particular, $ch=c\lambda^{1/(2m)}\to\infty$ . Recall that the proposed quantization scheme in Section 2.2 requires a total bits budget $B=cb$ with $b=\lceil\log_{2}(k)\rceil$ . As a result, for Theorem 3 to hold, the required bits budget $B\gg c\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ , for which the lower bound is determined by the tuning parameter $\lambda$ (or $h$ ). In the next subsection, we shall investigate the impacts of $\lambda$ on the asymptotic testing power against local alternatives, which can be used to study optimal asymptotic power achievable with a given bits budget $B$ . For example, we shall show that to achieve the minimax rate of testing, one needs $B\gtrsim n^{\frac{3}{4m+1}}\log_{2}\left(n^{\frac{2m}{4m+1}}\sqrt{{\mathcal{T}}_{n}}\right)$ .

3.3 Asymptotic Power of the Nonparametric Test

We now proceed to examine the asymptotic power of the proposed nonparametric test. For a fixed constant $\rho>0$ , let $S_{\rho}^{m}(\mathbb{I})=\{f\in S^{m}(\mathbb{I}):J(f,f)\leq\rho^{2}\}$ be the $\rho$ -ball in the periodic Sobolev space with a radius $\rho$ . We consider the following alternative hypothesis

[TABLE]

Based on the definition of the quantized data $z_{i}$ in (7), its unquantized counterpart can be defined as $\widetilde{y}_{i}=\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}y_{j}$ for $i=1,\cdots,c$ . Under $H_{1}$ , one has that $E\widetilde{y}_{i}=\frac{c}{n}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g_{0}(j/n)$ for $i=1,\cdots,c$ . To facilitate our theoretical investigation, we introduce the following function

[TABLE]

It is straightforward to show that $\max_{1\leq i\leq c}|f(i/c)-E\widetilde{y}_{i}|=O(n^{-1})$ and that as $\Delta\to 0$ , $\sup_{x\in\mathbb{I}}|g_{0}(x)-f(x)|\to 0$ . Theorem 5 below states that, under some regularity conditions, our proposed nonparametric test can achieve arbitrarily high power provided that $H_{0}$ and $H_{1}$ are sufficiently separated.
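The unquantized block averages $\widetilde{y}_{i}$ used in this construction can be computed as in the following minimal sketch. It assumes that $\widetilde{n}=n/c$ is an integer, consistent with the factor $c/n$ in the expression for $E\widetilde{y}_{i}$; the function name is hypothetical.

```python
def block_means(y, c):
    """Average consecutive blocks of length n_tilde = n / c (assumed an integer)."""
    n = len(y)
    if n % c != 0:
        raise ValueError("n must be divisible by c in this sketch")
    n_tilde = n // c
    # Block i (0-based) covers y[i * n_tilde : (i + 1) * n_tilde].
    return [sum(y[i * n_tilde:(i + 1) * n_tilde]) / n_tilde for i in range(c)]

print(block_means([1, 2, 3, 4, 5, 6], 3))
```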

Theorem 5

Suppose that Condition (B) holds. If it holds that $h\to 0$ , $ch\to\infty$ , $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ , and $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , then for any $\eta>0$ , there exist positive constants $C_{\eta}$ and $N_{\eta}$ such that for any $c\geq N_{\eta}$ ,

[TABLE]

where $\delta_{n,c,\lambda}=\sqrt{(nh^{1/2})^{-1}+\lambda+(nch^{2})^{-1}}$ and $\|g\|_{c}=\sqrt{\sum_{i=1}^{c}f^{2}(i/c)/c}$ with function $f(\cdot)$ as defined in (15).

The separation rate $\delta_{n,c,\lambda}$ represents the smallest rate of deviation from $H_{0}$ that can be consistently detected by the proposed test statistic (9), given sufficiently large $n$ and $c$ . The first part of $\delta_{n,c,\lambda}$ , namely, $(nh^{1/2})^{-1}+\lambda$ , coincides with the separation rate of the classical spline-based nonparametric test using original observations without quantization, see, e.g., Shang and Cheng (2013); Cheng and Shang (2015); Shang and Cheng (2015, 2017). The remaining part of $\delta_{n,c,\lambda}$ , namely, $(nch^{2})^{-1}$ , is an additional term due to the two-stage quantization errors. For a given $n$ and $c$ , the separation rate $\delta_{n,c,\lambda}$ can be minimized by choosing an appropriate value of the tuning parameter $\lambda$ , subject to the constraint $c\lambda^{1/(2m)}\to\infty$ . Specifically, by some straightforward algebra, one can show that

[TABLE]
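As a numerical companion to this minimization, the sketch below evaluates $\delta_{n,c,\lambda}=\sqrt{(nh^{1/2})^{-1}+\lambda+(nch^{2})^{-1}}$ with $h=\lambda^{1/(2m)}$ and searches a log-spaced grid of $\lambda$ values. The grid range and the function names are assumptions for illustration, and the asymptotic constraint $c\lambda^{1/(2m)}\to\infty$ is not enforced.

```python
import math

def separation_rate(n, c, lam, m):
    """delta_{n,c,lambda} from Theorem 5, with h = lambda^(1/(2m))."""
    h = lam ** (1.0 / (2 * m))
    return math.sqrt(1.0 / (n * math.sqrt(h)) + lam + 1.0 / (n * c * h**2))

def best_lambda(n, c, m, grid_size=400):
    """Grid search over lambda in [1e-12, 1] (hypothetical range)."""
    best = None
    for i in range(grid_size):
        lam = 10 ** (-12 + 12 * i / (grid_size - 1))
        d = separation_rate(n, c, lam, m)
        if best is None or d < best[1]:
            best = (lam, d)
    return best

for c in (20, 200, 2000):
    print(c, best_lambda(10000, c, 2))
```

Increasing $c$ shrinks only the third term, so the minimized rate stops improving once that term is dominated by the first two.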

Recall that the total bits needed for the proposed quantization scheme in Section 2.2 is $B=cb$ , for which Theorem 5 requires that $h\to 0$ and $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ . By plugging the optimal smoothing parameter back to the lower bound of $b$ , we have the following

[TABLE]

From (17), we can see that when $B$ is sufficiently large, i.e., $B\gtrsim n^{\frac{3}{4m+1}}\log_{2}\left(n^{\frac{2m}{4m+1}}\sqrt{{\mathcal{T}}_{n}}\right)$ , the minimal separation rate $n^{-\frac{2m}{4m+1}}$ achieves the minimax rate of testing (Shang and Cheng, 2013, 2017; Liu et al., 2020), implying lossless asymptotic testing power using only quantized samples. In this case, the minimal number of bits for each data point, i.e., $b$ , does not depend on $c$ but is determined by the smoothness of the function and the tail bound ${\mathcal{T}}_{n}$ of the error distribution. When $B$ is between $n^{\frac{1}{2m+1}}\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}}\right)$ and $n^{\frac{3}{4m+1}}\log_{2}\left(n^{\frac{2m}{4m+1}}\sqrt{{\mathcal{T}}_{n}}\right)$ , the minimax rate of testing is no longer achievable, but the minimal separation rate still decays polynomially as the original sample size $n$ increases. Furthermore, in this intermediate phase of $B$ , the lower bound of $b$ decreases as $c$ increases, implying that increasing $c$ rather than $b$ when allocating the total bits budget $B$ will more effectively improve the testing power. Finally, when $B$ is less than $n^{\frac{1}{2m+1}}\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}}\right)$ , the asymptotic lower bound for the minimal rate of separation is (roughly) of the order $c^{-m}$ with $c=B/b$ , the number of quantized measurements that can be transmitted or stored, provided that $b\gg\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}}\right)$ .
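The three phases of $B$ described above can be illustrated with the following sketch, which evaluates the two thresholds and classifies a given budget. The thresholds hold only up to constants, so the numerical boundaries are indicative at best; the function names and regime labels are hypothetical.

```python
import math

def budget_thresholds(n, m, T_n):
    """Lower and upper thresholds on B from the discussion of (17), constants ignored."""
    low = n ** (1 / (2 * m + 1)) * math.log2(math.sqrt(n * T_n))
    high = n ** (3 / (4 * m + 1)) * math.log2(n ** (2 * m / (4 * m + 1)) * math.sqrt(T_n))
    return low, high

def regime(B, n, m, T_n):
    low, high = budget_thresholds(n, m, T_n)
    if B >= high:
        return "minimax"        # minimax rate of testing is achievable
    if B >= low:
        return "intermediate"   # polynomial decay, but slower than minimax
    return "low"                # separation rate roughly of order c^{-m}

n, m = 10000, 2
T_n = 2.5 * math.log(n)  # Gaussian-error choice with sigma^2 = 1
print(budget_thresholds(n, m, T_n))
print([regime(B, n, m, T_n) for B in (20, 100, 1000)])
```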

4 Extensions

Our prior investigations in Section 3 assume that the hypothesized function $g_{*}(\cdot)$ in (2) and the order $m$ of the periodic Sobolev space $S^{m}(\mathbb{I})$ are both known. In reality, it might be interesting to test other hypotheses, e.g., whether $g_{0}$ has a parametric expression such as a linear function. Meanwhile, the order $m$ is often unknown. We will extend the prior works to such settings.

4.1 Nonparametric Testing for Linearity of $g_{0}(\cdot)$

In some applications, we are interested in testing whether $g_{0}(\cdot)$ resides in a parametric family. In this section, as an illustrative example, we consider testing the linearity of $g_{0}(\cdot)$ :

[TABLE]

where $\mathcal{L}(\mathbb{I})$ denotes the class of linear functions over $\mathbb{I}:=[0,1]$ . Testing the hypothesis that $g_{0}(\cdot)$ belongs to other parametric families governed by a finite number of parameters can be conducted in the same way with minor modifications.

To test (18), we first obtain the least-square estimator $\widehat{g}(x)$ , $x\in\mathbb{I}$ based on $Q(y_{j})$ ’s, i.e., $\widehat{g}(x)=\operatorname*{arg\,min}_{g\in\mathcal{L}(\mathbb{I})}\sum_{i=1}^{n}\left[g(x_{i})-Q(y_{i})\right]^{2}$ . Subsequently, we define the new data as $y^{\textrm{linear}}=(Q(y_{1})-\widehat{y}_{1},\ldots,Q(y_{n})-\widehat{y}_{n})^{T}$ , where $\widehat{y}_{i}=\widehat{g}(i/n)$ . By applying the two-stage quantization Algorithm 1 to $y^{\textrm{linear}}$ , we can then obtain the quantized data $z_{\textrm{linear}}=(z_{\textrm{linear},1},\ldots,z_{\textrm{linear},c})^{T}$ . Following the same estimation procedure in Section 2.3, we can obtain a spline estimator $\widehat{g}^{\textrm{B}}_{\textrm{linear},\mu,t,c}$ based on the quantized data $z_{\textrm{linear}}$ .
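The residual construction in the first step can be sketched as follows, assuming the design points $x_{i}=i/n$ and taking the first-stage quantized responses $Q(y_{i})$ as given. The function name is hypothetical, and the subsequent two-stage quantization and spline fit are not shown.

```python
def linear_residuals(qy):
    """Residuals Q(y_i) - g_hat(i/n) from a least-squares line fitted to (i/n, Q(y_i))."""
    n = len(qy)
    x = [(i + 1) / n for i in range(n)]  # design points x_i = i/n, i = 1..n
    x_bar = sum(x) / n
    y_bar = sum(qy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, qy))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, qy)]

# Toy quantized responses (illustrative values only).
print(linear_residuals([0.0, 0.5, 0.5, 1.0, 1.5, 1.5]))
```

Under $H_{0}^{\textrm{linear}}$ the residuals carry no systematic signal, so the spline estimator fitted to their quantized block averages should be close to zero.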

The resulting test statistic is then defined as $T_{\textrm{linear},\mu,t,c}=\|\widehat{g}_{\textrm{linear},\mu,t,c}^{\textrm{B}}\|^{2}$ , whose limiting distribution under $H^{\textrm{linear}}_{0}$ is given by the following theorem.

Theorem 6

Suppose that Condition (B) holds. If as $n,c\to\infty$ , it holds that $h\to 0$ , $ch\to\infty$ , $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ , and $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , then under $H^{\textrm{linear}}_{0}$ , one has that

[TABLE]

where $T_{\mu,t,c}$ , $A$ , $\widehat{\tau}_{k}^{2}$ and $s_{c}$ are as defined in Section 2.4 but based on $y^{\textrm{linear}}$ .

Theorem 6 is an immediate extension of Theorem 3 to testing the linearity of $g_{0}(\cdot)$ using only quantized samples, indicating that the proposed nonparametric linearity test is valid under mild conditions. To investigate the power of the proposed linearity test against the alternative $H_{1}^{\textrm{linear}}$ , we define the distance between $g_{0}(\cdot)$ and the linear function space $\mathcal{L}(\mathbb{I})$ as $\|g_{0}-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g_{0})\|$ , where $\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g_{0})=\operatorname*{arg\,min}_{f\in\mathcal{L}(\mathbb{I})}\|g_{0}-f\|^{2}$ is the projection of $g_{0}(\cdot)$ to $\mathcal{L}(\mathbb{I})$ . The magnitude of $\|g_{0}-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g_{0})\|$ characterizes how far the true function $g_{0}(\cdot)$ deviates from any linear function in $\mathcal{L}(\mathbb{I})$ . Note that under null hypothesis $H_{0}^{\textrm{linear}}$ , one has that $\|g_{0}-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g_{0})\|=0$ .

The following theorem describes the asymptotic power of the proposed nonparametric linearity test.

Theorem 7

Suppose that Condition (B) holds. If as $n,c\to\infty$ , it holds that $h\to 0$ , $ch\to\infty$ , $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ , and $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , then for any $\eta>0$ , there exist positive constants $C_{\eta}$ and $N_{\eta}$ such that for any $c\geq N_{\eta}$ ,

[TABLE]

where $\delta_{n,c,\lambda}^{\textrm{linear}}=\sqrt{(nh^{1/2})^{-1}+\lambda+(nch^{2})^{-1}}$ and $\|g\|_{c}=\sqrt{\sum_{i=1}^{c}f^{2}(i/c)/c}$ with function $f(\cdot)$ as defined in (15).

Based on Theorem 7, we can see that for a given quantized sample of size $c$ , the same separation rate for testing can be achieved by the proposed nonparametric linearity test as described in (16). Furthermore, the proofs of Theorems 6-7 are similar to those of Theorem 3 and Theorem 5 by recognizing the fact that the least-squares estimator $\widehat{g}(\cdot)$ satisfies $\sup_{x\in\mathbb{I}}|\widehat{g}(x)-g_{0}(x)|=O_{p}(n^{-1/2})$ , whose impact is negligible for a nonparametric spline estimator. It is therefore trivial to extend Theorems 6-7 to testing whether $g_{0}(\cdot)$ resides in other parametric families as long as a uniformly root- $n$ consistent parametric estimator $\widehat{g}(\cdot)$ is available.

4.2 Adaptive Nonparametric Test When $m$ is Unknown

From (16), we can see that the power of the proposed nonparametric test depends crucially on the order $m$ of the periodic Sobolev space where the underlying true function $g_{0}(\cdot)$ resides. However, the order $m$ may be unknown in practice. One popular strategy is to set $m=2$ regardless of the underlying truth, which may lead to sub-optimal testing power. In this section, we construct an optimal adaptive nonparametric testing procedure based on quantized samples that does not require knowledge of $m$ .

Let $m_{*}$ denote the unknown true order of the Sobolev space to which $g_{0}(\cdot)$ belongs, and assume that $m_{*}$ is an integer between two known integers $m_{l}$ and $m_{u}$ . For instance, one can set $m_{l}=1$ and $m_{u}=\textrm{poly}(\log{n})$ so that, as $n$ diverges, $m_{*}$ is guaranteed to belong to $[m_{l},m_{u}]$ . For any given integer $m$ , we can calculate the test statistic $T_{m}:=T_{\mu,t,c}$ defined in (9) with the tuning parameter $\lambda_{m}=a_{n}^{2m}n^{-4m/(4m+1)}\log(m_{u})^{2m/(4m+1)}$ where $a_{n}$ may depend on $n$ but is free of $m$ . We remark that the upper bound $m_{u}$ may be slowly diverging as $n\to\infty$ . Our adaptive nonparametric testing procedure is summarized as follows.

Step 1. For any $m_{l}\leq m\leq m_{u}\rightarrow\infty$ , calculate the standardized testing statistic

[TABLE]

where $T_{m}$ , $A_{m}$ , $\widehat{\tau}_{k}^{2}$ and $s_{c,m}$ are as defined in Section 2.4.

Step 2. Calculate the maximum of $\xi_{m}$ ’s, i.e., $\xi_{\textrm{max}}=\max_{m_{l}\leq m\leq m_{u}}\xi_{m}$ .

Step 3. Standardize $\xi_{\textrm{max}}$ as follows

[TABLE]

where $C_{n}$ satisfies $2\pi C_{n}^{2}\exp(C_{n}^{2})=m_{u}^{2}$ .
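The constant $C_{n}$ has no closed form, but since $2\pi C^{2}\exp(C^{2})$ is strictly increasing in $C>0$ , it can be found numerically. The sketch below uses bisection; the function name and tolerance are assumptions, and the standardization formula for $\xi_{*}$ itself is not reproduced here.

```python
import math

def solve_C_n(m_u, tol=1e-12):
    """Solve 2*pi*C^2*exp(C^2) = m_u^2 for C > 0 by bisection."""
    target = m_u ** 2

    def f(C):
        return 2.0 * math.pi * C**2 * math.exp(C**2) - target

    lo, hi = 0.0, 1.0
    while f(hi) < 0:          # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:      # f is increasing, so standard bisection applies
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for m_u in (2, 5, 10):
    print(m_u, solve_C_n(m_u))
```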

For the validity of the proposed adaptive nonparametric test, we assume that the following Condition (C) holds.

[TABLE]

Condition (C) requires that the searching range for $m$ cannot be too large by imposing a slowly diverging upper bound on $m_{u}$ . In addition, the total number of quantized samples, i.e., $c$ , that need to be transmitted or stored cannot be too small compared to $n$ , and is jointly determined by $m_{l},m_{u}$ and the tuning parameter $a_{n}$ . These conditions are rather mild and have been used in the literature, see, e.g., Liu et al. (2019, 2021). The following theorem describes the asymptotic behavior of $\xi_{*}$ under $H_{0}$ .

Theorem 8

Suppose that both Conditions (B) and (C) hold, $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , and that

[TABLE]

Then, under $H_{0}$ given in (2), for any $\alpha\in(0,1)$ , it holds that

[TABLE]

where $q_{\alpha}=-\log(-\log(1-\alpha))$ .
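The critical value $q_{\alpha}=-\log(-\log(1-\alpha))$ is the $(1-\alpha)$ quantile of the standard Gumbel distribution, whose distribution function is $\exp\{-\exp(-x)\}$ . The sketch below checks this identity numerically, under the assumption that the limiting law in Theorem 8 is indeed of this extreme value type; the function names are hypothetical.

```python
import math

def gumbel_quantile(alpha):
    """q_alpha = -log(-log(1 - alpha)), as in Theorem 8."""
    return -math.log(-math.log(1.0 - alpha))

def gumbel_cdf(x):
    """Standard Gumbel distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

q = gumbel_quantile(0.1)
print(q, gumbel_cdf(q))  # q is about 2.25 and the cdf value is 0.9
```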

The intuition behind Theorem 8 is straightforward: under $H_{0}$ , the limiting distribution of each $\xi_{m}$ is normal, which suggests that the asymptotic distribution of the maxima $\xi_{*}$ should be close to the extreme value distribution. We use the techniques developed in Koike (2019) to formalize the proof.

Next, we investigate the asymptotic power of the proposed adaptive nonparametric test under the alternative $H_{1}:g_{0}\in S_{\rho}^{m_{*}}(\mathbb{I})\backslash\{\mathcal{L}(\mathbb{I})\}.$

Theorem 9

Suppose that both Conditions (B) and (C) hold, $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , and that

[TABLE]

Then, for any $\eta>0$ , there exist positive constants $C_{\eta}$ and $N_{\eta}$ such that for any $c\geq N_{\eta}$ ,

[TABLE]

where $\delta_{n,c,a_{n}}=n^{-\frac{2m_{*}}{4m_{*}+1}}[\log(m_{u})]^{\frac{m_{*}}{4m_{*}+1}}\sqrt{a_{n}^{-\frac{1}{2}}+c^{-1}a_{n}^{-2}n^{\frac{3}{4m_{*}+1}}[\log(m_{u})]^{-\frac{2(m_{*}+1)}{4m_{*}+1}}+a_{n}^{2m_{*}}}$ and $\|g\|_{c}=\sqrt{\sum_{i=1}^{c}f^{2}(i/c)/c}$ with function $f(\cdot)$ as defined in (15).

Based on the form of separation rate $\delta_{n,c,a_{n}}$ obtained in Theorem 9, it is straightforward to show that the minimal separation rate is obtained when $a_{n}=a_{0}$ for some constant $a_{0}>0$ , provided that

[TABLE]

so that Condition (C) is met and the second term inside the square-root part of $\delta_{n,c,a_{n}}$ is negligible. Specifically, if $c\gg\max\{n^{2/(4m_{l}+1)}\log(n),n^{3/(4m_{*}+1)}\}[\log(m_{u})]^{-1/(4m_{*}+1)}$ , one has that

[TABLE]

The minimal separation rate (21) is the same as the one obtained in Liu et al. (2019, 2021) and is minimax for the adaptive nonparametric test. This suggests that with the quantized samples, the proposed adaptive test can still achieve the optimal testing power if the bits budget satisfies

[TABLE]

and we take $b=\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}/\sigma^{2}}\right)\asymp\log(n)$ as suggested in Section 2.5. Compared to the minimax rate of testing when $m_{*}$ is known, which is given in (16), the minimal separation rate (21) is only inflated by a factor of $[\log(m_{u})]^{4m_{*}/(4m_{*}+1)}$ . This is the price to pay for searching $m$ over $m_{l}\leq m\leq m_{u}$ . Furthermore, we wish to remark that the lower bound of the bits budget $B$ depends not only on the true order $m_{*}$ but also on the smallest guess of the order, i.e., $m_{l}$ . This can be interpreted by the fact that $\xi_{m_{l}}$ in Step 1 of the adaptive test is constructed based on an under-smoothed spline estimator, which may have a larger order of estimation bias. In practice, it is convenient to set $m_{l}=1$ as suggested by Liu et al. (2021). However, a more accurate guess of $m_{l}$ may lead to a smaller bits budget $B$ required to achieve the minimax rate of testing.

5 Simulation Studies

In this section, we evaluate the finite sample performance of the proposed methods through a set of simulation studies. For all simulation settings except for Section 5.4, the data are generated from the following model

[TABLE]

where $\beta_{3,2}(\cdot)$ is the density function of the beta distribution with parameters $3$ and $2$ , and $\epsilon_{i}$ ’s are independent random errors. Two types of errors were considered: (1) $\epsilon\sim N(0,1)$ ; (2) $\epsilon\sim N(0,1.5^{2})$ . We consider $r$ from $0$ to $1$ , and various sample sizes $n$ . In particular, $r=0$ is used to examine the empirical size of the proposed test under $H_{0}$ , and other values of $r$ are used to check the empirical powers against alternatives. The target significance level was chosen as $\alpha=0.1$ .

For all simulation studies, we consider the uniform quantization scheme outlined in Section 2.2. Specifically, for the data quantization step, for a given bits budget $B$ , we choose $c,k$ following the approach suggested in Section 2.5 with ${\mathcal{T}}_{n}/\sigma^{2}=2.5\log(n)$ . For each simulation, the quantization ranges $t_{1},t_{k-1}$ are defined as $t_{1}=\mu_{0}-\sqrt{2.5\sigma^{2}\log(n)},t_{k-1}=\mu_{0}+\sqrt{2.5\sigma^{2}\log(n)}$ , where $\mu_{0}=\int_{0}^{1}g_{0}(x)dx$ with $g_{0}(\cdot)$ being the regression function in model (1). The use of $\mu_{0}$ and $\sigma$ is of limited importance and can be replaced with any reasonable alternatives such as setting $\mu_{0}=0$ or using estimates based on historical data. Summary statistics from each simulation setting were based on $1000$ independent simulation runs. Except for Section 5.3, we considered the periodic Sobolev space of order $m=2$ with kernel function $K(x,y)=-B_{4}(|x-y|)/24$ , where $B_{4}$ is the Bernoulli polynomial of order $4$ . The tuning parameter $\lambda$ was set as $\lambda=\widehat{\lambda}_{\textrm{GCV}}/\log(c)$ with $\widehat{\lambda}_{\textrm{GCV}}$ being picked by GCV.
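Two ingredients of this setup are easy to write down explicitly. The sketch below computes the quantization range $t_{1},t_{k-1}$ , the step $C_{k}(t)=(t_{k-1}-t_{1})/(k-2)$ , and the kernel $K(x,y)=-B_{4}(|x-y|)/24$ using the standard Bernoulli polynomial $B_{4}(x)=x^{4}-2x^{3}+x^{2}-1/30$ . The function names are hypothetical, and the full quantizer in (5) and the GCV step are not reproduced.

```python
import math

def quantization_range(mu0, sigma, n):
    """Return (t_1, t_{k-1}) = mu0 -/+ sqrt(2.5 * sigma^2 * log(n))."""
    half_width = math.sqrt(2.5 * sigma**2 * math.log(n))
    return mu0 - half_width, mu0 + half_width

def step_size(t1, tk1, k):
    """C_k(t) = (t_{k-1} - t_1) / (k - 2)."""
    return (tk1 - t1) / (k - 2)

def bernoulli4(x):
    """Bernoulli polynomial of order 4."""
    return x**4 - 2.0 * x**3 + x**2 - 1.0 / 30.0

def kernel(x, y):
    """K(x, y) = -B_4(|x - y|) / 24 for x, y in [0, 1]."""
    return -bernoulli4(abs(x - y)) / 24.0

t1, tk1 = quantization_range(mu0=0.0, sigma=1.0, n=1000)
print(t1, tk1, step_size(t1, tk1, k=256))
print(kernel(0.3, 0.3), kernel(0.0, 1.0))  # equal values reflect periodicity
```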

5.1 Estimation Performance of $\widehat{g}^{\textrm{B}}_{\mu,t,c}(\cdot)$

In this section, we first evaluate the estimation performance of the spline estimator $\widehat{g}^{\textrm{B}}_{\mu,t,c}(\cdot)$ defined in (7) that is based on only quantized samples. We generated data from model (22) with $r=0.5,1$ and sample sizes $n=1000,2000,3000,5000,10000$ . For each $n$ , we gradually increase the bits budget $B$ from $30$ to $1000$ . The estimation accuracy was evaluated by the root mean squared error $({\rm RMSE})$ , defined as $\|\widehat{g}_{\mu,t,c}^{\textrm{B}}-g_{0}\|$ . The simulation results are summarized in Figure 2, which suggests that the RMSEs decrease as $n$ increases in all considered settings. Moreover, as $B$ increases, the RMSEs first decrease rapidly and then stabilize at some level. This observation is consistent with our theoretical results established in Section 3.1, which state that increasing $B$ (or equivalently, $c$ and $b$ ) will diminish the impact of information loss due to the data averaging and data quantization, and as a result $\widehat{g}^{\textrm{B}}_{\mu,t,c}(\cdot)$ becomes more accurate. Furthermore, we can also observe that after $B$ exceeds a certain threshold, the RMSEs of $\widehat{g}^{\textrm{B}}_{\mu,t,c}(\cdot)$ stabilize, which supports the findings in Corollary 2. Specifically, when $B$ is sufficiently large, the RMSEs of $\widehat{g}^{\textrm{B}}_{\mu,t,c}(\cdot)$ reach the estimation error lower bound of the classical spline estimator based on the complete data.

5.2 Nonparametric Test with $g_{*}(\cdot)\equiv 0$ and $m=2$

In this section, we investigate the empirical sizes and powers of the nonparametric test proposed in Section 2.4, when $g_{*}(\cdot)\equiv 0$ in the null hypothesis (2) and $m=2$ treated as known. The data was generated from the model (22) with various $r$ and sample sizes $n$ .

Figure 3 reports the empirical sizes of the proposed nonparametric test when $r=0$ and the empirical powers when $r>0$ , respectively. Specifically, in all scenarios, the empirical sizes of the proposed test are close to the target nominal level $0.1$ as the sample size $n$ increases. When either $r$ or $n$ increases, we observe that the empirical powers of the proposed test gradually approach one, which suggests that the proposed testing procedure is consistent for the alternative hypothesis that has a sufficiently large deviation (relative to the sample size $n$ ) from $H_{0}$ . Furthermore, after the bits budget $B$ exceeds a certain threshold, the empirical powers of the proposed nonparametric test are rather close to each other, which supports our theoretical findings in Section 3.3.

5.3 Adaptive Nonparametric Test with an Unknown $m$

In this section, we investigate the validity and the empirical power of the adaptive nonparametric test proposed in Section 4.2, for which the order parameter $m$ is searched from $m_{l}=1$ to $m_{u}=\sqrt{\log n}$ . Figure 4 shows the empirical rejection rates of the proposed nonparametric adaptive test at the $0.1$ significance level. We can observe that when $r=0$ , the empirical rejection rates are rather close to the nominal level $0.1$ . For any given $r>0$ , we can see that the empirical rejection rates increase as the sample size $n$ increases. For a fixed $n$ , as $r$ increases, the empirical rejection rates increase steadily and eventually reach $100\%$ when $n=600,n=1000$ and $n=10000$ . Finally, as long as the bits budget $B$ exceeds a certain threshold, the empirical rejection rates are rather similar in most settings. All these observations are consistent with our theoretical findings in Theorem 9. Furthermore, the empirical rejection rates are smaller than those of the non-adaptive nonparametric test in Section 5.2 under the same setting, which is the price paid for adaptivity in $m$ .

5.4 Nonparametric Linearity Test with $m=2$

In this section, we study the empirical performance of the proposed nonparametric linearity test. The data is generated from the following model

[TABLE]

where $\beta_{3,2}(\cdot)$ is the density function of the beta distribution with parameters $3$ and $2$ , and $\epsilon_{i}$ ’s are independent random errors. Two types of errors were considered: (1) $\epsilon\sim N(0,1)$ ; (2) $\epsilon\sim N(0,1.5^{2})$ . When $r=0$ , the model satisfies the null hypothesis $H_{0}^{\textrm{linear}}$ , and as $r$ increases, the departure from the linear model becomes increasingly larger.

Figure 5 reports the empirical rejection rates of the nonparametric linearity test proposed in Section 4.1 at the significance level $0.1$ . It is straightforward to see that, when $r=0$ , the empirical rejection rates are close to the nominal size, indicating the validity of the test asserted by Theorem 6. For any given $r>0$ , we can see that the empirical rejection rates increase as the sample size $n$ increases. For a fixed $n$ , as $r$ increases, the empirical rejection rates increase steadily and eventually reach $100\%$ . Finally, as long as the bits budget $B$ exceeds a certain threshold, the empirical rejection rates are rather similar in most settings. All these observations are consistent with our theoretical findings in Theorem 7.

6 Real Data Analysis

In this section, we apply the proposed methods to the Combined Cycle Power Plant Data (Kaya et al., 2012; Tüfekci, 2014), which can be downloaded at https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant. The data set consists of $n=9568$ observations from a Combined Cycle Power Plant over 6 years (2006-2011). The purpose of our analysis is to explore the relationship between the net hourly electrical energy output of the plant and three environmental factors: temperature, ambient pressure, and relative humidity.

Figure 6 displays the estimated curves based on $B$ -bits quantizations ( $B=35,70,140,175$ ) and full data, for which the periodic spline of order $m=2$ was used. For the quantization step, we choose ${\mathcal{T}}_{n}=2.5\times\hat{\sigma}^{2}\log(n)$ , where $\hat{\sigma}$ is the sample standard deviation of the observed data, and $c,k$ are determined as in Section 2.5. We can observe that the spline estimator based on quantized data with $B=35$ , i.e., the green curve, is rather different from the other curves in the two analyses. When the bits budget $B$ increases to more than $70$ , such differences quickly diminish. This observation demonstrates the effectiveness of the proposed $B$ -bits quantization scheme.

Next, we conduct some hypothesis tests for the relationship between the net hourly electrical energy output and the three environmental factors. The first test examines whether there is an association between the energy output and each of the three environmental factors. We consider both non-adaptive and adaptive nonparametric tests. For the non-adaptive nonparametric test, $m=2$ is used. The p-values are all close to zero, implying strong rejections of the null hypothesis. This is not surprising given the shapes of the spline estimators illustrated in Figure 6.

Next, there appears to be a strong linear association between relative humidity and the energy output in Figure 6. Based on this conjecture, we proceed to test whether the associations between these three environmental factors and the energy output are linear or nonlinear, using the nonparametric linearity test proposed in Section 4.1. The p-values for the first two environmental factors, i.e., ambient pressure and temperature, are both close to zero, indicating strong rejections of the null hypothesis. Figure 7 illustrates the p-values of the nonparametric linearity test for the relationship between relative humidity and energy output as a function of the bits budget $B$ . We can see that the nonparametric linearity test based on quantized data fails to reject the null hypothesis, which echoes our conjecture based on Figure 6.

7 Discussion

In this paper, we propose a set of nonparametric testing procedures based on quantized observations, including the non-adaptive nonparametric test, the nonparametric linearity test, and the adaptive nonparametric test. The proposed tests are easy to use and are based on the $L_{2}$ -metric between the quantization spline estimators and the hypothesized function. We investigate the asymptotic validity and testing power of the proposed tests and show how the asymptotic testing power changes as the bits budget $B$ increases.

Finally, we discuss two additional extensions. First, the present paper only deals with periodic splines. It would be interesting to extend our results to more general splines or even kernel ridge regression. The special periodic spline largely reduces the difficulty of the technical proofs. Indeed, the majority of the proofs can be accomplished by exact calculations based on trigonometric series. For a general RKHS, exact calculations may not be possible, and more involved proofs are needed. Second, the nonparametric linearity test can be easily extended to testing general composite null hypotheses such as $H_{0}^{\textrm{general}}:g_{0}(x)=h_{0}(x,\theta)$ for some function $h_{0}$ governed by parameters $\theta\in\mathbb{R}^{p}$ with a fixed $p$ . However, when $p$ diverges as $n$ increases, it will be more challenging to investigate the asymptotic behavior of the proposed test statistic, which is an interesting topic for future research.

A Structure of the proofs

In this section, we outline the high-level structure of the proofs for the main theorems.

• The proof of Theorem 1 is mainly based on Lemma 10.

– In Lemma 10, we provide an upper bound for the difference between two smoothing spline estimators.

• The proof of Theorem 3 relies on Stein’s exchangeable pair method. Specifically, we first prove the asymptotic normality of $\frac{cT_{\mu^{\star},t,c}-\textrm{trace}(A)\tau^{\star 2}_{k}}{s_{c}\tau^{\star 2}_{k}}$ based on $z^{\star}_{i}$ ’s, where $z^{\star}_{i}$ ’s are the quantized samples corresponding to $\mu_{j}=\mu^{\star}_{j}$ for $1\leq j\leq k$ , $\tau^{\star 2}_{k}=Var(z^{\star}_{i}|H_{0})$ , and

[TABLE]

Next, we prove that

[TABLE]

– In Lemma 11 and Lemma 12, we establish the error rate introduced by quantization of the variance using Algorithm 2, which is needed for the proof of Theorem 3.

– In Lemma 13, we quantify the difference between the quantized samples under $H_{1}$ and $H_{0}$ .

• In the proof of Theorem 5, we first decompose the test statistic into two parts,

[TABLE]

where $z^{0}$ is the vector of quantized samples under $H_{0}:g_{0}=0$ . By Theorem 3, we know the second term is $O_{p}(1)$ . In the first term, it is straightforward to see that $z^{T}Az-(z^{0})^{T}Az^{0}=(z-z^{0})^{T}A(z-z^{0})+2(z-z^{0})^{T}Az^{0}$ .

– In Lemma 14, we establish a lower bound for $(z-z^{0})^{T}Az^{0}$ .

– In Lemma 15 and Lemma 16, we establish the lower bound for $(z-z^{0})^{T}A(z-z^{0})$ .

• In the proof of Theorem 8, observe that the test statistic for each $m$

[TABLE]

is in a quadratic form.

– Lemma 17 proves that the maximum of the quadratic form follows an extreme value distribution.

– Lemma 18 provides the rate conditions such that Lemma 17 holds.

• The idea of the proof of Theorem 9 is similar to that of the proofs of Theorem 5 and Theorem 7.
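The algebraic identity $z^{T}Az-(z^{0})^{T}Az^{0}=(z-z^{0})^{T}A(z-z^{0})+2(z-z^{0})^{T}Az^{0}$ used in the outline of the proof of Theorem 5 holds for any symmetric matrix $A$. The following Python snippet is a small numerical sanity check on randomly generated inputs; the matrix and vectors are arbitrary stand-ins, not the smoother matrix or quantized samples of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
c = 6

# Arbitrary symmetric matrix standing in for A.
G = rng.normal(size=(c, c))
A = 0.5 * (G + G.T)

# Arbitrary vectors standing in for z and z^0.
z = rng.normal(size=c)
z0 = rng.normal(size=c)

# Left-hand side: difference of the two quadratic forms.
lhs = z @ A @ z - z0 @ A @ z0

# Right-hand side: quadratic term in (z - z0) plus the cross term.
rhs = (z - z0) @ A @ (z - z0) + 2.0 * (z - z0) @ A @ z0
```

Symmetry of $A$ is what allows the two cross terms to be merged into the single term $2(z-z^{0})^{T}Az^{0}$.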

B Notation

In this section, we summarize some notation that is frequently used throughout the paper for the reader’s convenience.

C Useful Lemmas

The proofs of the theorems require some preliminary lemmas. In this section, we summarize these useful lemmas. Throughout the proof, we let $\widetilde{y}_{i}=\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}y_{j},i=1,\ldots,c$ and we denote $\widehat{g}^{\textrm{ss}}$ as the canonical smoothing spline based on the full dataset; $\widehat{g}^{\textrm{ss}}_{c}$ as the smoothing spline based on the averaged responses $\{\widetilde{y}_{1},\cdots,\widetilde{y}_{c}\}$ , and $\widehat{g}^{\textrm{B}}_{\mu,t,c}$ as the desired $B$ -bits estimator, i.e.,

[TABLE]
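For concreteness, the group averages $\widetilde{y}_{i}=\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}y_{j}$ introduced above can be computed as in the following Python sketch. The helper name `group_means` is hypothetical, and the sketch assumes that $n$ is an exact multiple of $c$ so that $\widetilde{n}=n/c$ is an integer.

```python
import numpy as np


def group_means(y, c):
    """Average consecutive blocks of length n/c, giving (y~_1, ..., y~_c).

    Assumption: n is divisible by c, so every group has n_tilde = n / c points.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n % c != 0:
        raise ValueError("this sketch requires n to be divisible by c")
    # Row i holds y_{(i-1) n_tilde + 1}, ..., y_{i n_tilde} (1-based indexing).
    return y.reshape(c, n // c).mean(axis=1)


y_tilde = group_means([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], c=3)
```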

The following lemma describes that the distance between $\widehat{g}^{\textrm{B}}_{\mu,t,c}$ and $\widehat{g}^{\textrm{ss}}_{c}$ can be well controlled by carefully choosing quantization parameters $\mu,t$ and $c$ .

Lemma 10

For any $\mu=(\mu_{1},\ldots,\mu_{k})^{T}\in\mathbb{R}^{k}$ and $t=(t_{1},\ldots,t_{k-1})^{T}\in\mathbb{R}^{k-1}$ , it holds that

[TABLE]

**Proof ** Recall that $\widehat{g}^{\textrm{B}}_{\mu,t,c}=\sum_{i=1}^{c}\widehat{\theta}_{i}K_{i/c}$ , where $(\widehat{\theta}_{1},\ldots,\widehat{\theta}_{c})^{T}=c^{-1}(\Sigma_{c}+\lambda I_{c})^{-1}z$ with $\Sigma_{c}=[K(i/c,i^{\prime}/c)/c]_{1\leq i,i^{\prime}\leq c}\in\mathbb{R}^{c\times c}$ , $z=(z_{1},\ldots,z_{c})^{T}\in\mathbb{R}^{c}$ , and $K(\cdot,\cdot)$ is the kernel function. Similarly, $\widehat{g}^{\textrm{ss}}_{c}=\sum_{i=1}^{c}\widetilde{\theta}_{i}K_{i/c}$ , where $(\widetilde{\theta}_{1},\ldots,\widetilde{\theta}_{c})^{T}=c^{-1}(\Sigma_{c}+\lambda I_{c})^{-1}\widetilde{y}$ with $\widetilde{y}=(\widetilde{y}_{1},\ldots,\widetilde{y}_{c})^{T}$ . Let $\widehat{\theta}=(\widehat{\theta}_{1},\ldots,\widehat{\theta}_{c})^{T},\widetilde{\theta}=(\widetilde{\theta}_{1},\ldots,\widetilde{\theta}_{c})^{T}$ . By direct calculations, we have

[TABLE]

where $\varphi_{2k-1}(x)=\sqrt{2}\cos(2\pi kx),\,\,\,\,\varphi_{2k}(x)=\sqrt{2}\sin(2\pi kx)$ are the trigonometric basis functions, $\gamma_{2k-1}=\gamma_{2k}=(2\pi k)^{2m}$ , and $\Phi_{\nu}=(\varphi_{\nu}(1/c),\varphi_{\nu}(2/c),\ldots,\varphi_{\nu}(c/c))^{T}$ . So

[TABLE]

We now look at $\Sigma_{c}$ and $\Omega_{c}$ . To ease our calculations, for $0\leq l\leq c-1$ , we first define the following two notations,

[TABLE]

Since $d^{\prime}_{l}=d^{\prime}_{c-l}$ , $d_{l}=d_{c-l}$ for $l=1,2,\ldots,c-1$ , we know $\Sigma_{c},\Omega_{c}$ are both symmetric circulant of order $c$ . Furthermore, $\Sigma_{c}$ and $\Omega_{c}$ share the same normalized eigenvectors as

[TABLE]

where $\varepsilon=\exp(2\pi\sqrt{-1}/c)$ . Let $M=(x_{0},x_{1},\ldots,x_{c-1})$ , and $M^{\ast}$ be the conjugate transpose of $M$ . Clearly, $MM^{\ast}=I_{c}$ and $\Sigma_{c},\Omega_{c}$ admit the following decomposition

[TABLE]

where $\Lambda_{d^{\prime}}=\textrm{diag}(\lambda_{d^{\prime},0},\lambda_{d^{\prime},1},\ldots,\lambda_{d^{\prime},c-1})$ and $\Lambda_{d}=\textrm{diag}(\lambda_{d,0},\lambda_{d,1},\ldots,\lambda_{d,c-1})$ with $\lambda_{d^{\prime},l}=d^{\prime}_{0}+d^{\prime}_{1}\varepsilon^{l}+\ldots+d^{\prime}_{c-1}\varepsilon^{(c-1)l}$ and $\lambda_{d,l}=d_{0}+d_{1}\varepsilon^{l}+\ldots+d_{c-1}\varepsilon^{(c-1)l}$ .
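The diagonalization of a symmetric circulant matrix described above can be checked numerically. The Python sketch below assumes that the normalized eigenvectors are the standard discrete Fourier vectors $x_{l}=c^{-1/2}(1,\varepsilon^{l},\ldots,\varepsilon^{(c-1)l})^{T}$ (the display defining $x_{l}$ is not reproduced here, so this form is an assumption), and it uses an arbitrary sequence $d_{0},\ldots,d_{c-1}$ with $d_{l}=d_{c-l}$ rather than the kernel-based entries of $\Sigma_{c}$ or $\Omega_{c}$.

```python
import numpy as np


def symmetric_circulant(d):
    """Circulant matrix whose (i, j) entry is d[(j - i) mod c]."""
    d = np.asarray(d, dtype=float)
    c = d.size
    offsets = (np.arange(c)[None, :] - np.arange(c)[:, None]) % c
    return d[offsets]


c = 8
rng = np.random.default_rng(2)
d = rng.normal(size=c)
d[1:] = 0.5 * (d[1:] + d[1:][::-1])  # enforce d_l = d_{c-l}

C = symmetric_circulant(d)

eps = np.exp(2j * np.pi / c)
j = np.arange(c)
powers = eps ** np.outer(j, j)   # entry (j, l) equals eps^{j l}
M = powers / np.sqrt(c)          # columns are the assumed eigenvectors x_l
lam = powers @ d                 # lam_l = d_0 + d_1 eps^l + ... + d_{c-1} eps^{(c-1) l}

reconstructed = (M * lam) @ M.conj().T  # M diag(lam) M^*
```

Because $d_{l}=d_{c-l}$, the eigenvalues computed this way are real up to rounding error.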

Direct calculations show that

[TABLE]

It is easy to examine that

[TABLE]

where

[TABLE]

It follows from (27) and (28) that $\lambda_{d,l}\leq\lambda_{d^{\prime},l}^{2}$ for $0\leq l\leq c-1$ . Therefore,

[TABLE]

Therefore, it follows by (25) that

[TABLE]

This completes the proof.

Lemma 11

Suppose Condition (B) holds true, and it holds that $C_{k}(t)\rightarrow{0}$ , then we have $\tau_{k}^{2}=Var(z_{1}|H_{0})=O(\widetilde{n}^{-1}+C_{k}(t)^{2})$ and $\widehat{\tau}_{k}^{2}=\frac{\widetilde{\tau}_{n}^{2}}{2\widetilde{n}(n-1)}=\tau_{k}^{2}\left[1+O_{p}(n^{-1/2}+C_{k}(t))\right]=\tau_{k}^{2}[1+o_{p}(1)]$ .

**Proof ** By the definition of $\tau_{k}^{2}$ and (6) we have

[TABLE]

Assume that for $2\leq s\leq k-1,t_{1}<t_{2}<\cdots<t_{s-1}\leq 0<t_{s}<\cdots<t_{k-1}$ and let $p(\epsilon)$ be the density function of $\epsilon_{1}$ . Then we have

[TABLE]

and

[TABLE]

The fact that $\left|\mu_{s}\right|\leq C_{k}(t)$ and the above inequalities lead to

[TABLE]

This proves $R_{1}\lesssim\frac{1}{\widetilde{n}}$ . On the other hand, by $t_{1}^{2}>\sigma^{2}$ and $t_{s-1}=O\left(C_{k}(t)\right)=o(1)$ , we have

[TABLE]

This proves $R_{1}\gtrsim\frac{1}{\widetilde{n}}$ , which implies $R_{1}=O(\widetilde{n}^{-1})$ . Using a similar approach, we can prove $R_{2}=O(\widetilde{n}^{-1})$ . From (C), we get $\tau_{k}^{2}=O(\widetilde{n}^{-1}+C_{k}(t)^{2})$ .

Now we prove $\widehat{\tau}_{k}^{2}=\frac{\widetilde{\tau}_{n}^{2}}{2\widetilde{n}(n-1)}=\tau_{k}^{2}[1+o_{p}(1)]$ . By the definition of $\widetilde{\tau}_{n}$ , we have

[TABLE]

where $|\widetilde{\delta}_{i}|=O(C_{k}(t))$ for $i=1,\cdots,n$ . Note that, by the central limit theorem, it holds that

[TABLE]

and that $g_{0}(i/n)-g_{0}((i-1)/n)=O(1/n)$ by the smoothness of function $g_{0}$ , we have that

[TABLE]

which completes the proof.

Lemma 12

Suppose Condition (B) holds and $C_{k}(t)\rightarrow{0}$ . Let $\widehat{\tau}_{k}^{2}$ be the quantized variance based on $y^{\textrm{linear}}$ . Then we have that $\widehat{\tau}_{k}^{2}=\frac{\widetilde{\tau}_{n}^{2}}{2\widetilde{n}(n-1)}=\tau_{k}^{2}\left[1+O_{p}(n^{-1/2}+C_{k}(t))\right]=\tau_{k}^{2}[1+o_{p}(1)]$ .

**Proof ** By the definition of $\widehat{\tau}_{k}^{2}$ , we have

[TABLE]

where $|\widetilde{\delta}_{i}|=O(C_{k}(t))$ for $i=1,\cdots,n$ . Since $\widehat{g}\in\mathcal{L}(\mathbb{I})$ , one has that $|\widehat{y}_{i}-\widehat{y}_{i-1}|=O_{p}(1/n)$ by the smoothness of $\widehat{g}$ . Similar to the proof of Lemma 11, we get $\widehat{\tau}_{k}^{2}=\frac{\widetilde{\tau}_{n}^{2}}{2\widetilde{n}(n-1)}=\tau_{k}^{2}\left[1+O_{p}(n^{-1/2}+C_{k}(t))\right]=\tau_{k}^{2}[1+o_{p}(1)]$ .

To ease calculation, we define some useful notations. Let $z_{i}^{0}$ be the quantized data conditional on $g_{0}(x)=0$ and $z^{0}=(z_{1}^{0},\ldots,z_{c}^{0})^{T}$ . According to (6), we have

[TABLE]

Furthermore, we let $\widetilde{z}^{0}=(\widetilde{z}_{1}^{0},\ldots,\widetilde{z}_{c}^{0})^{T},\widetilde{z}=(\widetilde{z}_{1},\ldots,\widetilde{z}_{c})^{T}$ , where for $i=1,\ldots,c,$

[TABLE]

Lemma 13

Suppose $g$ is the regression function generating the samples. Suppose Condition (B) holds and $\sigma|\epsilon_{j}|+c_{s}\rho\leq\sqrt{{\mathcal{T}}_{n}}$ holds for all $j=1,\ldots,n$ . Then for any $g\in S^{m}(\mathbb{I})$ with $J(g)\leq\rho^{2}$ , it holds that $|\widetilde{z}_{i}-\widetilde{z}_{i}^{0}-f(i/c)|\leq 4C_{k}(t)+\zeta,i=1,\ldots,c$ , where $f$ is the corresponding integral equation defined in (15), $\zeta=\max\limits_{i=1,\ldots,c}|f(i/c)-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g(j/n)|=\max\limits_{i=1,\ldots,c}|\frac{1}{2\Delta}\int_{\max(i/c-\Delta,0)}^{\min(i/c+\Delta,1)}g(s)ds-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g(j/n)|$ and $\Delta=\frac{1}{c}$ .

Proof : Suppose $\sigma\epsilon_{i}\in R_{j}(t)$ for some $1\leq j\leq k$ . Since $\min\{t_{1}^{2},t_{k-1}^{2}\}={\mathcal{T}}_{n}$ and $c_{s}\rho+\sigma|\epsilon_{i}|\leq\sqrt{{\mathcal{T}}_{n}}$ , we must have $2\leq j\leq k-1$ . Suppose that $g(i/n)+\sigma\epsilon_{i}\in R_{l}(t)$ for some $1\leq l\leq k$ . Since $\min\{t_{1}^{2},t_{k-1}^{2}\}={\mathcal{T}}_{n}$ and (12) implies $|g(i/n)|\leq c_{s}\rho$ , we have

[TABLE]

Therefore, $2\leq l\leq k-1$ . Since

[TABLE]

we have

[TABLE]

Hence it holds that

[TABLE]

Since $c_{s}\rho+\sigma|\epsilon_{j}|\leq\sqrt{{\mathcal{T}}_{n}}$ for all $j=1,\ldots,n$ , the result follows from (6) and (33) that

[TABLE]

where the last inequality follows from (34) and the definition of $\zeta$ .

Lemma 14

Suppose Condition (B) holds, and $h\rightarrow 0$ , $ch\rightarrow\infty$ . Then for any $g\in S^{m}(\mathbb{I})$ with $J(g)\leq\rho^{2}$ , we have

[TABLE]

where $\zeta=\max\limits_{i=1,\ldots,c}|\frac{1}{2\Delta}\int_{\max(i/c-\Delta,0)}^{\min(i/c+\Delta,1)}g(s)ds-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g(j/n)|$ and $\Delta=\frac{1}{c}$ .

Proof : For convenience, let $\omega_{i}=\widetilde{z}_{i}-\widetilde{z}_{i}^{0}$ . From Lemma 13 and the fact that $\widetilde{z}_{i}-\widetilde{z}_{i}^{0}=0$ if $c_{s}\rho+\sigma|\epsilon_{j}|>\sqrt{{\mathcal{T}}_{n}}$ for some $(i-1)\widetilde{n}+1\leq j\leq i\widetilde{n}$ , it holds that

[TABLE]

According to (33) and the fact that

[TABLE]

one has that

[TABLE]

which further implies that

[TABLE]

For any $g\in S^{m}(\mathbb{I})$ with $J(g)\leq\rho^{2}$ , we have

[TABLE]

To complete the proof, we will analyze the above terms $T_{1}$ through $T_{5}$ .

For $T_{1}$ , we have

[TABLE]

where recall $A_{0}=\textrm{diag}(a_{1,1},\ldots,a_{c,c})$ . Since $A\leq I_{c}$ and $a_{1,1}=\cdots=a_{c,c}\asymp 1/(ch)=o(1)$ , we have $A^{2}+A_{0}^{2}\leq 2I_{c}$ (as $c\rightarrow\infty$ ), which, together with (36), further leads to

[TABLE]

For $T_{2}$ , we have

[TABLE]

where the last inequality follows from

[TABLE]

Here the above “ $\lesssim$ ” holds uniformly over $1\leq i\leq c$ .

For $T_{3}$ , Cauchy inequality implies that

[TABLE]

where the last inequality follows from $ca_{1,1}^{2}=\ldots=ca_{c,c}^{2}\asymp(ch^{2})^{-1}$ , as $c\rightarrow\infty$ .

For $T_{4}$ , we have

[TABLE]

For $T_{5}$ , it holds that

[TABLE]

From the above analysis of $T_{1}$ through $T_{5}$ , we get that as $c\rightarrow\infty$ , for any $g\in S^{m}(\mathbb{I})$ with $J(g)\leq\rho^{2}$ , it follows that

[TABLE]

This proves the desired result.

For $\nu=1,2,\ldots,c$ , define $\Phi_{\nu}=(\varphi_{\nu}(1/c),\varphi_{\nu}(2/c),\ldots,\varphi_{\nu}(c/c))^{T}$ . Let $\varepsilon=\exp(2\pi\sqrt{-1}/c)$ ,

[TABLE]

and $x_{r}^{\ast}$ be the conjugate transpose of $x_{r}$ .

Lemma 15

For $0\leq r\leq c-1$ and $1\leq v\leq c-1$ , one has that

[TABLE]

and

[TABLE]

Proof : The proof can be accomplished by direct calculations. For instance, the first case holds by the following argument. For $0\leq r\leq c-1$ and $1\leq v\leq c-1$ ,

[TABLE]

The proof of other cases is similar.

Let $M=(x_{0},x_{1},\ldots,x_{c-1})$ and $M^{\ast}\textbf{f}=(e_{0}(f),e_{1}(f),\ldots,e_{c-1}(f))^{T}$ , where $\textbf{f}=(f(1/c),\ldots,f(c/c))^{T}$ . Recall $M^{\ast}$ is the conjugate transpose of $M$ . Suppose $f\in S^{m}(\mathbb{I})$ admits Fourier expansion $f=\sum_{\nu=1}^{\infty}f_{\nu}\varphi_{\nu}$ .

Lemma 16

There exists a universal constant $\varrho>0$ s.t. for any $f\in S^{m}(\mathbb{I})$ ,

[TABLE]

Proof : For simplicity, denote $e_{r}=e_{r}(f)$ . For $1\leq r\leq c/2$ , we have

[TABLE]

Therefore, it follows that

[TABLE]

where $\varrho_{m}^{\prime}=\max\{2,(2(1+2^{1-2m}\bar{d}^{\prime}_{m})+2^{2m}(1+2^{1-2m}\bar{d}^{\prime}_{m})^{2})\pi^{-2m}\}$ , and $\bar{d}^{\prime}_{m}$ is defined in (29).

By Lemma 15 and direct calculations, for $1\leq r\leq c-1$ , we have

[TABLE]

Therefore, it holds that

[TABLE]

It is easy to see that

[TABLE]

For $1\leq r\leq c/2$ , we have

[TABLE]

where (38) follows by an elementary inequality

[TABLE]

Meanwhile, a similar analysis leads to

[TABLE]

Now it follows from (37), (38) and (39), and elementary facts $\lambda_{d^{\prime},r}=\lambda_{d^{\prime},c-r}$ and $\lambda_{d,r}=\lambda_{d,c-r}$ , for $1\leq r\leq c-1$ , that

[TABLE]

where $\varrho_{m}=\max\{\sum_{p=1}^{\infty}(2\pi p)^{-2m}/2,4m\varrho_{m}^{\prime}/(2m-1)\}$ . It is straightforward to see $\varrho_{m}$ is a decreasing function with respect to $m$ , therefore, we choose $\varrho=\varrho_{m=1}$ . This proves Lemma 16.

The proof of Theorem 8 requires a recent Gaussian approximation result, i.e., Theorem 3.1 in Koike (2019).

Lemma 17

For each $c\in\mathbb{N}$ , let $\boldsymbol{\Psi}_{c}$ be a $c$ -dimensional centered Gaussian vector with covariance matrix $\Sigma_{c}=\left(\Sigma_{c}(m,m^{\prime})\right)_{1\leq m,m^{\prime}\leq c}$ and $m_{u}\geq 2$ be an integer. Also, for each $m=m_{l},\ldots,m_{u}$ , let $A_{m}$ be a $c\times c$ symmetric matrix and $Z_{c}=\left(Z_{c,m_{l}},\ldots,Z_{c,m_{u}}\right)^{\top}$ be an $(m_{u}-m_{l}+1)$ -dimensional centered Gaussian vector with covariance matrix $\mathfrak{C}_{c}=\left(\mathfrak{C}_{c}(m,m^{\prime})\right)_{m_{l}\leq m,m^{\prime}\leq m_{u}}.$ Set $F_{c,m}:=\boldsymbol{\Psi}_{c}^{\top}A_{m}\boldsymbol{\Psi}_{c}-E\left[\boldsymbol{\Psi}_{c}^{\top}A_{m}\boldsymbol{\Psi}_{c}\right]$ and suppose that the following conditions are satisfied:

1. There is a constant $b>0$ such that $\mathfrak{C}_{c}(m,m)\geq b$ for every $c$ and every $m=m_{l},\ldots,m_{u}.$

2. $\max_{m_{l}\leq m\leq m_{u}}\left(E\left[F_{c,m}^{4}\right]-3E\left[F_{c,m}^{2}\right]^{2}\right)\log^{6}m_{c}\rightarrow 0$ as $c\rightarrow\infty$ .

3. $\max_{m_{l}\leq m,m^{\prime}\leq m_{u}}\left|\mathfrak{C}_{c}(m,m^{\prime})-E\left[F_{c,m}F_{c,m^{\prime}}\right]\right|\log^{2}m_{c}\rightarrow 0$ as $c\rightarrow\infty$ .

Then we have

[TABLE]

**Proof ** This is Theorem 3.1 in Koike (2019).
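The quantities appearing in the second and third conditions of Lemma 17 have closed forms for Gaussian quadratic forms. For a symmetric matrix $A$ and $\boldsymbol{\Psi}\sim N(0,\Sigma)$ , the centered form $F=\boldsymbol{\Psi}^{\top}A\boldsymbol{\Psi}-E[\boldsymbol{\Psi}^{\top}A\boldsymbol{\Psi}]$ satisfies $E[F^{2}]=2\,\textrm{trace}((A\Sigma)^{2})$ and $E[F^{4}]-3E[F^{2}]^{2}=48\,\textrm{trace}((A\Sigma)^{4})$ , by the standard cumulant formula for such forms. The Python sketch below evaluates both; the function name `quad_form_moments` and the example matrices are illustrative assumptions, not objects from the paper.

```python
import numpy as np


def quad_form_moments(A, Sigma):
    """Variance and fourth cumulant of F = Psi^T A Psi - E[Psi^T A Psi].

    Psi ~ N(0, Sigma) and A is symmetric. Uses the cumulant formula
    kappa_r = 2^(r-1) * (r-1)! * trace((A Sigma)^r) for r = 2 and r = 4.
    """
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    P = A @ Sigma
    P2 = P @ P
    variance = 2.0 * np.trace(P2)                # E[F^2]
    fourth_cumulant = 48.0 * np.trace(P2 @ P2)   # E[F^4] - 3 E[F^2]^2
    return variance, fourth_cumulant


# Example: A = diag(1, 2) and Sigma = I_2.
var_F, kappa4_F = quad_form_moments(np.diag([1.0, 2.0]), np.eye(2))
```

In this form, the second condition amounts to requiring that $\textrm{trace}((A_{m}\Sigma_{c})^{4})$ vanish faster than the stated logarithmic factor grows.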

The proof of Theorem 9 requires some rate conditions which are summarized in the following lemma.

Lemma 18

Suppose $\lambda_{m}=a_{n}^{2m}n^{-4m/(4m+1)}\log(m_{u})^{2m/(4m+1)}$ , then for any $m_{l}<m<m_{u}\rightarrow\infty$ , under Condition (C), the following rate conditions hold:

[TABLE]

where $h_{m}=\lambda_{m}^{\frac{1}{2m}}=a_{n}n^{-2/(4m+1)}\log(m_{u})^{1/(4m+1)}$ .

**Proof ** It is easy to see $h_{m_{l}}<h_{m}<h_{m_{u}}$ . Therefore

[TABLE]

where the last “ $\rightarrow 0$ ” follows from the assumption $m_{u}\lesssim\log^{d_{0}}(n)$ for some $d_{0}\in(0,1/2)$ . For the last two terms, one has that

[TABLE]

D Proofs for main theorems

Proof of Theorem 1:

It holds that $\|\widehat{g}^{\textrm{B}}_{\mu,t,c}-g_{0}\|^{2}\leq 2\|\widehat{g}^{\textrm{B}}_{\mu,t,c}-\widehat{g}^{\textrm{ss}}_{c}\|^{2}+2\|\widehat{g}^{\textrm{ss}}_{c}-g_{0}\|^{2}$ , and we analyze these two terms separately. We first analyze $\|\widehat{g}^{\textrm{B}}_{\mu,t,c}-\widehat{g}^{\textrm{ss}}_{c}\|^{2}$ . Because

[TABLE]

we have

[TABLE]

Therefore, from Lemma 10, we have

[TABLE]

On the other hand, by elementary calculations we have

[TABLE]

where $p(\cdot)$ is the density function of $\epsilon$ . Combining the above, we get

[TABLE]

Next, we analyze the mean square error of the second term $\|\widehat{g}^{\textrm{ss}}_{c}-g_{0}\|^{2}$ . For the sake of theoretical investigation, we introduce the following function,

[TABLE]

where $(\theta_{\textrm{new},1},\ldots,\theta_{\textrm{new},c})^{T}=c^{-1}(\Sigma_{c}+\lambda I_{c})^{-1}\widetilde{f}$ with $\widetilde{f}=(f(1/c),\ldots,f(c/c))^{T}\in\mathbb{R}^{c}$ , and $f(x)$ is the integral function of $g_{0}$ as defined in (15), i.e.,

[TABLE]

Recall that $\widehat{g}^{\textrm{ss}}_{c}=\sum_{i=1}^{c}\widetilde{\theta}_{i}K_{i/c}=c^{-1}\sum_{\nu=1}^{\infty}\frac{\Phi_{\nu}^{T}(\Sigma_{c}+\lambda I_{c})^{-1}\widetilde{y}}{\gamma_{\nu}}\varphi_{\nu}$ , where $(\widetilde{\theta}_{1},\ldots,\widetilde{\theta}_{c})^{T}=c^{-1}(\Sigma_{c}+\lambda I_{c})^{-1}\widetilde{y}$ with $\widetilde{y}=(\widetilde{y}_{1},\ldots,\widetilde{y}_{c})^{T}$ , $\varphi_{2k-1}(x)=\sqrt{2}\cos(2\pi kx),\,\,\,\,\varphi_{2k}(x)=\sqrt{2}\sin(2\pi kx)$ are the trigonometric basis functions, $\gamma_{2k-1}=\gamma_{2k}=(2\pi k)^{2m}$ , and $\Phi_{\nu}=(\varphi_{\nu}(1/c),\varphi_{\nu}(2/c),\ldots,\varphi_{\nu}(c/c))^{T}$ . Therefore, we have

[TABLE]

where $\widetilde{g}=(\widetilde{g}_{1},\ldots,\widetilde{g}_{c})^{T}$ and $\widetilde{g}_{i}=\frac{\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g_{0}(j/n)}{\widetilde{n}},i=1,\ldots,c$ . Next, we evaluate $E(\|\widehat{g}^{\textrm{ss}}_{c}-E(\widehat{g}^{\textrm{ss}}_{c})\|^{2})$ . Note that $\Sigma_{c},\Omega_{c}$ can be decomposed as $\Sigma_{c}=M\Lambda_{d^{\prime}}M^{\ast},\,\,\Omega_{c}=M\Lambda_{d}M^{\ast}$ , as defined in (26). Furthermore, we let $\widetilde{y}^{0}=(\widetilde{y}^{0}_{1},\ldots,\widetilde{y}^{0}_{c})^{T}$ , where $\widetilde{y}^{0}_{i}=\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}\sigma\epsilon_{j}$ . Hence, we obtain

[TABLE]

By expressions of $\lambda_{d,l}$ ’s, the above is upper bounded by the following

[TABLE]

where $b_{m}\geq 1$ is a constant only depending on $m$ . From the above analysis, we obtain

[TABLE]

Using the above analysis and (41), we have

[TABLE]

Now, we consider the difference between the original regression function $g_{0}$ and the integral function $f$ defined in (15), i.e., $\|f-g_{0}\|^{2}$ . By definition, for $t\in[\Delta,1-\Delta]$ , there exists $t^{\prime}$ between $t-\Delta$ and $t+\Delta$ such that

[TABLE]

On the other hand, for $t\in[0,\Delta]$ , there exists $t^{\prime}$ between $0$ and $t+\Delta$ such that

[TABLE]

In a similar way, we obtain $f(t)-g_{0}(t)\leq\frac{\Delta}{4}g^{\prime}_{0}(t^{\prime})$ for $t\in[1-\Delta,1]$ and some $t^{\prime}\in[t-\Delta,1]$ . Therefore, by the Sobolev inequality, we know $\int^{1}_{0}|g_{0}^{\prime\prime}(t)|^{2}dt\leq\int^{1}_{0}|g_{0}^{(m)}(t)|^{2}dt<\infty$ and $\int^{1}_{0}|g_{0}^{\prime}(t)|^{2}dt\leq\int^{1}_{0}|g_{0}^{(m)}(t)|^{2}dt<\infty$ , which implies

[TABLE]
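To illustrate how close the integral function $f(t)=\frac{1}{2\Delta}\int_{\max(t-\Delta,0)}^{\min(t+\Delta,1)}g_{0}(s)ds$ is to $g_{0}$ in the interior of $[0,1]$ , the following Python sketch evaluates $f$ in closed form for the illustrative choice $g_{0}(s)=\sin(2\pi s)$ with $\Delta=1/c$ . The test function and the helper names are assumptions made for this example; for this particular $g_{0}$ the interior gap equals $|\sin(2\pi t)|\,(1-\sin(2\pi\Delta)/(2\pi\Delta))$ , which is at most $(2\pi\Delta)^{2}/6$ .

```python
import numpy as np


def local_average_sine(t, delta):
    """f(t) = (1 / (2 delta)) * integral of sin(2 pi s) over the clipped window."""
    a = np.maximum(t - delta, 0.0)
    b = np.minimum(t + delta, 1.0)

    def antiderivative(s):
        # Antiderivative of sin(2 pi s).
        return -np.cos(2.0 * np.pi * s) / (2.0 * np.pi)

    return (antiderivative(b) - antiderivative(a)) / (2.0 * delta)


def max_interior_gap(c):
    """Largest |f(t) - g0(t)| over a grid of t in [delta, 1 - delta], delta = 1/c."""
    delta = 1.0 / c
    t = np.linspace(delta, 1.0 - delta, 201)
    return np.max(np.abs(local_average_sine(t, delta) - np.sin(2.0 * np.pi * t)))


gap_50 = max_interior_gap(50)
gap_100 = max_interior_gap(100)
```

Doubling $c$ reduces the interior gap by roughly a factor of four in this example, in line with a second-order dependence on $\Delta$ .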

Finally, both $g_{\textrm{new}}$ and $f$ belong to the Sobolev space, and $g_{\textrm{new}}$ can be viewed as the spline estimate of $f$ computed without random error. By classical spline theory (Wahba, 1990), we know

[TABLE]

As a consequence, from (42), (D), and (44), we have $E\|\widehat{g}^{\textrm{ss}}_{c}-g_{0}\|^{2}\leq 3E(\|\widehat{g}^{\textrm{ss}}_{c}-g_{\textrm{new}}\|^{2})+3\|g_{\textrm{new}}-f\|^{2}+3\|f-g_{0}\|^{2}=O\left((nh)^{-1}+c^{-3}+c^{-2m}+\lambda\right)$ . Combining the result in (40), we get the desired result.

Proof of Corollary 2: Because as $|t_{1}|\to\infty$ ,

[TABLE]

The first term in the above equation is bounded by $n^{-2m/(2m+1)}$ because $p(z)$ satisfies $\int_{|z|\geq T}z^{2}p(z)dz=O(\exp(-T^{d}))$ and $d\geq\frac{4m}{2m+1}$ , $|t_{1}|\asymp\sqrt{\log(n)}$ . Due to Condition (B), we know $G_{c,k,1}(t)=O(n^{-2m/(2m+1)})$ . Similarly, we know $G_{c,k,2}(t)=O(n^{-2m/(2m+1)})$ . Hence $G_{c,k}(t)=C_{k}(t)^{2}+G_{c,k,1}(t)+G_{c,k,2}(t)=O(n^{-2m/(2m+1)})$ . The result follows by Theorem 1 and $\lambda\asymp n^{-2m/(2m+1)}$ , $c\asymp n^{\frac{\max\{1,2m/3\}}{2m+1}}$ .

Proof of Theorem 3: Suppose $z^{\star}_{i}$ ’s are the quantized samples corresponding to $\mu_{j}=\mu^{\star}_{j}$ for $1\leq j\leq k$ , where $\mu_{j}^{\star}$ are defined by

[TABLE]

For $p>0$ , define the $p$ th order moment of the standardized $z^{\star}_{i}$ :

[TABLE]

where $E_{H_{0}}$ denotes the expectation under $H_{0}$ and $\tau^{\star 2}_{k}=Var(z^{\star}_{i}|H_{0})$ . Because $|\mu_{j}-\mu_{j}^{\star}|\leq C_{k}(t)$ for $j=2,\ldots,k-1$ , Condition (B) yields $\tau_{k}^{\star 2}=O(cn^{-1})$ . Furthermore, the condition $b\gg\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}h^{1/2}}\right)$ implies that $C_{k}(t)^{4}\asymp\frac{{\mathcal{T}}_{n}^{2}}{2^{4b}}\ll\frac{{\mathcal{T}}_{n}^{2}}{n^{2}h{\mathcal{T}}^{2}_{n}}=(n^{2}h)^{-1}=o(c^{2}n^{-2})$ . Combining this with the assumption that $E([\widetilde{n}^{-1}\sum_{j=1}^{\widetilde{n}}Q(\epsilon_{j})]^{4})=O(c^{2}n^{-2})$ , one has that $m_{p}=O(1)$ for $p=3,4$ .
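The arithmetic behind the bit condition can be made explicit: at the boundary value $b=\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}h^{1/2}}\right)$ , the quantity ${\mathcal{T}}_{n}^{2}/2^{4b}$ equals exactly $(n^{2}h)^{-1}$ , so any $b$ well above this value makes the quantization term negligible relative to $(n^{2}h)^{-1}$ . The Python sketch below checks this; the function names and the numerical values of $n$ , ${\mathcal{T}}_{n}$ and $h$ are hypothetical and serve only to illustrate the calculation.

```python
import math


def bits_threshold(n, T_n, h):
    """Boundary value log2( sqrt( n * T_n * h^(1/2) ) ) that b must dominate."""
    return 0.5 * math.log2(n * T_n * math.sqrt(h))


def quantization_fourth_power(T_n, b):
    """Order of C_k(t)^4, taken here to be T_n^2 / 2^(4 b)."""
    return T_n ** 2 / 2.0 ** (4.0 * b)


# Hypothetical values chosen only for illustration.
n = 10_000
h = 0.1
T_n = 2.5 * math.log(n)

b_star = bits_threshold(n, T_n, h)
at_threshold = quantization_fourth_power(T_n, b_star)      # equals 1 / (n^2 h)
above_threshold = quantization_fourth_power(T_n, b_star + 3)
```

Each additional bit beyond the boundary reduces ${\mathcal{T}}_{n}^{2}/2^{4b}$ by a factor of $16$ .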

Define $z^{\textrm{sd}}_{i}=z^{\star}_{i}/\tau^{\star}_{k}$ for $i=1,\ldots,c$ . Then $z^{\textrm{sd}}_{i}$ are iid variables with zero-mean and unit variance. Define $z^{\star}=(z^{\star}_{1},\ldots,z^{\star}_{c})^{T}$ and $z^{\textrm{sd}}=(z^{\textrm{sd}}_{1},\ldots,z^{\textrm{sd}}_{c})^{T}$ . Define $A_{0}=\textrm{diag}(a_{1,1},\ldots,a_{c,c})$ and $A_{1}=A-A_{0}$ . Let $B=A_{1}/s_{c}$ . Define $\alpha_{l}=\frac{\lambda_{d,l}}{(\lambda+\lambda_{d^{\prime},l})^{2}},l=0,\ldots,c-1$ . Immediately, for all $i=1,\ldots,c$ , $a_{i,i}=c^{-1}\sum_{l=0}^{c-1}\alpha_{l}\asymp 1/(ch)$ , therefore,

[TABLE]

where the last “ $\asymp$ ” follows from condition $(ch)^{-1}=o(1)$ . This implies that $s_{c}^{2}\asymp h^{-1}$ . Furthermore,

[TABLE]

Let $T^{\star}_{\mu^{\star},t,c}$ be the test statistic corresponding to $z^{\star}_{i}$ ’s. By (25) it can be shown that $cT^{\star}_{\mu^{\star},t,c}=z^{\star T}Az^{\star}$ , which leads to that

[TABLE]

We first look at $Q_{1}$ . By (46) we have

[TABLE]

which leads to $Q_{1}=o_{P}(1)$ .

Define $b_{i,i}=0$ for $i=1,\ldots,c$ and $B=[b_{i,i^{\prime}}]_{1\leq i,i^{\prime}\leq c}$ . We next analyze $Q_{2}$ . Note that $Q_{2}=(z^{\textrm{sd}})^{T}Bz^{\textrm{sd}}$ . Let $(\widetilde{z}^{\textrm{sd}}_{1},\ldots,\widetilde{z}^{\textrm{sd}}_{c})^{T}$ be an independent copy of $z^{\textrm{sd}}=(z^{\textrm{sd}}_{1},\ldots,z^{\textrm{sd}}_{c})^{T}$ . Let $I$ be uniformly distributed on $\{1,2,\ldots,c\}$ . Throughout, we let $\widetilde{z}^{\textrm{sd}}_{i}$ , $z^{\textrm{sd}}_{i}$ and $I$ be mutually independent. Define $\widetilde{z}^{\textrm{sd}}=(z^{\textrm{sd}}_{1},\ldots,z^{\textrm{sd}}_{I-1},\widetilde{z}^{\textrm{sd}}_{I},z^{\textrm{sd}}_{I+1},\ldots,z^{\textrm{sd}}_{c})^{T}$ . So $(z^{\textrm{sd}},\widetilde{z}^{\textrm{sd}})$ is an exchangeable pair (see Reinert and Röllin (2009)), and $\widetilde{z}^{\textrm{sd}}=z^{\textrm{sd}}+e_{I}(\widetilde{z}^{\textrm{sd}}_{I}-z^{\textrm{sd}}_{I})$ , where $e_{j}=(0,\ldots,0,1,0,\ldots,0)^{T}$ with 1 being at the $j$ th position for $j=1,\ldots,c$ . Let $Q_{2}^{\prime}=(\widetilde{z}^{\textrm{sd}})^{T}B\widetilde{z}^{\textrm{sd}}$ . By a simple calculation it can be shown that $Q_{2}^{\prime}-Q_{2}=(\widetilde{z}^{\textrm{sd}})^{T}B\widetilde{z}^{\textrm{sd}}-(z^{\textrm{sd}})^{T}Bz^{\textrm{sd}}=2(\widetilde{z}^{\textrm{sd}}_{I}-z^{\textrm{sd}}_{I})e_{I}^{T}Bz^{\textrm{sd}}$ . So it follows that

[TABLE]
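The exchangeable-pair construction can be checked numerically. The Python sketch below replaces one randomly chosen coordinate of a vector by an independent copy and verifies that the change in the quadratic form equals $2(\widetilde{z}^{\textrm{sd}}_{I}-z^{\textrm{sd}}_{I})e_{I}^{T}Bz^{\textrm{sd}}$ when $B$ is symmetric with zero diagonal. It also verifies that averaging this increment over $I$ , with the fresh coordinate replaced by its mean zero, gives $-\frac{2}{c}Q_{2}$ . The matrix and vectors are random stand-ins, not the standardized quantized samples of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
c = 7

# Symmetric matrix with zero diagonal, standing in for B.
G = rng.normal(size=(c, c))
B = 0.5 * (G + G.T)
np.fill_diagonal(B, 0.0)

z = rng.normal(size=c)        # stand-in for z^sd
z_copy = rng.normal(size=c)   # independent copy
I = int(rng.integers(c))      # uniformly chosen coordinate (0-based)

# Replace coordinate I only.
z_tilde = z.copy()
z_tilde[I] = z_copy[I]

Q2 = z @ B @ z
Q2_prime = z_tilde @ B @ z_tilde
increment = 2.0 * (z_tilde[I] - z[I]) * (B[I] @ z)

# Conditional mean of the increment given z: average over I, using E[copy] = 0.
cond_mean = np.mean([2.0 * (0.0 - z[i]) * (B[i] @ z) for i in range(c)])
```

The zero diagonal of $B$ is what removes the squared term $(\widetilde{z}^{\textrm{sd}}_{I}-z^{\textrm{sd}}_{I})^{2}b_{I,I}$ from the increment.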

Let $g^{\star}_{0}:\mathbb{R}\rightarrow[0,1]$ be a $C^{3}$ -function such that $g^{\star}_{0}(s)=1$ for $s\leq 0$ and $g^{\star}_{0}(s)=0$ for $s\geq 1$ . Let $G_{u}(s)=g^{\star}_{0}(\psi_{c}(s-u))$ for $u\in\mathbb{R}$ , where $\psi_{c}$ is a positive sequence tending to infinity and satisfying

[TABLE]

The existence of such $\psi_{c}$ follows by (46).
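One concrete function with the properties required of $g^{\star}_{0}$ is one minus a degree-seven polynomial "smoothstep", whose first three derivatives vanish at both $0$ and $1$ . The Python sketch below implements this choice together with $G_{u}(s)=g^{\star}_{0}(\psi_{c}(s-u))$ . The specific polynomial is an assumption made for illustration (the proof only requires existence of some $C^{3}$ function with these boundary values), and the parameter `psi` stands in for $\psi_{c}$ .

```python
import numpy as np


def g0_star(s):
    """A C^3 function equal to 1 for s <= 0 and 0 for s >= 1.

    Built from the polynomial S(x) = 35x^4 - 84x^5 + 70x^6 - 20x^7,
    whose derivative is 140 x^3 (1 - x)^3.
    """
    x = np.clip(np.asarray(s, dtype=float), 0.0, 1.0)
    smoothstep = x ** 4 * (35.0 - 84.0 * x + 70.0 * x ** 2 - 20.0 * x ** 3)
    return 1.0 - smoothstep


def G_u(s, u, psi):
    """Smoothed indicator of {s <= u}, with transition width 1 / psi."""
    return g0_star(psi * (np.asarray(s, dtype=float) - u))


# The smoothed indicator is sandwiched between two indicator functions.
s_grid = np.linspace(-2.0, 2.0, 401)
u, psi = 0.3, 5.0
values = G_u(s_grid, u, psi)
lower = (s_grid <= u).astype(float)
upper = (s_grid <= u + 1.0 / psi).astype(float)
```

As `psi` grows, the transition region $[u,u+1/\psi_{c}]$ shrinks, so $G_{u}$ approximates the indicator of $\{s\leq u\}$ at the cost of larger derivative bounds.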

Next we will approximate $E\{G_{u}(Q_{2})-G_{u}(V)\}$ where $V\sim N(0,1)$ . Consider Stein’s equation

[TABLE]

where $\dot{g}$ and $\ddot{g}$ represent first- and second-order derivatives of $g$ . By Goldstein and Rinott (1996), a solution to (50) is

[TABLE]

Let $C_{1}=\|\dot{g}^{\star}_{0}\|_{\sup}$ , $C_{2}=\|\ddot{g}^{\star}_{0}\|_{\sup}$ , and $C_{3}=\|\dddot{g}^{\star}_{0}\|_{\sup}$ , where $\dddot{g}^{\star}_{0}$ is the third-order derivative of $g^{\star}_{0}$ . It is easy to see that

[TABLE]

Clearly, it holds that $\|\ddot{g}\|_{\sup}\leq\|\ddot{G}_{u}\|_{\sup}\leq C_{2}\psi_{c}^{2}$ and $\|\dddot{g}\|_{\sup}\leq\|\dddot{G}_{u}\|_{\sup}\leq C_{3}\psi_{c}^{3}$ .

By exchangeability, $\frac{1}{2}E\{(Q_{2}^{\prime}-Q_{2})(\dot{g}(Q_{2}^{\prime})+\dot{g}(Q_{2}))\}=0$ . So $E\{(Q_{2}^{\prime}-Q_{2})\dot{g}(Q_{2})\}+\frac{1}{2}E\{(Q_{2}^{\prime}-Q_{2})(\dot{g}(Q_{2}^{\prime})-\dot{g}(Q_{2}))\}=0$ . Since $E\{(Q_{2}^{\prime}-Q_{2})\dot{g}(Q_{2})\}=E\{E\{Q_{2}^{\prime}-Q_{2}|z^{\textrm{sd}}\}\dot{g}(Q_{2})\}=-\frac{2}{c}E\{Q_{2}\dot{g}(Q_{2})\}$ , we have

[TABLE]

Next, we analyze $J_{1}$ and $J_{2}$ separately. Let $M_{p}=E\{(z^{\textrm{sd}}_{i})^{p}\}$ for $p\geq 1$ . For $J_{1}$ , by direct examinations we have

[TABLE]

where $D_{i}=(1+(z^{\textrm{sd}}_{i})^{2})(e_{i}^{T}Bz^{\textrm{sd}})^{2}$ . Since $\sum_{i=1}^{c}E\{D_{i}\}=1$ , we get that

[TABLE]

The first term of (D) is equal to

[TABLE]

where the last “ $\asymp$ ” follows by (47).

The second term of (D) is equal to

[TABLE]

We have that

[TABLE]

where $N_{1}=(\sum_{l\neq i,i^{\prime}}b_{i,l}z^{\textrm{sd}}_{l})^{2}$ , $N_{2}=2\sum_{l\neq i,i^{\prime}}b_{i,l}z^{\textrm{sd}}_{l}b_{i,i^{\prime}}z^{\textrm{sd}}_{i^{\prime}}$ , $N_{3}=b_{i,i^{\prime}}^{2}(z^{\textrm{sd}}_{i^{\prime}})^{2}$ , $N_{1}^{\prime}=(\sum_{l\neq i,i^{\prime}}b_{i^{\prime},l}z^{\textrm{sd}}_{l})^{2}$ , $N_{2}^{\prime}=2\sum_{l\neq i,i^{\prime}}b_{i^{\prime},l}z^{\textrm{sd}}_{l}b_{i^{\prime},i}z^{\textrm{sd}}_{i}$ , $N_{3}^{\prime}=b_{i^{\prime},i}^{2}(z^{\textrm{sd}}_{i})^{2}$ . By direct calculations, it is easy to see that

[TABLE]

Therefore, it can be shown that

[TABLE]

The last inequality holds because each term in the summation is bounded by $\textrm{trace}(B^{4})$ multiplied by suitable constants.

Since $B=(A-A_{0})/s_{c}$ , we have $B^{2}\leq 2(A^{2}+A_{0}^{2})/s_{c}^{2}$ and $B^{4}\leq 8(A^{4}+A_{0}^{4})/s_{c}^{4}$ . So it holds that

[TABLE]

where the last inequality follows from the trivial fact $\textrm{trace}(A^{4})\geq\sum_{i=1}^{c}a_{i,i}^{4}$ . From the above analysis, we get that

[TABLE]

For $J_{2}$ , it holds that

[TABLE]

By (47), (54) and $s_{c}^{2}\asymp h^{-1}$ , we have

[TABLE]

By (49) the following holds uniformly for $u\in\mathbb{R}$ :

[TABLE]

Similarly, for $\widetilde{G}_{u}(s)=g^{\star}_{0}(\psi_{c}(s-u)+1)$ , it can be shown that the following statement holds uniformly for $u\in\mathbb{R}$ :

[TABLE]

By elementary facts, we have

[TABLE]

By (55), (56) and (D), the following statements hold uniformly for $u\in\mathbb{R}$ ,

[TABLE]

Hence, as $c$ tends to infinity,

[TABLE]

This, together with $Q_{1}=o_{P}(1)$ , proves

[TABLE]

Let $z_{i}$ ’s and $T_{\mu,t,c}$ be the quantized samples and the test statistic in Theorem 3. Then we have

[TABLE]

We will analyze these two terms separately. For $R_{1}$ , one has that

[TABLE]

where $\Delta^{\star}=(\Delta^{\star}_{1},\ldots,\Delta^{\star}_{c})^{T}$ with $\Delta^{\star}_{i}=z_{i}-z^{\star}_{i}$ , which satisfies $E(\Delta^{\star 2}_{i})\leq C_{k}^{2}(t)+1/n$ under Condition (B).

For the first term, since $\|\Delta^{\star}\|^{2}\leq O_{p}\left(cC_{k}(t)^{2}+cn^{-1}\right)$ and $A\leq I_{c}$ , it follows that

[TABLE]

where the last equality follows from the condition $b\gg\log_{2}\left(\sqrt{n{\mathcal{T}}_{n}h^{1/2}}\right)$ , which implies that

[TABLE]

For the second term in (59), using the fact that $(\Delta^{\star}_{i},z_{i}^{\star})^{T}$ , $(\Delta^{\star}_{j},z_{j}^{\star})^{T}$ are independent if $i\neq j$ , and $Ez^{\star}=0$ , it is straightforward to show that

[TABLE]

In the proof of (47), we have shown that $a_{ii}\asymp(ch)^{-1}$ , $\textrm{trace}(A)\asymp\textrm{trace}(A^{2})\asymp h^{-1}$ ; hence we have that

[TABLE]

which implies that

[TABLE]

Furthermore, since $A\leq I_{c}$ , we have that

[TABLE]

which implies that

[TABLE]

and consequently,

[TABLE]

Using the condition $b\gg\log_{2}\left(\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\right)$ , and the fact that $h\to 0$ , one has that

[TABLE]

which gives that $\frac{2\Delta^{\star T}Az^{\star}}{s_{c}\widehat{\tau}_{k}^{2}}=o_{p}(1)$ . Plugging this back to equation (59), we have that $R_{1}=o_{p}(1)$ .

Now we analyze $R_{2}$ . By Lemma 11, we have

[TABLE]

Since $k\gg\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}$ , one has that

[TABLE]

From (58), we get the desired result.

Proof of Proposition 4: Suppose $p_{\sigma}(\cdot)$ is the density of $\sigma\epsilon_{1}$ . By direct calculations, we have

[TABLE]

For the first term, under Condition (B), we know $E\{Q(\sigma\epsilon_{1})^{2}\}=O(1)$ , which implies $3\sigma^{4}(\frac{1}{\widetilde{n}^{2}}-\frac{1}{\widetilde{n}^{3}})E^{2}\{Q(\sigma\epsilon_{1})^{2}\}=O(c^{2}n^{-2})$ . For the second term, we have that

[TABLE]

Since $C_{k}(t)^{4}=o(1)$ , $E\{\epsilon_{1}^{4}\}=O(nc^{-1})$ and $\mu_{j}^{4}P(\sigma\epsilon_{1}\in R_{j}(t))=O(nc^{-1})$ for $j=1,k$ , one has that $\frac{\sigma^{2}E\{Q(\sigma\epsilon_{1})^{4}\}}{\widetilde{n}^{3}}=O(c^{2}n^{-2})$ . Plugging this back into equation (61), we get the desired result.

Proof of Theorem 5: Without loss of generality, we only consider the case $g_{*}(x)=0$ in (2). By Condition (B), we have that $\min\{t_{1}^{2},t_{k-1}^{2}\}>4c_{s}^{2}\rho^{2}$ , as $n\rightarrow\infty$ . Consider the following event:

[TABLE]

It is easy to show that $P(\mathcal{E}_{1})\rightarrow 1$ as $n\rightarrow\infty$ under Condition (B). Thus, we choose $N_{\eta}^{\prime}$ s.t. $P(\mathcal{E}_{1})\geq 1-\eta/3$ if $c\geq N_{\eta}^{\prime}$ .

Throughout the proof, we suppose that $g\in S_{\rho}^{m}(\mathbb{I})$ is the function that generates the samples and $f$ is the integral function of $g$ defined in (15). Let $\omega_{i}=\widetilde{z}_{i}-\widetilde{z}_{i}^{0},\omega=(\omega_{1},\ldots,\omega_{c})^{T}$ . It is straightforward to see that $\omega_{i}={z}_{i}-{z}_{i}^{0}$ under event $\mathcal{E}_{1}$ . Because

[TABLE]

it follows by Lemma 14 that there exists $N^{\prime\prime}$ s.t., when $c\geq N^{\prime\prime}$ , the following equation holds

[TABLE]

Consider the event

[TABLE]

where $C_{\eta}^{\prime}=\sqrt{24/\eta}$ .

Then

[TABLE]

which implies that $P\left(\mathcal{E}_{2}\right)\geq 1-\eta/3$ .

Let $\widehat{\tau}_{k,0}^{2}$ be the estimated variance under the null. Then one has that

[TABLE]

Since $k\gg\sqrt{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}$ , one has that

[TABLE]

It follows from Theorem 3 that

[TABLE]

Hence, there exists $C_{\eta}^{\prime\prime}>0$ s.t. $P(\mathcal{E}_{3})\geq 1-\eta/3$ for all $c\geq\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ , where

[TABLE]

Let $\mathcal{E}=\mathcal{E}_{1}\cap\mathcal{E}_{2}\cap\mathcal{E}_{3}$ . Then $P(\mathcal{E})\geq 1-\eta$ for any $c\geq\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ .
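The bound on $P(\mathcal{E})$ is a union bound over the complements of the three events, each of which has probability at most $\eta/3$ once $c$ is at least both $N_{\eta}^{\prime}$ and $N^{\prime\prime}$ :

$$P(\mathcal{E}^{c})=P(\mathcal{E}_{1}^{c}\cup\mathcal{E}_{2}^{c}\cup\mathcal{E}_{3}^{c})\leq\sum_{i=1}^{3}P(\mathcal{E}_{i}^{c})\leq 3\cdot\frac{\eta}{3}=\eta.$$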

Suppose $g\in S_{\rho}^{m}(\mathbb{I})$ satisfies $\|g\|_{c}\geq C_{\eta}\delta_{*}$ , where

[TABLE]

where $\tau_{k}^{2}=O(\widetilde{n}^{-1})$ , $\zeta=\max\limits_{i=1,\ldots,c}\big{|}f(i/c)-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g(j/n)\big{|}=O(n^{-1})$ .
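The rate $\zeta=O(n^{-1})$ can be checked numerically. The sketch below is a hedged illustration, not the paper's code: it assumes that the integral function in (15) evaluates to the exact average of $g$ over the $i$ -th cell, $f(i/c)=c\int_{(i-1)/c}^{i/c}g(x)\,dx$ (an assumption, since (15) is not reproduced here), and compares it with the within-group sample mean of $g(j/n)$ . The helper name `block_average_gap` and the test function $g(x)=\sin(2\pi x)$ are hypothetical.

```python
import numpy as np


def block_average_gap(g, G, n, c):
    """Max over groups of |cell average of g - mean of g(j/n) within the group|.

    g : vectorized function on [0, 1]
    G : an antiderivative of g (used to get the exact cell averages)
    n : total sample size, assumed to be a multiple of c
    c : number of groups
    """
    assert n % c == 0, "this sketch assumes n is a multiple of c"
    n_tilde = n // c
    edges = np.arange(c + 1) / c
    # Assumed form of f(i/c): exact average of g over ((i-1)/c, i/c].
    exact = c * (G(edges[1:]) - G(edges[:-1]))
    # Within-group means of g(j/n), j = (i-1)*n_tilde + 1, ..., i*n_tilde.
    sample_means = g(np.arange(1, n + 1) / n).reshape(c, n_tilde).mean(axis=1)
    return float(np.max(np.abs(exact - sample_means)))


if __name__ == "__main__":
    g = lambda x: np.sin(2 * np.pi * x)
    G = lambda x: -np.cos(2 * np.pi * x) / (2 * np.pi)
    for n in (1000, 2000, 4000):
        print(n, block_average_gap(g, G, n, c=10))
```

Under this assumption, doubling $n$ with $c$ fixed should roughly halve the gap, which is the behavior expected of an $O(n^{-1})$ term.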

It follows from Lemma 13 that, on $\mathcal{E}$ , $|\omega_{i}-f(i/c)|\leq 4C_{k}(t)+\zeta$ . Since $A\leq I_{c}$ , we get that

[TABLE]

which, together with Lemma 16, leads to that

[TABLE]

Therefore, on $\mathcal{E}$ , we have

[TABLE]

where (66) follows from $C_{\eta}>12$ (see (64)), i.e.,

[TABLE]

which leads to

[TABLE]

and (67) follows from (64), i.e.,

[TABLE]

Then for any $g\in S_{\rho}^{m}(\mathbb{I})$ satisfying $\|g\|_{c}\geq C_{\eta}\delta_{*}$ , where $C_{\eta}$ and $\delta_{*}$ are defined in (64) and (65), there exists $N_{\eta}\equiv\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ such that for any $c\geq N_{\eta}$ , we have

[TABLE]

Finally, since $h=\lambda^{1/(2m)}$ , $nh^{1/2}C_{k}(t)^{2}=o(1)$ , $ch\rightarrow\infty$ and $\zeta=O(n^{-1})$ , the condition $\|g\|_{c}\geq C_{\eta}\delta_{*}$ is equivalent to $\|g\|_{c}\geq C_{\eta}\delta_{n,c,\lambda}$ . This proves the desired result.

Proof of Theorem 6: Suppose $g=\beta x+\alpha$ is the “true” function under $H^{\textrm{linear}}_{0}$ and $y_{i}=g(i/n)+\sigma\epsilon_{i}$ , $i=1,\ldots,n$ . We use $\widehat{g}$ to denote the least-squares estimator of $g$ based on the $Q(y_{i})$ 's. Consider the following two events:

[TABLE]

It is easy to show that $P(\mathcal{E}_{1}\cap\mathcal{E}_{2})\rightarrow 1$ , as $n\rightarrow\infty$ . Since $\min\{t_{1}^{2},t_{k-1}^{2}\}={\mathcal{T}}_{n}>4c_{s}^{2}\rho^{2}$ as $n\rightarrow\infty$ , under event $\mathcal{E}_{1}\cap\mathcal{E}_{2}$ , for $j=1,\ldots,c$ , one has that

[TABLE]

Furthermore, we have

[TABLE]

where $\varsigma_{i}=\frac{\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}Q\big{(}g_{0}(j/n)-\widehat{y}_{j}\big{)}}{\widetilde{n}}$ satisfies $\varsigma_{i}=O_{p}(1/n+C_{k}(t))$ , and equations (68) and (69) follow from (6) and (33). Let $z^{0}=(z_{1}^{0},\ldots,z_{c}^{0})^{T}$ , $\varsigma=(\varsigma_{1},\ldots,\varsigma_{c})^{T}$ . Therefore, the test statistic

[TABLE]

where $\overrightarrow{C_{k}(t)}=(C_{k}(t),\ldots,C_{k}(t))^{T}$ . Now we proceed to prove that $cT_{\textrm{linear}}$ is dominated by $T_{1}$ . Using the fact that $z_{i}^{0}$ , $z_{j}^{0}$ are independent of each other if $i\neq j$ , and $E(\{z_{i}^{0}\}^{2})=O(\tau_{k}^{2})=O(c/n),E(\{z_{i}^{0}\}^{4})=O(c^{2}/n^{2})$ , then for the first term $T_{1}$ , it is straightforward to show that

[TABLE]

In the proof of (47), we have shown that $a_{ii}\asymp(ch)^{-1}$ and $\textrm{trace}(A)\asymp\textrm{trace}(A^{2})\asymp h^{-1}$ ; hence we have that

[TABLE]

Furthermore, since $A\leq I_{c}$ , we have that

[TABLE]

Since $C_{k}(t)^{2}\asymp\frac{{\mathcal{T}}_{n}}{2^{{2b}}}\ll\frac{{\mathcal{T}}_{n}}{(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}}\leq\frac{{\mathcal{T}}_{n}}{(nh+n(ch)^{-1}){\mathcal{T}}_{n}}=(nh+n(ch)^{-1})^{-1}$ , one has that

[TABLE]
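The chain of bounds on $C_{k}(t)^{2}$ used in this step can be unpacked as follows, reading the bit-depth condition $b\gg\log_{2}(\sqrt{x})$ as $2^{2b}/x\rightarrow\infty$ (our reading of the asymptotic notation) and using $h\leq h^{1/2}$ once $h\leq 1$ :

$$2^{2b}\gg(nh^{1/2}+n(ch)^{-1}){\mathcal{T}}_{n}\;\Longrightarrow\;\frac{{\mathcal{T}}_{n}}{2^{2b}}\ll\frac{1}{nh^{1/2}+n(ch)^{-1}},\qquad nh+n(ch)^{-1}\leq nh^{1/2}+n(ch)^{-1}\;\Longrightarrow\;\frac{1}{nh^{1/2}+n(ch)^{-1}}\leq\frac{1}{nh+n(ch)^{-1}}.$$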

Together with the fact that $cn^{-1}\ll c(nh)^{-1}$ and equation (70), one has that

[TABLE]

Similarly, it can be shown that $T_{4}\ll T_{1}$ . Therefore, $cT_{\textrm{linear}}\asymp T_{1}$ . The dominating term $T_{1}$ in $cT_{\textrm{linear}}$ is nothing but $cT_{\mu,t,c}$ for testing $H_{0}:g_{0}(x)=0$ based on $z^{0}$ . Therefore, in keeping with Lemma 11, $T_{\textrm{linear}}$ under $H_{0}^{\textrm{linear}}$ has the same limiting distribution as $T_{\mu,t,c}$ under $H_{0}:g_{0}(x)=0$ . Thus, according to Theorem 3 and Lemma 12, the result is proved.
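The first step of this proof, a least-squares line fitted to quantized responses $Q(y_{i})$ , can be illustrated with a small simulation. This is a minimal sketch under stated assumptions and not the paper's procedure: `uniform_quantizer` is a hypothetical $k$ -level uniform quantizer on $[-T,T]$ with midpoint representatives, standing in for the two-stage quantizer of Section 2.2, and the values of $\alpha$ , $\beta$ , $\sigma$ , $k$ and $T$ are arbitrary.

```python
import numpy as np


def uniform_quantizer(y, k, T):
    """Hypothetical k-level uniform quantizer on [-T, T] (midpoint outputs)."""
    edges = np.linspace(-T, T, k + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    # Index of the cell containing each y; values outside [-T, T] are clipped
    # to the first or last cell.
    idx = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, k - 1)
    return mids[idx]


def quantized_ls_line(y, k, T):
    """Least-squares fit of alpha + beta * x to Q(y_i) at design points x_i = i/n."""
    n = len(y)
    x = np.arange(1, n + 1) / n
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, uniform_quantizer(y, k, T), rcond=None)
    return coef  # (alpha_hat, beta_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, alpha, beta, sigma = 2000, 0.5, 1.0, 0.5  # illustrative values
    x = np.arange(1, n + 1) / n
    y = alpha + beta * x + sigma * rng.standard_normal(n)
    print(quantized_ls_line(y, k=2**8, T=6.0))
```

With 8 bits the cell width is small relative to the noise level, so the fitted coefficients should stay close to those obtained from unquantized data, which is the regime the proof relies on.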

Proof of Theorem 7: The proof of Theorem 7 is similar to that of Theorem 5. Let $g$ be the function which generates the observations and $\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)=\operatorname*{arg\,min}_{f\in\mathcal{L}(\mathbb{I})}\|g-f\|^{2}$ be the projection of $g(\cdot)$ onto $\mathcal{L}(\mathbb{I})$ . We further define $f(x)$ to be the integral function associated with $g-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)$ , as defined in (15), that is,

[TABLE]

Therefore, $\|g-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)\|_{c}=\sqrt{\sum_{i=1}^{c}f^{2}(i/c)/c}$ . To proceed, we first define $\widehat{g}^{\star}$ to be the least-squares estimator based on $Q\big{(}\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)(j/n)+\sigma\epsilon_{j}\big{)}$ , which satisfies $|\widehat{g}(i/n)-\widehat{g}^{\star}(i/n)|\leq C_{k}(t)$ . Let $z^{\textrm{linear}}_{i,0}$ be the $z^{\textrm{linear}}_{i}$ based on $\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)$ . According to (6), one has that

[TABLE]

Let $z^{\textrm{linear}}_{0}=(z^{\textrm{linear}}_{1,0},\ldots,z^{\textrm{linear}}_{c,0})^{T}$ . Before proceeding, we introduce some notation to ease the calculations. Define

[TABLE]

Similar to Lemma 13, we want to find an upper bound of $|\widetilde{z}^{\textrm{linear}}_{i}-\widetilde{z}^{\textrm{linear}}_{i,0}|$ . It is straightforward to show that

[TABLE]

where

[TABLE]

Suppose that Condition (B) holds, and consider events

[TABLE]

It is easy to show that $P(\mathcal{E}^{*}_{1}\cap\mathcal{E}^{*}_{2}\cap\mathcal{E}^{*}_{3})\rightarrow 1$ as $n\rightarrow\infty$ . Let $\mathcal{E}^{*}=\mathcal{E}^{*}_{1}\cap\mathcal{E}^{*}_{2}\cap\mathcal{E}^{*}_{3}$ . Thus, we choose $N_{\eta}^{\prime}$ s.t. $P(\mathcal{E}^{*})\geq 1-\eta/3$ if $n\geq N_{\eta}^{\prime}$ . Define $\omega^{\textrm{linear}}=(\omega^{\textrm{linear}}_{1},\ldots,\omega^{\textrm{linear}}_{c})^{T}$ , where $\omega^{\textrm{linear}}_{i}=\widetilde{z}^{\textrm{linear}}_{i}-\widetilde{z}^{\textrm{linear}}_{i,0}$ . Obviously, $\omega^{\textrm{linear}}_{i}=z^{\textrm{linear}}_{i}-z^{\textrm{linear}}_{i,0}$ under event $\mathcal{E}^{*}$ . Therefore, by (71), under event $\mathcal{E}^{*}$ , we have $|\omega^{\textrm{linear}}_{i}-f(i/c)|\leq 8C_{k}(t)+\zeta$ . Since $A\leq I_{c}$ , we get that

[TABLE]

where $\textbf{f}=(f(1/c),\ldots,f(c/c))^{T}$ . By Lemma 16 and (72), we get the following lower bound:

[TABLE]

Using an argument similar to that in Lemma 14, and the facts that $Var(z^{\textrm{linear}}_{i,0})=Var(z^{0}_{i})(1+o_{p}(1))=\tau_{k}^{2}(1+o_{p}(1))$ and $C_{k}(t)^{2}\ll\tau_{k}^{2}$ as $n,c\rightarrow\infty$ , one has that

[TABLE]

Therefore, there exists $N^{\prime\prime}$ s.t., when $c\geq N^{\prime\prime},P(\mathcal{E}_{2})\geq 1-\eta/3$ , where event $\mathcal{E}_{2}$ is defined as

[TABLE]

and $C_{\eta}^{\prime}=\sqrt{24/\eta}$ .

From Theorem 6, it is straightforward to show that

[TABLE]

Thus, there exists $C_{\eta}^{\prime\prime}>0$ s.t. $P(\mathcal{E}_{3})\geq 1-\eta/3$ for all $c\geq\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ , where

[TABLE]

Then $P(\mathcal{E}^{*}\cap\mathcal{E}_{2}\cap\mathcal{E}_{3})\geq 1-\eta$ for any $c\geq\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ .

Suppose $g\in S_{\rho}^{m}(\mathbb{I})$ satisfies

[TABLE]

where

[TABLE]

Then, under event $\mathcal{E}^{*}\cap\mathcal{E}_{2}\cap\mathcal{E}_{3}$ , we have

[TABLE]

where (76) follows from $C_{\eta}>192$ (see (75)), i.e.,

[TABLE]

which leads to

[TABLE]

and (77) follows from (75), i.e.,

[TABLE]

Then for any $c\geq N_{\eta}\equiv\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ , we have

[TABLE]

Finally, by direct calculations, we know that $\|g-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)\|_{c}\geq C_{\eta}\delta^{\star}_{\textrm{linear}}$ is equivalent to $\|g-\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)\|_{c}\geq C_{\eta}\delta_{n,c,\lambda}^{\textrm{linear}}$ . This proves the desired result.

Proof of Theorem 8: Let $\epsilon=(\epsilon_{1},\ldots,\epsilon_{n})^{T}\sim N(0,I_{n})$ and $\widetilde{y}^{0}=(\widetilde{y}_{1}^{0},\ldots,\widetilde{y}_{c}^{0})^{T}$ , where $\widetilde{y}_{i}^{0}=\frac{\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}\sigma\epsilon_{j}}{\widetilde{n}}$ follows a normal distribution. We further define $\varsigma_{i}=z_{i}^{0}-\widetilde{y}^{0}_{i}$ to be the difference between $z^{0}_{i}$ and $\widetilde{y}^{0}_{i}$ . Let $\varsigma=(\varsigma_{1},\ldots,\varsigma_{c})^{T}$ . Consider the event

[TABLE]

Then $P(\mathcal{E}_{1})\rightarrow 1$ as $n\rightarrow\infty$ , and under event $\mathcal{E}_{1}$ , $|\widetilde{y}_{i}^{0}-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}Q(\sigma\epsilon_{j})|\leq C_{k}(t)$ . According to (33), one has that $|\varsigma_{i}|=O_{p}(C_{k}(t))$ . Note that for any given $m_{l}\leq m\leq m_{u}\rightarrow\infty$ , the standardized testing statistic

[TABLE]

For $J_{1}$ , note that $\varsigma^{T}A_{m}\varsigma\leq\sum_{i=1}^{c}\varsigma_{i}^{2}=O_{p}(cC_{k}(t)^{2})$ and $C_{k}(t)^{2}\ll(nh_{m}^{1/2})^{-1}$ for any $m_{l}\leq m\leq m_{u}$ ; hence $J_{1}=o_{p}(1)$ . For $J_{2}$ , using an argument similar to that for (60), we have

[TABLE]

For $J_{3}$ , we need to use Lemma 17. Let $\widetilde{A}_{m}=A_{m}/s_{c,m},\boldsymbol{\Psi}_{c}=\widetilde{n}^{-1/2}(\widetilde{y}^{0}_{1},\ldots,\widetilde{y}^{0}_{c})^{T}$ . Define $F_{c,m}:=\boldsymbol{\Psi}_{c}^{\top}\widetilde{A}_{m}\boldsymbol{\Psi}_{c}-E\left[\boldsymbol{\Psi}_{c}^{\top}\widetilde{A}_{m}\boldsymbol{\Psi}_{c}\right]$ . Let $Z_{c}=\left(Z_{c,m_{l}},\ldots,Z_{c,m_{u}}\right)^{\top}$ be an $(m_{u}-m_{l}+1)$ -dimensional centered Gaussian vector with covariance matrix $\mathfrak{C}=I_{m_{u}-m_{l}+1}$ . Next we verify the conditions in Lemma 17.

By direct calculations, we have $E\left[F_{c,m}^{4}\right]-3E\left[F_{c,m}^{2}\right]^{2}=48\textrm{trace}(\widetilde{A}^{4}_{m})\asymp\frac{h_{m}}{s^{4}_{c,m}}\asymp h^{3}_{m}$ . Then we have $\max_{m_{l}\leq m\leq m_{u}}\left(E\left[F_{c,m}^{4}\right]-3E\left[F_{c,m}^{2}\right]^{2}\right)\log^{6}m_{u}\rightarrow 0$ .
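The constant 48 in the fourth-cumulant identity is the standard value for a centered quadratic form in standard normal variables. The sketch below checks it exactly (no Monte Carlo), assuming $\boldsymbol{\Psi}\sim N(0,I)$ after standardization, which is our assumption about the scaling. Writing $F=\sum_{i}\lambda_{i}(Z_{i}^{2}-1)$ with $\lambda_{i}$ the eigenvalues of a symmetric matrix $A$ , it computes the moments of $Z^{2}-1$ by Gauss-Hermite quadrature and combines them using independence. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def centered_chi2_moment(p, deg=10):
    """E[(Z^2 - 1)^p] for Z ~ N(0, 1), exact for polynomial degree 2p <= 2*deg - 1."""
    x, w = hermegauss(deg)          # nodes/weights for weight exp(-x^2 / 2)
    w = w / np.sqrt(2 * np.pi)      # normalize to the N(0, 1) density
    return float(np.sum(w * (x**2 - 1) ** p))


def fourth_cumulant_quadratic_form(A):
    """E[F^4] - 3 E[F^2]^2 for F = Psi' A Psi - E[Psi' A Psi], Psi ~ N(0, I)."""
    lam = np.linalg.eigvalsh(A)     # A is assumed symmetric
    m2 = centered_chi2_moment(2)    # second moment of Z^2 - 1
    m4 = centered_chi2_moment(4)    # fourth moment of Z^2 - 1
    s2, s4 = np.sum(lam**2), np.sum(lam**4)
    ef2 = m2 * s2
    # Independent, mean-zero summands: only i=j=k=l and paired indices survive.
    ef4 = m4 * s4 + 3 * m2**2 * (s2**2 - s4)
    return ef4 - 3 * ef2**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2
    print(fourth_cumulant_quadratic_form(A),
          48 * np.trace(np.linalg.matrix_power(A, 4)))
```

The two printed numbers should agree up to floating-point error, since the moments of $Z^{2}-1$ of order two and four are 2 and 60, and $60-3\cdot 2^{2}=48$ .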

On the other hand,

[TABLE]

For the first term in (78), recall that $s_{c,m}^{2}=2\sum_{1\leq i\neq i^{\prime}\leq c}a_{i,i^{\prime}}^{2}$ with $a_{i,i^{\prime}}$ being the $(i,i^{\prime})$ th entry of $A_{m}$ . Then by Lemma 18, we have,

[TABLE]

For the second term in (78), we need to find a bound of $|E\left[F_{c,m}F_{c,m^{\prime}}\right]|$ for $m^{\prime}>m$ . It follows that

[TABLE]

Therefore, using Lemma 18, we have

[TABLE]

Together with (79), we have $\max_{m_{l}\leq m,m^{\prime}\leq m_{u}}\left|\mathfrak{C}(m,m^{\prime})-E\left[F_{c,m}F_{c,m^{\prime}}\right]\right|\log^{2}m_{u}\rightarrow 0$ .

Therefore, by Lemma 17, we have

[TABLE]

By Hall (1979), we know $C_{n}(\max_{m_{l}\leq m\leq m_{u}}Z_{c,m}-C_{n})$ follows an extreme value distribution. The proof is complete.

Proof of Theorem 9: The proof of Theorem 9 is similar to those of Theorem 5 and Theorem 7. We use the same notation as in the proof of Theorem 5. Suppose $g\in S_{\rho}^{m_{*}}(\mathbb{I})$ is the function which generates the samples and $f$ is the corresponding integral function as defined in (15). We consider the following three events as defined in the proof of Theorem 5.

[TABLE]

Since $P(\mathcal{E}_{1})\rightarrow 1$ as $c\rightarrow\infty$ , there exists $N_{\eta}^{\prime}>0$ such that $P(\mathcal{E}_{1})\geq 1-\eta/3$ for all $c\geq N_{\eta}^{\prime}$ . It follows from Lemma 14 that there exists $N^{\prime\prime}$ s.t., when $c\geq N^{\prime\prime}$ , $P(\mathcal{E}_{2})\geq 1-\eta/3$ . Furthermore, using Theorem 3, there exists $C_{\eta}^{\prime\prime}>0$ s.t. $P(\mathcal{E}_{3})\geq 1-\eta/3$ for all $c\geq\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ .

Suppose $g\in S_{\rho}^{m_{*}}(\mathbb{I})$ satisfies $\|g\|_{c}\geq C_{\eta}\delta_{*}$ , where

[TABLE]

Since $m_{u}\rightarrow\infty$ , we have $m_{*}\leq m_{u}$ eventually, so we assume $m_{*}\leq m_{u}$ . Then it holds that

[TABLE]

Similar to the proof of Theorem 5, we know with probability approaching one

[TABLE]

Since $C_{n}\asymp(\log(m_{u}))^{1/2}$ , we have

[TABLE]

Therefore, for any $c\geq N_{\eta}\equiv\max\{N_{\eta}^{\prime},N^{\prime\prime}\}$ , we have

[TABLE]

Finally, by direct calculations, we know that $\|g\|_{c}\geq C_{\eta}\delta_{*}$ is equivalent to $\|g\|_{c}\geq C_{\eta}\delta_{n,c,a_{n}}$ . This proves the desired result.

Bibliography

1. Adams and Fournier (2003). Robert A Adams and John JF Fournier. Sobolev spaces, volume 140. Elsevier, 2003.
2. Benhenni and Rachdi (2006). K Benhenni and Mustapha Rachdi. Nonparametric estimation of the regression function from quantized observations. Computational Statistics & Data Analysis, 50(11):3067–3085, 2006.
3. Boufounos and Baraniuk (2008). Petros T Boufounos and Richard G Baraniuk. 1-bit compressive sensing. Information Sciences and Systems, 2008. CISS 2008. 42nd Annual Conference on Information Sciences and Systems, pages 16–21, 2008.
4. Cai and Wei (2021). Tony Cai and Hongji Wei. Distributed nonparametric function estimation: Optimal rate of convergence and cost of adaptation. arXiv preprint arXiv:2107.00179, 2021.
5. Cheng and Shang (2015). Guang Cheng and Zuofeng Shang. Joint asymptotics for semi-nonparametric regression models with partially linear structure. The Annals of Statistics, 43(3):1351–1390, 2015.
6. Goldstein and Rinott (1996). Larry Goldstein and Yosef Rinott. Multivariate normal approximations by Stein’s method and size bias couplings. Journal of Applied Probability, 33(1):1–17, 1996.
7. Gopi et al. (2013). Sivakant Gopi, Praneeth Netrapalli, Prateek Jain, and Aditya Nori. One-bit compressed sensing: Provable support and vector recovery. In International Conference on Machine Learning, pages 154–162, 2013.
8. Gu (2013). Chong Gu. Smoothing Spline ANOVA Models, volume 297. Springer Science & Business Media, 2013.
