Nonparametric Inference under B-bits Quantization
Kexuan Li, Ruiqi Liu, Ganggang Xu, Zuofeng Shang

TL;DR
This paper introduces a nonparametric testing method for quantized data, demonstrating its asymptotic properties and effectiveness through simulations and real data, especially when the number of bits exceeds a certain threshold.
Contribution
It proposes a computationally efficient nonparametric testing procedure for B-bit quantized samples with theoretical guarantees and extensions to linearity and adaptive tests.
Findings
Test statistic achieves classical minimax rate when B exceeds threshold
Method is effective for spline models and nonparametric linearity testing
Simulation and real-data studies confirm validity and effectiveness
| Symbol | Description |
|---|---|
| | number of groups. |
| | number of observations in each group which is defined as . |
| | quantized value. |
| | cut-off points of quantized intervals. |
| | vector of response . |
| | average of the response which is defined as . |
| | vector of quantized sample. |
| | vector of quantized sample under . |
| | vector of truncated quantized sample, where . |
| | vector of truncated quantized response under , where . |
| | new defined data for testing the linearity of , which is defined as , and is the least-square estimator of . |
| | vector of quantized value of . |
| | quantized value of under |
| | smoothing parameter. |
| | trigonometric basis functions. |
| | kernel function. |
| | kernel matrix defined as . |
| | “tensor” of defined as . |
| | . |
| | approximation error of Riemann sum and integral. |
| | Sobolev constant defined as . |
| | maximum length of quantization interval. |
Kexuan Li, Global Analytics and Data Sciences, Biogen Inc, Cambridge, MA 02142, USA
Ruiqi Liu, Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA
Ganggang Xu, Department of Management Science, University of Miami, Coral Gables, FL 33146, USA
Zuofeng Shang, Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA
Abstract
Statistical inference based on lossy or incomplete samples is often needed in research areas such as signal/image processing, medical image storage, remote sensing, and signal transmission. In this paper, we propose a nonparametric testing procedure based on samples quantized to $B$ bits through a computationally efficient algorithm. Under mild technical conditions, we establish the asymptotic properties of the proposed test statistic and investigate how the testing power changes as $B$ increases. In particular, we show that if $B$ exceeds a certain threshold, the proposed nonparametric testing procedure achieves the classical minimax rate of testing (Shang and Cheng, 2015) for spline models. We further extend our theoretical investigations to a nonparametric linearity test and an adaptive nonparametric test, expanding the applicability of the proposed methods. Extensive simulation studies together with a real-data analysis are used to demonstrate the validity and effectiveness of the proposed tests.
Keywords: B-bits Quantization, Minimax Rates of Testing, Nonparametric Inference, Smoothing Splines
1 Introduction
Lossy or incomplete data are commonly encountered in research areas such as machine learning, information theory, and signal processing. To store and process signals in digital devices, quantization is a popular procedure that maps the original measurements from a large (often uncountably infinite) set to a set of possible values. The resulting values are referred to as the quantized samples. With the increasing availability of data, it is of great interest to quantify how data analysis is affected when the data are quantized due to storage or communication budget constraints, and how to design quantization schemes that minimize the efficiency loss. Statistical inference based on quantized samples is challenging because, in addition to the measurement errors, one also needs to account for the information loss due to the quantization errors. In particular, commonly used standard statistical procedures may not be valid when applied to quantized samples if the quantization errors are ignored.
The research on lossy data has attracted increasing attention recently. The first line of work focuses on 1-bit compressive sensing, which aims at reconstructing a sparse signal from a sequence of 1-bit quantized outcomes. A 1-bit compressive sensing model was proposed by Boufounos and Baraniuk (2008), and several efficient and provable algorithms have been developed; see, e.g., Gupta et al. (2010); Gopi et al. (2013); Plan and Vershynin (2013); Zhang et al. (2014); Zhu and Gu (2015). A signal recovery algorithm was proposed in Slawski and Li (2015), which extended the 1-bit compressive sensing model to a $b$-bit compressive sensing model. The second line of research on lossy data develops statistical methods based on quantized observations. For example, Lee and Vardeman (2001) studied the interval estimation of a normal process mean from rounded data, which was further extended to more general likelihood-based statistical estimation problems (Vardeman and Lee, 2005) and nonparametric regression problems (Benhenni and Rachdi, 2006). Recently, an increasing number of works aim to quantify the impact of quantization on the statistical properties of the resulting estimators. For example, Zhang et al. (2013) established lower bounds on the minimax risks for distributed estimation of parametric models under a communication budget constraint. Suresh et al. (2017) proposed communication-efficient algorithms for distributed mean estimation without probabilistic assumptions on the data. A version of Pinsker’s theorem under some storage or communication constraints was developed in Zhu and Lafferty (2014), and it was further applied to analyze the convergence rate of nonparametric estimation with a limited bits budget by Zhu and Lafferty (2017).
More recently, a series of works has investigated high-dimensional and/or nonparametric regression model estimation in the distributed learning framework with bits constraints; see, e.g., Zhu and Lafferty (2018); Han et al. (2018); Szabo and van Zanten (2020); Cai and Wei (2021).
Despite the abundant existing literature on statistical modeling of quantized data, research focusing on nonparametric inference based on quantized data is still lacking. This paper aims to fill this gap by proposing a new quantization scheme with a $B$-bits storage or communication budget such that nonparametric estimation and testing based on quantized samples are still valid. Specifically, we consider the following regression model
$$y_i = f(i/n) + \sigma \epsilon_i, \quad i = 1, \ldots, n, \tag{1}$$
where $f$ is a smooth function, $\epsilon_i$’s are iid zero-mean errors with unit variance, and $\sigma$ is an unknown constant. The goal is to (a) estimate $f$, and (b) test the following hypothesis
$$H_0: f = f_0, \tag{2}$$
where $f_0$ is a pre-specified deterministic function.
The above model has been extensively studied in the literature, see, e.g., Shang and Cheng (2017), and is closely related to the well-known Gaussian sequence model and Gaussian white noise model (Tsybakov, 2008). However, unlike existing literature, we consider the case in which the original data, denoted by $y_1, \ldots, y_n$, are generated in machine M and are quantized as soon as they are generated. The quantized data are then stored in machine M or transmitted to another machine M∗ for future statistical inference. We assume that only a $B$-bits budget is available for data storage or communication, which necessitates data quantization that may invalidate existing estimation and inference methods. Such a research problem is important for applications where data generation and analysis are carried out at different locations. For example, testing $H_0: f = 0$ reveals whether the quantized signals transmitted through a satellite are pure noise. If $f_0$ is the signal process from a normally functioning machine, testing (2) using only quantized samples enables us to remotely monitor whether the machine is working properly in real time.
To meet the $B$-bits requirement, we propose a two-stage quantization procedure: in the first stage we quantize an individual as with being a quantizer, and in the second stage we overwrite these quantized observations by their local averages. See Figure 1 and Algorithm 1 for details. As a result, we obtain a quantized sample of size for some to be stored or transmitted. We demonstrate that with a carefully chosen and a well-designed quantizer, the proposed nonparametric estimation and testing procedures are asymptotically valid and efficient even based only on the quantized data.
Our contributions can be summarized as follows. Firstly, we propose a computationally efficient data quantization algorithm to reduce the size of the raw data to meet the $B$-bits constraint, and at the same time reduce the computational complexity from to . Secondly, we establish sufficient conditions on the bits constraint, i.e., $B$, that warrant the minimax convergence rate for the resulting spline estimators and the minimax rates of testing for the proposed testing procedure. In particular, our results show how the asymptotic power of the proposed testing procedure changes as the bits constraint $B$ increases. Thirdly, we further extend our theoretical investigations to (a) a nonparametric linearity test of the underlying function; and (b) an adaptive nonparametric test when the smoothness of the underlying function is unknown. To the best of our knowledge, our work is the first to provide a theoretical investigation on nonparametric inference based on quantized samples.
The rest of the paper is organized as follows. Section 2 describes the general methodologies we propose for data quantization, nonparametric estimation, and nonparametric testing using splines. In Section 3, we investigate the theoretical properties of the spline estimator and the nonparametric test statistic based on quantized samples. In Section 4, we study the asymptotic properties of the nonparametric linearity test statistic and the adaptive nonparametric test statistic under the $B$-bits constraint. Section 5 gives several simulation studies to evaluate the finite-sample performance of the proposed methods, and Section 6 illustrates an application of the proposed methods to the Combined Cycle Power Plant Data.
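Notation: Let $\|\cdot\|_2$ represent the $L_2$-norm, i.e., $\|f\|_2 = \left(\int_0^1 f(x)^2\,dx\right)^{1/2}$, and define $\|\cdot\|$ as the Euclidean norm of vectors. Let $\|\cdot\|_{\sup}$ denote the supremum norm of a function, i.e., $\|f\|_{\sup} = \sup_{x \in [0,1]} |f(x)|$. For two positive sequences $a_n$ and $b_n$, we denote $a_n \lesssim b_n$ ($a_n \gtrsim b_n$) if there exists a constant $c > 0$ such that $a_n \le c\, b_n$ ($a_n \ge c\, b_n$) for all $n$; denote $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$; denote $a_n \ll b_n$ if $a_n / b_n \to 0$ as $n \to \infty$ and $a_n \gg b_n$ if $b_n / a_n \to 0$ as $n \to \infty$.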
2 Methodology
In this section, we first review some background of the classical smoothing spline regression and then give details on the proposed quantization scheme, nonparametric estimation and testing procedures.
2.1 Review of Classical Smoothing Spline Regression
Throughout this paper, we assume that the underlying true function belongs to the -order () periodic Sobolev space on defined as
[TABLE]
where are the trigonometric basis functions, and for and . It follows from Wahba (1990) and Gu (2013) that is a reproducing kernel Hilbert space (RKHS) endowed with an inner product and a reproducing kernel
[TABLE]
where is the Bernoulli polynomial of order .
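The kernel formula itself did not survive extraction, so the following is only a sketch under a stated assumption: it uses the standard reproducing kernel of the periodic Sobolev space, $K(x,y) = \sum_{k \ge 1} 2\cos(2\pi k (x-y)) / (2\pi k)^{2\nu} = (-1)^{\nu-1} B_{2\nu}(\{x-y\}) / (2\nu)!$, where $\{\cdot\}$ is the fractional part. The symbol $\nu$ for the order and the function names are illustrative choices, not taken from the text. The code compares the Bernoulli-polynomial form with a truncated cosine series for orders 1 and 2.

```python
import math


def bernoulli_poly(order: int, t: float) -> float:
    """Bernoulli polynomials B_2 and B_4 (the only orders this sketch needs)."""
    if order == 2:
        return t * t - t + 1.0 / 6.0
    if order == 4:
        return t**4 - 2.0 * t**3 + t * t - 1.0 / 30.0
    raise ValueError("only orders 2 and 4 are implemented in this sketch")


def sobolev_kernel(x: float, y: float, nu: int) -> float:
    """Closed form (-1)^(nu-1) * B_{2 nu}({x - y}) / (2 nu)! for nu in {1, 2}."""
    t = (x - y) % 1.0  # fractional part, which makes the kernel periodic
    sign = -1.0 if (nu - 1) % 2 else 1.0
    return sign * bernoulli_poly(2 * nu, t) / math.factorial(2 * nu)


def sobolev_kernel_series(x: float, y: float, nu: int, terms: int = 20000) -> float:
    """Truncated cosine series for the same kernel, used only as a cross-check."""
    t = x - y
    return sum(
        2.0 * math.cos(2.0 * math.pi * k * t) / (2.0 * math.pi * k) ** (2 * nu)
        for k in range(1, terms + 1)
    )


if __name__ == "__main__":
    for nu in (1, 2):
        gap = abs(sobolev_kernel(0.3, 0.0, nu) - sobolev_kernel_series(0.3, 0.0, nu))
        print(f"order {nu}: |closed form - series| = {gap:.2e}")
```

The closed form is what makes kernel evaluation cheap in practice: each entry of a kernel matrix costs one polynomial evaluation rather than a long series.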
Based on the above assumptions on , the classic smoothing spline (ss) estimator of is obtained through the following optimization problem:
[TABLE]
For , we can define a function , which belongs to . By the representer theorem (Gu, 2013), the solution to (3) has the following closed form
[TABLE]
where with , and being the identity matrix.
To conduct a hypothesis test for (2), a straightforward idea is to construct a test statistic based on the distance between and . Specifically, we use the norm distance defined as
[TABLE]
With an appropriate normalization, it can be shown that is asymptotically normally distributed (Shang and Cheng, 2017; Yang et al., 2020; Liu et al., 2021).
2.2 Two-Stage Quantization
The original observations $y_i$’s in (1) are real-valued random variables, each of which requires an infinite number of bits to store or transmit exactly. When only $B$ bits are available, the original observations $y_i$’s may not be directly accessible for estimation or testing, and hence the classical smoothing spline estimator given in (4) is not applicable. This section introduces a two-stage quantization scheme to transform $y_i$’s into ones whose storage or transmission meets the $B$-bits constraint. The resulting samples will be further used for optimal inferential purposes in the subsequent sections. The two-stage quantization process is illustrated in Figure 1.
The first-stage quantization is to quantize the data ’s as soon as they are generated with at most distinct values. For convenience, we use a uniform quantization scheme as follows. We first choose an interval and choose as the equally spaced grid points within . Denote and the sub-interval length . For ease of presentation, we assume that is an integer. Define a quantizer as follows:
[TABLE]
where consists of the quantized values and , are the corresponding quantized intervals. Clearly, the ’s form a partition of the real line with assigned marks ’s and maps each to one of the marks. Applying to ’s, we generate quantized samples , each of which takes at most distinct values. Storage or transmission of ’s thus requires bits, which might still go beyond the $B$-bits budget when . For this reason, we propose the following second-stage quantization to further reduce the storage or transmission bits through locally averaging the ’s.
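As an illustration of the first stage, here is a minimal sketch of a uniform quantizer. Since the exact definition in (5) is not recoverable here, the details are assumptions: the range is taken to be a symmetric interval `[-T, T]`, the marks are the equally spaced grid points themselves, each observation is mapped to the nearest mark, and values outside the range are clipped to the extreme marks. The names `make_uniform_quantizer`, `T`, and `levels` are hypothetical.

```python
import math


def make_uniform_quantizer(T: float, levels: int):
    """Return a function mapping a real number to one of `levels` equally spaced marks on [-T, T]."""
    if levels < 2:
        raise ValueError("need at least two quantization levels")
    step = 2.0 * T / (levels - 1)  # distance between neighbouring marks

    def quantize(y: float) -> float:
        clipped = min(max(y, -T), T)        # values outside the range go to an end mark
        index = round((clipped + T) / step)  # index of the nearest mark
        return -T + index * step

    return quantize


def bits_per_value(levels: int) -> int:
    """Bits needed to label one of `levels` distinct marks."""
    return math.ceil(math.log2(levels))


if __name__ == "__main__":
    q = make_uniform_quantizer(T=2.0, levels=5)  # marks: -2, -1, 0, 1, 2
    print([q(v) for v in (0.3, 0.6, -1.4, 5.0, -7.0)])
    print(bits_per_value(5))
```

With `levels` marks, every quantized observation is one of finitely many values, so it can be stored as an integer label instead of a real number.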
The second-stage quantization is to further reduce the number of storage or transmission bits via local average. Specifically, we divide the interval into equally-spaced sub-intervals for some . For simplicity, we assume that is an integer and each sub-interval contains observations. The quantized data from the first-stage quantization, i.e., ’s, are further quantized as such that
[TABLE]
Details of the two-stage quantization algorithm are provided in the following Algorithm 1.
Based on the definition of the quantizer in (5), in (6) must belong to the interval and must be of the form for some integer . Therefore, there are at most distinct values of ’s, namely, , for . As a result, each requires bits to store or transmit; hence, the entire ’s require bits, where is the smallest integer greater than . In the subsequent sections, we will show that, with being properly selected, optimal inferences based on ’s are possible even under , in contrast to other regression literature, which typically needs (see Slawski and Li (2018)). For optimal inferences in non-regression settings such as the Gaussian sequence model or the Gaussian white-noise model, similar findings were made by Cai and Wei (2021).
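The bit-counting argument above can be made concrete with a small sketch. It assumes, for illustration only, that the first-stage marks sit on an integer lattice with `levels` points, so that the average of `k` marks lies on a lattice that is `k` times finer and can take at most `k * (levels - 1) + 1` distinct values. The function names and the exact counting convention are assumptions and may differ from the expressions in the original text.

```python
import math


def local_averages(quantized, m):
    """Second stage: split the quantized sample into m consecutive groups and average each."""
    n = len(quantized)
    if n % m != 0:
        raise ValueError("this sketch assumes n is a multiple of m")
    k = n // m  # observations per group
    return [sum(quantized[j * k:(j + 1) * k]) / k for j in range(m)]


def total_bits(n, m, levels):
    """Bits needed to store all m local averages under the lattice assumption."""
    k = n // m
    distinct = k * (levels - 1) + 1  # possible values of an average of k lattice marks
    return m * math.ceil(math.log2(distinct))


if __name__ == "__main__":
    first_stage = [0, 1, 1, 2, 2, 2, 0, 0]  # lattice indices from a 3-level quantizer
    print(local_averages(first_stage, m=2))
    print(total_bits(n=8, m=2, levels=3))
```

The trade-off is visible in `total_bits`: fewer groups mean fewer stored numbers, but each stored average needs more bits because it can take more distinct values.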
2.3 $B$-bits Nonparametric Spline Estimation
Given , let us choose such that , i.e., our two-stage quantization maximizes the use of the available bits. Based on the quantized samples from Algorithm 1, a -bits constrained spline estimator is proposed as follows
[TABLE]
Similar to (4), the resulting spline estimator has an explicit expression
[TABLE]
where with , , and being the identity matrix.
Notice that the optimization of (7) relies only on the quantized observations and the solution only involves computing the inverse of a matrix , which is much less computationally intensive than the classical smoothing spline estimator (4).
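A rough sketch of a spline fit on the quantized sample is given below, written as a generic kernel ridge regression. Several choices are assumptions made for illustration: the quantized averages are attached to design points `j / m`, the kernel is the order-1 periodic Sobolev kernel `B_2({x - y}) / 2`, and the coefficients solve `(K + m * lam * I) c = z`, which is the usual closed form for a penalized least squares criterion of the type in (7). The exact normalization used in the text may differ.

```python
import numpy as np


def periodic_kernel(x, y):
    """Order-1 periodic Sobolev kernel B_2({x - y}) / 2, evaluated on all pairs."""
    t = np.mod(np.subtract.outer(x, y), 1.0)
    return 0.5 * (t**2 - t + 1.0 / 6.0)


def fit_quantized_spline(z, lam):
    """Fit a kernel ridge (smoothing spline type) estimator to m quantized averages z."""
    z = np.asarray(z, dtype=float)
    m = len(z)
    x = np.arange(1, m + 1) / m  # assumed design points of the group averages
    gram = periodic_kernel(x, x)
    # Only an m x m linear system is solved, rather than an n x n one.
    coef = np.linalg.solve(gram + m * lam * np.eye(m), z)

    def predict(x_new):
        return periodic_kernel(np.atleast_1d(x_new), x) @ coef

    return predict


if __name__ == "__main__":
    m = 40
    grid = np.arange(1, m + 1) / m
    z = np.sin(2 * np.pi * grid)  # stand-in for the local averages
    f_hat = fit_quantized_spline(z, lam=1e-4)
    print(float(np.max(np.abs(f_hat(grid) - z))))
```

The computational saving comes from the size of the linear system: it depends on the number of groups, not on the original sample size.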
Finally, the selection of the tuning parameter is crucial, and can be obtained by minimizing the generalized cross validation (GCV) score as follows
[TABLE]
The GCV has been widely used in the literature and enjoys appealing theoretical properties in various settings, see, e.g., Wahba (1990); Xu and Huang (2012); Gu (2013); Xu et al. (2018, 2019).
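For readers who want to see the selection step in code, the sketch below uses the standard GCV score for a linear smoother with hat matrix $A(\lambda)$, namely $\mathrm{GCV}(\lambda) = \frac{m^{-1}\|(I - A(\lambda)) z\|^2}{\{m^{-1}\mathrm{tr}(I - A(\lambda))\}^2}$. It assumes $A(\lambda) = K (K + m\lambda I)^{-1}$ for a symmetric kernel matrix $K$; the exact score in the text may be normalized differently, and the function names are hypothetical.

```python
import numpy as np


def gcv_scores(gram, z, lambdas):
    """GCV score for each candidate lambda, using one eigendecomposition of the kernel matrix."""
    z = np.asarray(z, dtype=float)
    m = len(z)
    evals, evecs = np.linalg.eigh(gram)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    z_rot = evecs.T @ z                # data expressed in the eigenbasis
    scores = []
    for lam in lambdas:
        shrink = evals / (evals + m * lam)  # eigenvalues of the hat matrix A(lambda)
        rss = np.sum(((1.0 - shrink) * z_rot) ** 2) / m
        dof_term = (np.sum(1.0 - shrink) / m) ** 2
        scores.append(rss / dof_term)
    return np.array(scores)


def select_lambda(gram, z, lambdas):
    """Return the candidate lambda with the smallest GCV score."""
    lambdas = np.asarray(lambdas, dtype=float)
    return float(lambdas[np.argmin(gcv_scores(gram, z, lambdas))])
```

Reusing a single eigendecomposition keeps the cost of scanning a grid of candidate values low, since each extra candidate only needs vector operations.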
2.4 $B$-bits Nonparametric Testing
In this section, we propose a test statistic for the null hypothesis (2) based on the $B$-bits spline estimator . Without loss of generality, we assume in the null hypothesis (2). For a nonzero , the observed response variables ’s from model (1) can be centered as , and the same testing procedure can be applied using ’s instead. To test , we consider a test statistic based on the norm distance between and as follows
[TABLE]
Intuitively, a large value of should lead to the rejection of . In Theorem 3, we shall show that under and mild conditions, it holds that
[TABLE]
where , , with and with being the th entry of . In practice, needs to be estimated based on the quantized data as well. We propose the following estimator
[TABLE]
where is given in the following Algorithm 2 through quantization. Intuitively, the above estimator is a re-scaled (by a factor of ) version of the quantized sample error variance . It is straightforward to show that under mild conditions, see Lemma 11 of Appendix C for details. Consequently, the decision rule for testing (2) at significance level can be defined as follows
[TABLE]
where is the -percentile of the standard normal distribution. We reject the null hypothesis (2) if and only if .
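The centering and scaling constants of the test statistic are not recoverable here, so the following sketch substitutes a generic Gaussian quadratic-form approximation rather than the paper's exact quantities. It assumes the fitted values are `A @ z` for a symmetric smoother matrix `A`, that under the null the quantized averages behave like independent Gaussian noise with variance `tau2`, and that rejection is one-sided in the upper tail. Under those assumptions the statistic `||A z||^2 / m` has mean `tau2 * tr(A^2) / m` and variance `2 * tau2^2 * tr(A^4) / m^2`. All names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def quadratic_form_test(smoother, z, tau2, alpha=0.05):
    """Standardize ||A z||^2 / m under a Gaussian null and compare with a normal quantile."""
    z = np.asarray(z, dtype=float)
    m = len(z)
    fitted = smoother @ z
    stat = float(fitted @ fitted) / m          # Riemann-sum version of the squared L2 norm
    a2 = smoother @ smoother                   # A^2, using the symmetry of A
    mean = tau2 * np.trace(a2) / m             # E[stat] under the Gaussian null
    sd = np.sqrt(2.0) * tau2 * np.sqrt(np.trace(a2 @ a2)) / m  # sd of stat under the null
    z_score = float((stat - mean) / sd)
    critical = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail normal quantile
    return z_score, bool(z_score > critical)
```

In use, `tau2` would be replaced by a variance estimate computed from the quantized data, and `smoother` by the hat matrix of the fitted spline.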
By the design of the quantizer in (5), we can see that there are at most distinct possible values for each ranging from to , , yielding the range for as . Since can only take values as for some integer , there are at most distinct values for , which would cost bits to store or transmit. Compared with the bit cost of the two-stage quantization , the cost to store or transmit is negligible when , and hence is ignored in the calculation of the total bit cost for ease of presentation.
2.5 Practical Choice of and Given
The implementations of Algorithms 1 and 2 require a practical choice of and for a given bits budget . Based on the discussion in Section 2.2, Algorithm 1 requires with . Our theoretical investigations in Section 3.3 require that for some , and as , where is defined in Condition (B). Furthermore, equations (16) and (17) in Section 3.3 reveal that the optimal choice of depends on the smoothness of the periodic Sobolev space (i.e., ) and the tuning parameter . While the former is typically unknown in practice, the latter needs to be chosen by some data-driven criterion such as GCV based on the quantized data, which is not available until the quantization process is carried out. To simplify the calculation and make the quantization algorithm more practical, we propose to use , which is a valid choice for any and , and therefore is easy to use in practice. Specifically, given , we find and as follows
[TABLE]
By the definition in Condition (B), is the quantization range, and is used in the choice of so that is invariant if ’s are multiplied by a constant. Under Condition (B), the actual choice of depends on the distribution of ’s in model (1). If ’s follow a standard Gaussian distribution, it suffices to take . Therefore, in (11) does not need to be estimated. See more discussion under Condition (B) regarding the choice of .
3 Asymptotic Theory
We now proceed to study the asymptotic properties of the $B$-bits spline estimator and the nonparametric test statistic. In this section, we restrict our investigation to the simple scenario in which the order of the periodic Sobolev space is known and fixed, and the exact form of the function $f_0$ in the null hypothesis (2) is also known. We shall defer theoretical results on more general cases to Section 4.
3.1 Estimation Convergence Rate
We first quantify the convergence rate of . Even though the main focus of this paper is conducting statistical inference based on quantized samples, it is still of interest to study the asymptotic properties of the spline estimator . Define the Sobolev constant
[TABLE]
It is known that is positive and finite; see Adams and Fournier (2003).
For all our theoretical investigations, we assume that ’s and ’s satisfy the following boundedness condition
[TABLE]
Condition (B) asserts that the values of cannot be too large, and that should be sufficiently large. Recall that in this paper, we adopt the uniform quantization scheme for which Condition (B) is rather mild. Since , by the definition of in (12), we have that , and we shall assume that is finite for our theoretical investigation. Condition (B) essentially assumes that is sufficiently large so that all observed ’s fall within the quantization range with high probability. When ’s follow a sub-Gaussian distribution, it suffices to take for Condition (B) to hold. For distributions with heavier tails, the required order for will be larger, e.g., for sub-exponential distributions. In particular, when ’s follow a normal distribution, it suffices to use .
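To give a feel for how wide the quantization range must be, the following simulation looks at standard normal errors only. The threshold `2 * sqrt(log n)` is an illustrative choice, not a value taken from the text: by the Gaussian tail bound and a union bound, the probability that any of `n` standard normal draws exceeds it in absolute value is at most `2 / n`. The function name and defaults are hypothetical.

```python
import math
import random


def coverage_rate(n: int, reps: int, seed: int = 0) -> float:
    """Fraction of replications in which all n standard normal errors stay within the range."""
    rng = random.Random(seed)
    threshold = 2.0 * math.sqrt(math.log(n))  # illustrative range of order sqrt(log n)
    hits = 0
    for _ in range(reps):
        largest = max(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
        hits += largest <= threshold  # True counts as 1
    return hits / reps


if __name__ == "__main__":
    print(coverage_rate(n=2000, reps=50))
```

A range that grows only like the square root of `log n` therefore already captures essentially every observation when the tails are Gaussian, which is why the condition is described as mild.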
Based on Condition (B), the following theorem establishes an asymptotic upper bound for the estimation error .
Theorem 1
If Condition (B) holds, then it follows that
[TABLE]
*where , and , with *
[TABLE]
with being the distribution of .
The asymptotic error bound for given in Theorem 1 can be roughly categorized into three parts: (1) the estimation error of the smoothing spline estimator based on fully observed original data, i.e., (Wahba, 1990); (2) the estimation error attributed to first-stage quantization, i.e., ; and (3) the estimation bias introduced by second-stage quantization, i.e., . An extreme case is when , and , i.e., the first-stage quantizer becomes dense enough, in which case tends to zero, reducing to the classical nonparametric estimation setting.
Intuitively, if a sufficiently large bits budget , and consequently sufficiently large values and can be used, term will dominate the upper bound of . As a result, the convergence rate of coincides with that of the classical smoothing spline estimator based on original observations without quantization (Wahba, 1990). A sufficient condition is given in the following corollary.
Corollary 2
Assume that Condition (B) holds, and that (1) ; (2) as , satisfies where ; (3) ; and that (4) , . Then it follows that , which achieves the optimal convergence rate of smoothing splines without quantization.
Recall the definition . Under the conditions of Corollary 2, the minimum order of to achieve the optimal convergence rate is , leading to a required . Therefore, the total bits budget . Recently, Zhu and Lafferty (2018) proposed a quantization scheme for the Gaussian sequence model that achieves the same optimal estimation rate with a bits budget . Although their bits budget is lower than that of our proposed method, Zhu and Lafferty (2018) achieve this goal by essentially only quantizing the first Fourier coefficients of the function and discarding the remaining Fourier coefficients as $0$’s. It is unclear how this approach can be extended to make valid nonparametric inferences for , which is the main focus of our work. The proposed quantization scheme in Section 2.2 is in spirit closer to the quantization algorithms proposed in Slawski and Li (2018) and references therein, although these works are mainly focused on the estimation of the parametric linear regression model. In the following subsections, we shall investigate the impacts of the bits budget on the asymptotic properties of the proposed nonparametric testing procedure.
3.2 Asymptotic Distribution of the Test Statistic under $H_0$
In this section, we proceed to derive the asymptotic distribution of the test statistic under $H_0$. From now on, we will use without repeating its definition.
Theorem 3
Suppose that Condition (B) holds, and it holds that , , and as . Then under , it follows that
[TABLE]
where , , and are as defined in Section 2.4.
Theorem 3 states that under some regularity conditions, the null distribution of the nonparametric test statistic for in (2) is asymptotically normal. The proof relies on Stein’s exchangeable pair method and is given in the Appendix.
We remark that the conditions in Theorem 3 are rather mild. Specifically, the first condition requires the tuning parameter to shrink to zero and the second condition implies that the number of quantized data, i.e., , should be sufficiently large. The only condition that needs more discussion is the last condition , which involves jointly controlling the moment of ’s and the first-stage quantizer . Proposition 4 below provides a sufficient condition for this assumption.
Proposition 4
Suppose that Condition (B) holds. If , and for and , then it follows that .
Using Theorem 3 and Proposition 4, the validity of the proposed nonparametric testing procedure requires the quantized sample size to be sufficiently large, in particular, . Recall that the proposed quantization scheme in Section 2.2 requires a total bits budget with . As a result, for Theorem 3 to hold, the required bits budget , for which the lower bound is determined by the tuning parameter (or ). In the next subsection, we shall investigate the impacts of on the asymptotic testing power against local alternatives, which can be used to study optimal asymptotic power achievable with a given bits budget . For example, we shall show that to achieve the minimax rate of testing, one needs .
3.3 Asymptotic Power of the Nonparametric Test
We now proceed to examine the asymptotic power of the proposed nonparametric test. For a fixed constant , let be the -ball in the periodic Sobolev space with a radius . We consider the following alternative hypothesis
[TABLE]
Based on the definition of the quantized data in (7), its unquantized counterpart can be defined as for . Under , one has that for . To facilitate our theoretical investigation, we introduce the following function
[TABLE]
It is straightforward to show that and that as , . Theorem 5 below states that, under some regularity conditions, our proposed nonparametric test can achieve arbitrarily high power provided that and are sufficiently separated.
Theorem 5
Suppose that Condition (B) holds. If it holds that , , , and , then for any , there exist positive constants and such that for any ,
[TABLE]
where and with function as defined in (15).
The separation rate represents the smallest rate of deviation from the that can be consistently detected by the proposed test statistic (9), given sufficiently large and . The first part of , namely, , coincides with the separation rate of the classical spline-based nonparametric test using original observations without quantization, see, e.g., Shang and Cheng (2013); Cheng and Shang (2015); Shang and Cheng (2015, 2017). The remaining part of , namely, , is an additional term due to the two-stage quantization errors. For a given and , the separation rate can be minimized by choosing an appropriate value of the tuning parameter , subject to the constraint . Specifically, by some straightforward algebra, one can show that
[TABLE]
Recall that the total bits needed for the proposed quantization scheme in Section 2.2 is , for which Theorem 5 requires that and . By plugging the optimal smoothing parameter back to the lower bound of , we have the following
[TABLE]
From (17), we can see that when is sufficiently large, i.e., , the minimal separation rate achieves the minimax rate of testing (Shang and Cheng, 2013, 2017; Liu et al., 2020), implying lossless asymptotic testing power using only quantized samples. In this case, the minimal number of bits for each data point, i.e., , does not depend on but is determined by the smoothness of the function and the tail bound of the error distribution. When is between and , the minimax rate of testing is no longer achievable, but the minimal separation rate still decays polynomially as the original sample size increases. Furthermore, in this intermediate phase of , the lower bound of decreases as increases, implying that increasing rather than when allocating the total bits budget will more effectively improve the testing power. Finally, when is less than , the asymptotic lower bound for the minimal rate of separation is (roughly) of the order with , the number of quantized measurements that can be transmitted or stored, provided that .
4 Extensions
Our prior investigations in Section 3 assume that the hypothesized function in (2) and the order of the periodic Sobolev space are both known. In reality, it might be interesting to test other hypotheses, e.g., whether has a parametric expression such as a linear function. Meanwhile, the order is often unknown. We will extend the prior works to such settings.
4.1 Nonparametric Testing for Linearity of $f$
In some applications, we are interested in testing whether $f$ resides in a parametric family. In this section, as an illustrative example, we consider testing the linearity of $f$:
[TABLE]
where denotes the class of linear functions over . Testing the hypothesis that $f$ belongs to other parametric families governed by a finite number of parameters can be conducted in the same way with minor modifications.
To test (18), we first obtain the least-squares estimator , based on ’s, i.e., . Subsequently, we define the new data as , where . By applying the two-stage quantization Algorithm 1 to , we can then obtain the quantized data . Following the same estimation procedure in Section 2.3, we can obtain a spline estimator based on the quantized data .
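The first step of the linearity test can be sketched as follows, under assumptions that are not confirmed by the text: the linear class is taken to be functions of the form `a + b * x`, the design points are `i / n`, and the "new data" are the ordinary least-squares residuals. The function name is hypothetical. The residuals would then be passed through the same two-stage quantization as the original responses.

```python
def linear_residuals(y):
    """Fit y_i = a + b * (i / n) by least squares and return (residuals, a, b)."""
    n = len(y)
    x = [(i + 1) / n for i in range(n)]  # assumed equally spaced design points
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return residuals, intercept, slope


if __name__ == "__main__":
    n = 10
    y = [1.0 + 2.0 * (i + 1) / n for i in range(n)]  # exactly linear, so residuals vanish
    res, a_hat, b_hat = linear_residuals(y)
    print(a_hat, b_hat, max(abs(r) for r in res))
```

If the true function is linear, the residuals contain essentially only noise, so testing whether their regression function is zero is a natural way to test linearity.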
The resulting test statistic is then defined as , whose limiting distribution under is given by the following theorem.
Theorem 6
Suppose that Condition (B) holds. If as , it holds that , , , and , then under , one has that
[TABLE]
where , , and are as defined in Section 2.4 but based on .
Theorem 6 is an immediate extension of Theorem 3 to testing the linearity of using only quantized samples, indicating that the proposed nonparametric linearity test is valid under mild conditions. To investigate the power of the proposed linearity test against the alternative , we define the distance between and the linear function space as , where is the projection of to . The magnitude of characterizes how far the true function deviates from any linear function in . Note that under null hypothesis , one has that .
The following theorem describes the asymptotic power of the proposed nonparametric linearity test.
Theorem 7
Suppose that Condition (B) holds. If as , it holds that , , , and , then for any , there exist positive constants and such that for any ,
[TABLE]
where and with function as defined in (15).
Based on Theorem 7, we can see that for a given quantized sample of size , the same separation rate for testing can be achieved by the proposed nonparametric linearity test as described in (16). Furthermore, the proofs of Theorems 6-7 are similar to those of Theorem 3 and Theorem 5 by recognizing the fact that the least-squares estimator satisfies that , whose impact is negligible for a nonparametric spline estimator. It is therefore straightforward to extend Theorems 6-7 to testing whether resides in other parametric families as long as a uniformly root-n consistent parametric estimator is available.
4.2 Adaptive Nonparametric Test When is Unknown
From (16), we can see that the power of the proposed nonparametric test depends crucially on the order of the periodic Sobolev space where the underlying true function resides. However, the order may be unknown in practice. One popular strategy is to set regardless of the underlying truth, which may lead to sub-optimal testing power. In this section, we construct an optimal adaptive nonparametric testing procedure based on quantized samples that does not require knowledge of .
Let denote the unknown true order of the Sobolev space to which belongs, and assume that is an integer between two known integers and . For instance, one can set and so that, as diverges, is guaranteed to belong to . For any given integer , we can calculate the test statistics defined in (9) with the tuning parameter where may depend on but is free of . We remark that the upper bound may be slowly diverging as . Our adaptive nonparametric testing procedure is summarized as follows.
Step 1. For any , calculate the standardized testing statistic
[TABLE]
where , , and are as defined in Section 2.4.
Step 2. Calculate the maximum of ’s, i.e., .
Step 3. Standardize as follows
[TABLE]
where satisfies .
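Steps 2 and 3 can be sketched numerically. The sketch below assumes that the normalizing constant solves the classical equation $2\pi B^2 e^{B^2} = N^2$ for the maximum of N standard normal variables (Hall, 1979), that the standardized statistic is B(max - B), and that the reference limit is the standard Gumbel distribution. These forms and all function names are assumptions made for illustration, not the exact definitions used in Theorems 8 and 9.

```python
import math

def hall_constant(num_tests):
    """Solve 2*pi*B^2*exp(B^2) = num_tests^2 for B > 0 by bisection (assumed form)."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    target = 2.0 * math.log(num_tests)

    def g(b):
        # Log of the left-hand side minus log of the right-hand side; increasing in b.
        return math.log(2.0 * math.pi) + 2.0 * math.log(b) + b * b - target

    lo, hi = 1e-8, math.sqrt(target) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def standardized_max(taus):
    """Step 2 and Step 3: take the maximum and standardize it as B * (max - B)."""
    b = hall_constant(len(taus))
    return b * (max(taus) - b)

def gumbel_pvalue(t):
    """Upper-tail probability 1 - exp(-exp(-t)) of the standard Gumbel law."""
    return 1.0 - math.exp(-math.exp(-t))
```

A test at a given level would reject when `gumbel_pvalue(standardized_max(taus))` falls below that level. Convergence of normal maxima to the Gumbel law is known to be slow, so this approximation should be treated with care when few statistics are combined.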
For the validity of the proposed adaptive nonparametric test, we assume that the following Condition (C) holds.
[TABLE]
Condition (C) requires that the searching range for cannot be too large by imposing a slowly diverging upper bound on . In addition, the total number of quantized samples, i.e., , that need to be transmitted or stored cannot be too small compared to , and is jointly determined by and the tuning parameter . These conditions are rather mild and have been used in the literature, see, e.g., Liu et al. (2019, 2021). The following theorem describes the asymptotic behavior of under .
Theorem 8
Suppose that both Conditions (B) and (C) hold, , and that
[TABLE]
Then, under given in (2), for any , it holds that
[TABLE]
where .
The intuition behind Theorem 8 is straightforward: under , the limiting distribution of each is normal, which suggests that the asymptotic distribution of the maxima should be close to the extreme value distribution. We use the techniques developed in Koike (2019) to formalize the proof.
Next, we investigate the asymptotic power of the proposed adaptive nonparametric test under the alternative
Theorem 9
Suppose that both Conditions (B) and (C) hold, , and that
[TABLE]
Then, for any , there exist positive constants and such that for any ,
[TABLE]
where and with function as defined in (15).
Based on the form of separation rate obtained in Theorem 9, it is straightforward to show that the minimal separation rate is obtained when for some constant , provided that
[TABLE]
so that Condition (C) is met and the second term inside the square-root part of is negligible. Specifically, if , one has that
[TABLE]
The minimal separation rate (21) is the same as the one obtained in Liu et al. (2019, 2021) and is minimax for the adaptive nonparametric test. This suggests that with the quantized samples, the proposed adaptive test can still achieve the optimal testing power if the bits budget satisfies
[TABLE]
and we take as suggested in Section 2.5. Compared to the minimax rate of testing when is known, which is given in (16), the minimal separation rate (21) is only inflated by a factor of . This is the price to pay for searching over . Furthermore, we wish to remark that the lower bound of the bits budget depends not only on the true order but also on the smallest guess of the order, i.e., . This can be interpreted by the fact that in Step 1 of the adaptive test is constructed based on an under-smoothed spline estimator, which may have a larger order of estimation bias. In practice, it is convenient to set as suggested by Liu et al. (2021). However, a more accurate guess of may lead to a smaller bits budget required to achieve the minimax rate of testing.
5 Simulation Studies
In this section, we evaluate the finite sample performance of the proposed methods through a set of simulation studies. For all simulation settings except for Section 5.4, the data are generated from the following model
[TABLE]
where is the density function of the beta distribution with parameters and , ’s are independent random errors. Two types of errors were considered: (1) ; (2) . We consider from [math] to , and various sample sizes . In particular, is used to examine the empirical size of the proposed test under , and other values of are used to check the empirical powers against alternatives. The target significance level was chosen as .
For all simulation studies, we consider the uniform quantization scheme outlined in Section 2.2. Specifically, for the data quantization step, for a given bits budget , we choose following the approach suggested in Section 2.5 with a . For each simulation, the quantization ranges are defined as , where with being the regression function in model (1). The use of and is of limited importance and can be replaced with any reasonable alternatives such as setting or using estimates based on historical data. Summary statistics from each simulation setting were based on independent simulation runs. Except for Section 5.3, we considered periodic Sobolev space of order with kernel function , where is the Bernoulli polynomial of order . The tuning parameter was set as with being picked by GCV.
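For concreteness, the reproducing kernel of a periodic Sobolev space can be evaluated in closed form from a Bernoulli polynomial. The sketch below assumes order m = 2 and the standard kernel $(-1)^{m-1} B_{2m}(\{s-t\})/(2m)!$ without a constant term (Wahba, 1990), where $\{\cdot\}$ denotes the fractional part. Both the order and the exact kernel form are assumptions for illustration and may differ from the choices used in these simulations.

```python
def bernoulli4(x):
    """Bernoulli polynomial of degree 4: B_4(x) = x^4 - 2x^3 + x^2 - 1/30."""
    return x ** 4 - 2.0 * x ** 3 + x ** 2 - 1.0 / 30.0

def periodic_kernel_m2(s, t):
    """Assumed periodic spline kernel of order m = 2: -B_4({s - t}) / 4!."""
    u = (s - t) % 1.0  # fractional part, so the kernel is 1-periodic
    return -bernoulli4(u) / 24.0
```

By the Fourier expansion of Bernoulli polynomials, this kernel equals $2\sum_{k\ge 1}\cos(2\pi k(s-t))/(2\pi k)^{4}$, which makes its positive definiteness and its trigonometric eigenfunctions explicit.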
5.1 Estimation Performance of
In this section, we first evaluate the estimation performance of the spline estimator defined in (7) that is based on only quantized samples. We generated data from model (22) with and sample sizes . For each , we gradually increase the bits budget from to . The estimation accuracy was evaluated by the mean squared errors defined as . The simulation results are summarized in Figure 2, which suggests that the MSEs decrease as increases in all considered settings. Moreover, as increases, the MSEs first decrease rapidly and then stabilize at some levels. This observation is consistent with our theoretical results established in Section 3.1, which state that increasing (or equivalently, and ) will diminish the impact of information loss due to the data averaging and data quantization, and as a result becomes more accurate. Furthermore, we can also observe that after exceeds a certain threshold, the MSEs of stabilize, which supports the findings in Corollary 2. Specifically, when is sufficiently large, the MSEs of reach the estimation error lower bound of the classical spline estimator based on the complete data.
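One way to see why the MSE curves flatten is that the rounding error of a uniform quantizer halves with every additional bit, so it soon falls below the statistical error. The sketch below uses a generic midpoint uniform quantizer on a fixed range; it is an illustrative assumption, not the two-stage Algorithm 1, and the range [-1, 1] and the function names are hypothetical.

```python
def uniform_quantize(value, bits, lower, upper):
    """Map value to the midpoint of one of 2**bits equal-width cells on [lower, upper]."""
    levels = 2 ** bits
    width = (upper - lower) / levels
    clipped = min(max(value, lower), upper)  # values outside the range are truncated
    index = min(int((clipped - lower) / width), levels - 1)
    return lower + (index + 0.5) * width

def max_rounding_error(bits, lower=-1.0, upper=1.0, grid=1000):
    """Largest absolute rounding error over an equally spaced grid on [lower, upper]."""
    points = [lower + (upper - lower) * k / grid for k in range(grid + 1)]
    return max(abs(p - uniform_quantize(p, bits, lower, upper)) for p in points)
```

With this quantizer the worst-case error inside the range is half a cell width, i.e. (upper - lower) / 2**(bits + 1), so going from 1 bit to 3 bits on [-1, 1] reduces it from 0.5 to 0.125.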
5.2 Nonparametric Test with and
In this section, we investigate the empirical sizes and powers of the nonparametric test proposed in Section 2.4, when in the null hypothesis (2) and is treated as known. The data were generated from the model (22) with various and sample sizes .
Figure 3 reports the empirical sizes of the proposed nonparametric test when and the empirical powers when , respectively. Specifically, in all scenarios, the empirical sizes of the proposed test are close to the target nominal level as the sample size increases. When either or increases, we observe that the empirical powers of the proposed test gradually approach one, which suggests that the proposed testing procedure is consistent for the alternative hypothesis that has a sufficiently large deviation (relative to the sample size ) from the . Furthermore, after the bits budget exceeds a certain threshold, the empirical powers of the proposed nonparametric test are rather close to each other, which supports our theoretical findings in Section 3.3.
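Empirical sizes and powers of this kind are rejection frequencies over independent simulation runs. A minimal sketch is given below, assuming each run produces a p-value and using 0.05 purely as a placeholder level; the function name and the reported Monte Carlo standard error are illustrative additions.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Return (rejection frequency, Monte Carlo standard error) over simulation runs."""
    runs = len(p_values)
    rate = sum(1 for p in p_values if p < alpha) / runs
    # Binomial standard error of the estimated rejection probability.
    std_err = math.sqrt(rate * (1.0 - rate) / runs)
    return rate, std_err
```

Under the null hypothesis the rate estimates the size of the test, and under an alternative it estimates the power; the standard error indicates how much of the gap to the nominal level can be attributed to simulation noise.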
5.3 Adaptive Nonparametric Test with an Unknown
In this section, we investigate the validity and the empirical power of the adaptive nonparametric test proposed in Section 4.2, for which the order parameter is searched from to . Figure 4 shows the empirical rejection rates of the proposed nonparametric adaptive test at the significance level. We can observe that when , the empirical rejection rates are rather close to the nominal level . For any given , we can see that the empirical rejection rates increase as the sample size increases. For a fixed , as increases, the empirical rejection rates increase steadily and eventually reach the when and . Finally, as long as the bits budget exceeds a certain threshold, the empirical rejection rates are rather similar in most settings. All these observations are consistent with our theoretical findings in Theorem 9. Furthermore, the empirical rejection rates are smaller than those of the non-adaptive nonparametric test in Section 5.2 under the same setting, which is the price paid for adaptivity in .
5.4 Nonparametric Linearity Test with
In this section, we study the empirical performance of the proposed nonparametric linearity test. The data is generated from the following model
[TABLE]
where is the density function of the beta distribution with parameters and , and ’s are independent random errors. Two types of errors were considered: (1) ; (2) . When , the model satisfies the null hypothesis , and as increases, the departure from the linear model becomes increasingly larger.
Figure 5 reports the empirical rejection rates of the nonparametric linearity test proposed in Section 4.1 at the significance level . It is straightforward to see that, when , the empirical rejection rates are close to the nominal size, indicating the validity of the test asserted by Theorem 6. For any given , we can see that the empirical rejection rates increase as the sample size increases. For a fixed , as increases, the empirical rejection rates increase steadily and eventually reach the . Finally, as long as the bits budget exceeds a certain threshold, the empirical rejection rates are rather similar in most settings. All these observations are consistent with our theoretical findings in Theorem 7.
6 Real Data Analysis
In this section, we apply the proposed methods to the Combined Cycle Power Plant Data (Kaya et al., 2012; Tüfekci, 2014), which can be downloaded at https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant. The data set consists of observations from a Combined Cycle Power Plant over 6 years (2006-2011). The purpose of our analysis is to explore the relationship between the net hourly electrical energy output of the plant and three environmental factors: temperature, ambient pressure, and relative humidity.
Figure 6 displays the estimated curve based on -bits quantizations () and full data, for which the periodic spline of order was used. For the quantization step, we choose , where is the standard deviation of the observed data, and are determined by Section 2.5. We can observe that the spline estimator based on quantized data with , i.e., the green curve, is rather different from the other curves in the two analyses. When the bits budget increases to more than , such differences quickly diminish. This observation demonstrates the effectiveness of the proposed -bits quantization scheme.
Next, we conduct some hypothesis tests for the relationship between the net hourly electrical energy output and the three environmental factors. The first test is to test whether there is an association between the energy output and the three environmental factors. We consider both non-adaptive and adaptive nonparametric tests. For the non-adaptive nonparametric test, is used. The p-values are all close to zero, implying strong rejections of the null hypothesis. This is not surprising based on the shapes of the spline estimators illustrated in Figure 6.
Next, there appears to be a strong linear association between relative humidity and the energy output in Figure 6. Based on this conjecture, we proceed to test whether the associations between these three environmental factors and the energy output are linear or nonlinear, using the nonparametric linearity test proposed in Section 4.1. The p-values for the first two environmental factors, i.e., ambient pressure and temperature, are both close to zero, indicating strong rejections of the null hypothesis. Figure 7 illustrates the p-values of the nonparametric linearity test for the relationship between relative humidity and energy output as a function of the bits budget . We can see that the nonparametric linearity test based on quantized data fails to reject the null hypothesis, which echoes our conjecture based on Figure 6.
7 Discussion
In this paper, we propose a set of nonparametric testing procedures based on quantized observations, including the non-adaptive nonparametric test, the nonparametric linearity test, and the adaptive nonparametric test. The proposed tests are easy to use and are based on the -metric between the quantization spline estimators and the hypothesized function. We investigate the asymptotic validity and testing powers of the proposed tests and show how the asymptotic testing powers change as the bits budget increases.
In the end, we discuss two additional extensions. First, the present paper only deals with periodic splines. It is interesting to extend our results to more general splines or even kernel ridge regression. The special periodic spline largely reduces the difficulty level of the technical proofs. Indeed, the majority of the proofs can be accomplished by exact calculations based on trigonometric series. For general RKHS, exact calculations may not be possible, and more involved proofs are needed. Second, the nonparametric linearity test can be easily extended to testing general composite null hypotheses such as for some function governed by parameters with a fixed . However, when is diverging as increases, it will be more challenging to investigate the asymptotic behavior of the proposed test statistic; this will be an interesting future research topic.
A Structure of the proofs
In this section, we outline the high-level structure of the proofs for the main theorems.
- The proof of Theorem 1 is mainly based on Lemma 10.
  - In Lemma 10, we provide an upper bound for the difference between two smoothing spline estimators.
- The proof of Theorem 3 relies on Stein’s exchangeable pair method. Specifically, we first prove the asymptotic normality of based on ’s, where ’s are the quantized samples corresponding to for , , and
[TABLE]
Next, we prove that
[TABLE]
  - In Lemma 11 and Lemma 12, we establish the error rates introduced by quantizing the variance with Algorithm 2, which are needed for the proof of Theorem 3.
  - In Lemma 13, we quantify the difference of quantized sample under and .
- In the proof of Theorem 5, we first decompose the test statistic into two parts,
[TABLE]
where is the vector of quantized sample under . By Theorem 3, we know the second term is . In the first term, it is straightforward to see that .
  - In Lemma 14, we establish a lower bound for .
  - In Lemma 15 and Lemma 16, we establish the lower bound for .
- In the proof of Theorem 8, observe that the test statistic for each
[TABLE]
is in a quadratic form.
  - Lemma 17 proves that the maximum of the quadratic form follows an extreme value distribution.
  - Lemma 18 provides the rate conditions such that Lemma 17 holds.
- The proof of Theorem 9 follows the same idea as the proofs of Theorem 5 and Theorem 7.
B Notation
In this section, we summarize the notation that is frequently used throughout the paper for the reader’s convenience.
C Useful Lemmas
The proofs of the theorems require some preliminary lemmas. In this section, we summarize these useful lemmas. Throughout the proof, we let and we denote as the canonical smoothing spline based on the full dataset; as the smoothing spline based on the averaged responses , and as the desired -bits estimator, i.e.,
[TABLE]
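The averaged responses that enter the second estimator can be sketched as block means. The snippet assumes that the sample is split into consecutive groups of equal size, so that the number of groups divides the sample size; the function name is hypothetical.

```python
def group_means(y, num_groups):
    """Average consecutive blocks of equal size; returns one mean per group."""
    n = len(y)
    if num_groups <= 0 or n % num_groups != 0:
        raise ValueError("num_groups must be positive and divide len(y)")
    size = n // num_groups  # number of observations in each group
    return [sum(y[i * size:(i + 1) * size]) / size for i in range(num_groups)]
```

For example, six observations split into three groups give three averages, each over two consecutive responses. Averaging reduces the noise variance in each group by the group size, at the cost of a coarser design grid.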
The following lemma describes that the distance between and can be well controlled by carefully choosing quantization parameters and .
Lemma 10
For any and , it holds that
[TABLE]
**Proof ** Recall that , where with , , and is the kernel function. Similarly, , where with . Let . By direct calculations, we have
[TABLE]
where are the trigonometric basis functions, , and . So
[TABLE]
We now look at and . To ease our calculations, for , we first define the following two notations,
[TABLE]
Since , for , we know are both symmetric circulant of order . Furthermore, and share the same normalized eigenvectors as
[TABLE]
where . Let , and be the conjugate transpose of . Clearly, and admit the following decomposition
[TABLE]
where and with and .
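The shared-eigenvector property of symmetric circulant matrices can be checked numerically. The sketch below assumes the normalized eigenvectors are the discrete Fourier vectors with entries exp(2*pi*i*j*k/n)/sqrt(n); this normalization and the function names are assumptions for illustration. The example matrix is arbitrary.

```python
import cmath
import math

def circulant(first_row):
    """Build the circulant matrix whose (i, j) entry is first_row[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def fourier_vector(n, k):
    """k-th normalized discrete Fourier vector of length n (assumed eigenvector form)."""
    return [cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]

def eigen_residual(first_row, k):
    """Return (eigenvalue, max |C v - lambda v|) for the k-th Fourier vector."""
    n = len(first_row)
    c_mat = circulant(first_row)
    v = fourier_vector(n, k)
    lam = sum(first_row[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n))
    cv = [sum(c_mat[i][j] * v[j] for j in range(n)) for i in range(n)]
    resid = max(abs(cv[i] - lam * v[i]) for i in range(n))
    return lam, resid
```

For a symmetric first row such as `[4.0, 1.0, 0.5, 1.0]`, every Fourier vector is an eigenvector and the eigenvalues are real. Because the eigenvectors do not depend on the entries of the first row, any two circulant matrices of the same order are diagonalized simultaneously.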
Direct calculations show that
[TABLE]
[TABLE]
It is easy to examine that
[TABLE]
[TABLE]
[TABLE]
where
[TABLE]
It follows from (27) and (28) that for . Therefore,
[TABLE]
Therefore, it follows by (25) that
[TABLE]
This completes the proof.
Lemma 11
Suppose Condition (B) holds true, and it holds that , then we have and .
**Proof ** By the definition of and (6) we have
[TABLE]
Assume that for and let be the density function of . Then we have
[TABLE]
and
[TABLE]
The fact that and the above inequalities lead to
[TABLE]
This proves . On the other hand, by and , we have
[TABLE]
This proves , which implies . Using a similar approach, we can prove . From (C), we get .
Now we prove . By the definition of , we have
[TABLE]
where for . Note that, by the central limit theorem, it holds that
[TABLE]
and that by the smoothness of function , we have that
[TABLE]
which completes the proof.
Lemma 12
Suppose Condition (B) holds true, and it holds that . Let be the quantized variance based on , then we have that .
**Proof ** By the definition of , we have
[TABLE]
where for . Note that , one has that by the smoothness of . Similar to the proof in Lemma 11, we get .
To ease calculation, we define some useful notations. Let be the quantized data conditional on and . According to (6), we have
[TABLE]
Furthermore, we let , where for
[TABLE]
Lemma 13
Suppose is the regression function generating the samples. Suppose Condition (B) holds and holds for all . Then for any with , it holds that , where is the corresponding integral equation defined in (15), and .
Proof : Suppose for some . Since and , we must have . Suppose that for some . Since and by (12) implying , we have
[TABLE]
Therefore, . Since
[TABLE]
[TABLE]
we have
[TABLE]
[TABLE]
Hence it holds that
[TABLE]
Since for all , the result follows from (6) and (33) that
[TABLE]
where the last inequality follows from (34) and the definition of .
Lemma 14
Suppose Condition (B) holds, and , . Then for any with , we have
[TABLE]
where and .
Proof : For convenience, let . From Lemma 13 and the fact that if for some , it holds that
[TABLE]
According to (33) and the fact that
[TABLE]
one has that
[TABLE]
which further implies that
[TABLE]
For any with , we have
[TABLE]
To complete the proof, we will analyze the above terms through .
For , we have
[TABLE]
where recall . Since and , we have (as ), which, together with (36), further leads to
[TABLE]
For , we have
[TABLE]
where the last inequality follows from
[TABLE]
Here the above “” is uniformly of .
For , Cauchy inequality implies that
[TABLE]
where the last inequality follows from , as .
For , we have
[TABLE]
For , it holds that
[TABLE]
From the above analysis of through , we get that as , for any with , it follows that
[TABLE]
This proves the desired result.
For , define . Let ,
[TABLE]
and be the conjugate transpose of .
Lemma 15
For and , one has that
[TABLE]
and
[TABLE]
Proof : The proof can be accomplished by direct calculations. For instance, the first case holds by following arguments. For and ,
[TABLE]
The proof of other cases is similar.
Let and , where . Recall is the conjugate transpose of . Suppose admits Fourier expansion .
Lemma 16
There exists a universal constant s.t. for any ,
[TABLE]
Proof : For simplicity, denote . For , we have
[TABLE]
Therefore, it follows that
[TABLE]
where , and is defined in (29).
By Lemma 15 and direct calculations, for , we have
[TABLE]
Therefore, it holds that
[TABLE]
It is easy to see that
[TABLE]
For , we have
[TABLE]
where (38) follows by an elementary inequality
[TABLE]
Meanwhile, a similar analysis leads to
[TABLE]
Now it follows from (37), (38) and (39), and elementary facts and , for , that
[TABLE]
where . It is straightforward to see is a decreasing function with respect to , therefore, we choose . This proves Lemma 16.
The proof of Theorem 8 requires some recent Gaussian approximation result, i.e., Theorem 3.1 in Koike (2019).
Lemma 17
For each , let be an -dimensional centered Gaussian vector with covariance matrix and be an integer. Also, for each , let be an symmetric matrix and be an -dimensional centered Gaussian vector with covariance matrix Set and suppose that the following conditions are satisfied:
1. There is a constant such that for every and every .
2. as .
3. as .
Then we have
[TABLE]
**Proof ** This is Theorem 3.1 in Koike (2019).
The proof of Theorem 9 requires some rate conditions which are summarized in the following lemma.
Lemma 18
Suppose , then for any , under Condition (C), the following rate conditions hold:
[TABLE]
[TABLE]
[TABLE]
where .
**Proof ** It is easy to see . Therefore
[TABLE]
[TABLE]
[TABLE]
where the last “” follows from the assumption for some . For the last two terms, one has that
[TABLE]
[TABLE]
D Proofs for main theorems
Proof of Theorem 1:
It holds that , and we analyze these two terms separately. We first analyze . Because
[TABLE]
we have
[TABLE]
Therefore, from Lemma 10, we have
[TABLE]
On the other hand, by elementary calculations we have
[TABLE]
where is the distribution of . Combining the above, we get
[TABLE]
Next, we analyze the mean square error of the second term . For the sake of theoretical investigation, we introduce the following function,
[TABLE]
where with , and is the integral function of as defined in (15), i.e.,
[TABLE]
Recall that , where with , are the trigonometric basis functions, , and . Therefore, we have
[TABLE]
where and . Next, we evaluate . Note that can be decomposed as , as defined in (26). Furthermore, we let , where . Hence, we obtain
[TABLE]
By expressions of ’s, the above is upper bounded by the following
[TABLE]
where is a constant only depending on . From the above analysis, we obtain
[TABLE]
Using above analysis and (41), we have
[TABLE]
Now, we consider the difference between original regression function and the integral function defined in (15), i.e., . By definition, for , there exists between and such that
[TABLE]
On the other hand, for , there exists between [math] and such that
[TABLE]
In a similar way, we obtain for and some . Therefore, by Sobolev inequality, we know and , which implies
[TABLE]
In the end, because both and belong to Sobolev space, and can be viewed as the approximate error of spline estimates with respect to without random error. By classical spline theory (Wahba, 1990), we know
[TABLE]
As a consequence, from (42), (D), and (44), we have . Combining the result in (40), we get the desired result.
Proof of Corollary 2: Because as ,
[TABLE]
The first term in the above equation is bounded by because satisfies and , . Due to Condition (B), we know . Similarly, we know . Hence . The result follows by Theorem 1 and , .
Proof of Theorem 3: Suppose ’s are the quantized samples corresponding to for , where are defined by
[TABLE]
For , define the th order moment of the standardized :
[TABLE]
where denotes the expectation under and . Because for , and under Condition (B), we have that . Furthermore, since , which implies that and the assumption that , one has that for .
Define for . Then are iid variables with zero-mean and unit variance. Define and . Define and . Let . Define . Immediately, for all , , therefore,
[TABLE]
where the last “” follows from condition . This implies that . Furthermore,
[TABLE]
Let be the test statistic corresponding to ’s. By (25) it can be shown that , which leads to that
[TABLE]
We first look at . By (46) we have
[TABLE]
which leads to .
Define for and . We next analyze . Note that . Let be an independent copy of . Let be uniformly distributed on . Throughout, we let , and be mutually independent. Define . So is an exchangeable pair (see Reinert and Röllin (2009)), and , where with 1 being at the th position for . Let . By a simple calculation it can be shown that . So it follows that
[TABLE]
Let be a -function such that for and for . Let for , where is a positive sequence tending to infinity and satisfying
[TABLE]
The existence of such follows by (46).
Next we will approximate where . Consider Stein’s equation
[TABLE]
where and represent first- and second-order derivatives of . By Goldstein and Rinott (1996), a solution to (50) is
[TABLE]
Let , , and , where is the third-order derivative of . It is easy to see that
[TABLE]
[TABLE]
Clearly, it holds that and .
By exchangeability, . So . Since , we have
[TABLE]
Next, we analyze and separately. Let for . For , by direct examinations we have
[TABLE]
where . Since , we get that
[TABLE]
The first term of (D) is equal to
[TABLE]
where the last “” follows by (47).
The second term of (D) is equal to
[TABLE]
We have that
[TABLE]
where , , , , , . By direct calculations, it is easy to see that
[TABLE]
Therefore, it can be shown that
[TABLE]
The last inequality holds because each term in the summation is bounded by multiplied by suitable constants.
Since , we have and . So it holds that
[TABLE]
where the last inequality follows from the trivial fact . From the above analysis, we get that
[TABLE]
For , it holds that
[TABLE]
[TABLE]
By (49) the following holds uniformly for :
[TABLE]
Similarly, for , it can be shown that the following statement holds uniformly for :
[TABLE]
By elementary facts, we have
[TABLE]
By (55), (56) and (D), the following statements hold uniformly for ,
[TABLE]
Hence, as tends to infinity,
[TABLE]
This, together with , proves
[TABLE]
Let ’s and be the quantized samples and testing statistics in Theorem 3, then we have
[TABLE]
We will analyze these two terms separately. For , one has that
[TABLE]
where with which satisfies under Condition (B).
For the first term, since and , it follows that
[TABLE]
where the last equality follows from the condition , which implies that
[TABLE]
For the second term in (59), using the fact that , are independent if , and , it is straightforward to show that
[TABLE]
In the proof of (47), we have shown that , , hence we have that
[TABLE]
which implies that
[TABLE]
Furthermore, since , we have that
[TABLE]
which implies that
[TABLE]
and consequently,
[TABLE]
Using the condition , and the fact that , one has that
[TABLE]
which gives that . Plugging this back to equation (59), we have that .
Now we analyze , by Lemma 11, we have
[TABLE]
Since , one has that
[TABLE]
From (58), we get the desired result.
Proof of Proposition 4: Suppose is the density of . By direct calculations, we have
[TABLE]
For the first term, under Condition (B), we know , which implies . For the second term, we have that
[TABLE]
Since , and for , one has that . Plugging this back to equation (61), we get the desired result.
Proof of Theorem 5: Without loss of generality, we only consider the case in (2). By Condition (B), we have that , as . Consider the following event:
[TABLE]
It is easy to show that as under Condition (B). Thus, we choose s.t. if .
Throughout the proof, we suppose that is the function that generates the samples and is the integral function of defined in (15). Let . It is straightforward to see that under event . Because
[TABLE]
it follows by Lemma 14 that there exists s.t., when , the following equation holds
[TABLE]
Consider the event
[TABLE]
where .
Then
[TABLE]
which implies that .
Let be the estimated variance under the null. Then one has that
[TABLE]
Since , one has that
[TABLE]
It follows from Theorem 3 that
[TABLE]
Hence, there exists s.t. for all and , where
[TABLE]
Let , then for any and .
Suppose satisfies , where
[TABLE]
[TABLE]
where , $\zeta=\max\limits_{i=1,\ldots,c}\big|f(i/c)-\frac{1}{\widetilde{n}}\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}g(j/n)\big|=O(n^{-1})$.
It follows from Lemma 13 that, on , . Since , we get that
[TABLE]
which, together with Lemma 16, leads to that
[TABLE]
Therefore, on , we have
[TABLE]
where (D) follows from (see (64)), i.e.,
[TABLE]
which leads to
[TABLE]
and (67) follows from (64), i.e.,
[TABLE]
Then for any satisfying , where are defined in (64) (65), there exist such that for any , we have
[TABLE]
In the end, since , immediately, one has that is equivalent to . This proves the desired result.
Proof of Theorem 6: Suppose is the “true” function under and , . We use to denote the least-square estimator of based on ’s. Consider the following two events:
[TABLE]
It is easy to show that , as . Since as , under event , for , one has that
[TABLE]
Furthermore, we have
[TABLE]
where $\varsigma_{i}=\frac{\sum_{j=(i-1)\widetilde{n}+1}^{i\widetilde{n}}Q\big(g_{0}(j/n)-\widehat{y}_{j}\big)}{\widetilde{n}}$ satisfying , and equations (D), (69) follow from (6), (33). Let , . Therefore, the test statistic
[TABLE]
where . Now we proceed to prove that is dominated by . Using the fact that , are independent of each other if , and , then for the first term , it is straightforward to show that
[TABLE]
In the proof of equation (47), we have shown that , , hence we have that
[TABLE]
Furthermore, since , we have that
[TABLE]
[TABLE]
Since , one has that
[TABLE]
Together with the fact that and equation (70), one has that
[TABLE]
Similarly, it can be shown that . Therefore, . The dominating term in is nothing but for testing based on . Therefore, in keeping with Lemma 11, should have the same limiting distribution under as under . Thus, according to Theorem 3 and Lemma 12, the result is proved.
Proof of Theorem 7: The proof of Theorem 7 is similar to that of Theorem 5. Let be the function which generates the observations and be the projection of to . We further define to be the integral function associated with , as defined in (15), that is,
[TABLE]
Therefore, . To proceed, we first define to be the least-squares estimator based on $Q\big(\mathcal{P}_{\mathcal{L}(\mathbb{I})}(g)(j/n)+\sigma\epsilon_{j}\big)$ which satisfies . Let be the based on . According to (6), one has that
[TABLE]
Let . Before proceeding, we first define some notations to ease the calculations. Define
[TABLE]
Similar to Lemma 13, we want to find an upper bound of . It is straightforward to show that
[TABLE]
where
[TABLE]
Suppose that Condition (B) holds, and consider events
[TABLE]
It is easy to show that as . Let . Thus, we choose s.t. if . Define , where . Obviously, under event . Therefore, by (71), under event , we have . Since , we get that
[TABLE]
where . By Lemma 16 and (72), we get the following lower bound:
[TABLE]
Using a similar argument in Lemma 14, and the facts that , , as , one has that
[TABLE]
Therefore, there exists s.t., when , where event is defined as
[TABLE]
and .
From Theorem 6, it is straightforward to show that
[TABLE]
Thus, there exists s.t. for all and , where
[TABLE]
Then for any and .
Suppose satisfies
[TABLE]
where
[TABLE]
[TABLE]
Then, under event , we have
[TABLE]
where (76) follows from (see (75)), i.e.,
[TABLE]
[TABLE]
which leads to
[TABLE]
and (77) follows from (75), i.e.,
[TABLE]
Then for any , we have
[TABLE]
In the end, by direct calculations, we know that is equivalent to . This proves the desired result.
Proof of Theorem 8: Let and , where follows a normal distribution. We further define to be the difference between and . Let . Consider event
[TABLE]
Then as , and under event , . According to (33), one has that . Note that for any given , the standardized testing statistic
[TABLE]
For , notice , and for any , one has . For , using a similar argument as (60), we have
[TABLE]
For , we need to use Lemma 17. Let . Define . Define to be an -dimensional centered Gaussian vector with covariance matrix . Next we need to verify the conditions in Lemma 17.
By direct calculations, we have Then we have .
On the other hand,
[TABLE]
For the first term in (78), recall that with being the th entry of . Then by Lemma 18, we have,
[TABLE]
For the second term in (78), we need to find a bound of for . It follows that
[TABLE]
Therefore, using Lemma 18, we have
[TABLE]
Together with (79), we have .
Therefore, by Lemma 17, we have
[TABLE]
By Hall (1979), we know that follows an extreme value distribution. The proof is complete.
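As a rough numerical illustration of the kind of extreme value limit invoked above, the following sketch compares the exact distribution of the normalized maximum of n i.i.d. standard normal variables with the Gumbel limit. It is not the paper's test statistic: the i.i.d. N(0,1) setting, the classical norming constants `a_n` and `b_n`, and the function names `normal_max_cdf` and `gumbel_cdf` are assumptions made only for this example.

```python
import math


def normal_max_cdf(n: int, x: float) -> float:
    """Exact P(a_n * (max_i Z_i - b_n) <= x) for n i.i.d. N(0,1) variables.

    Assumes the classical norming constants for normal maxima (n >= 2).
    """
    a_n = math.sqrt(2.0 * math.log(n))
    b_n = a_n - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * a_n)
    z = b_n + x / a_n
    # Upper tail P(Z > z), computed with erfc for accuracy at large z.
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    # P(max <= z) = (1 - tail)^n, evaluated on the log scale.
    return math.exp(n * math.log1p(-tail))


def gumbel_cdf(x: float) -> float:
    """CDF of the standard Gumbel (type I extreme value) distribution."""
    return math.exp(-math.exp(-x))


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        gap = abs(normal_max_cdf(n, 0.0) - gumbel_cdf(0.0))
        print(f"n={n:>8d}  |exact - Gumbel| at x=0: {gap:.4f}")
```

The gap shrinks only slowly as n grows, which is why critical values based on the limiting distribution can be inaccurate for moderate sample sizes.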
Proof of Theorem 9: The proof of Theorem 9 is similar to those of Theorems 5 and 7. We use the same notation as in the proof of Theorem 5. Suppose is the function that generates the samples and is the corresponding integral function as defined in (15). We consider the following three events, as defined in the proof of Theorem 5.
[TABLE]
Since as , there exist , such that for all . It follows from Lemma 14 that there exists s.t., when , . Furthermore, using Theorem 3, there exists s.t. for all .
Suppose satisfies , where
[TABLE]
Since eventually, we may assume . Then it holds that
[TABLE]
Similar to the proof of Theorem 5, we know with probability approaching one
[TABLE]
Since we have
[TABLE]
Therefore, for any , we have
[TABLE]
Finally, by direct calculations, we know that is equivalent to . This proves the desired result.
- [1] Adams and Fournier (2003). Robert A Adams and John JF Fournier. Sobolev Spaces, volume 140. Elsevier, 2003.
- [2] Benhenni and Rachdi (2006). K Benhenni and Mustapha Rachdi. Nonparametric estimation of the regression function from quantized observations. Computational Statistics & Data Analysis, 50(11):3067–3085, 2006.
- [3] Boufounos and Baraniuk (2008). Petros T Boufounos and Richard G Baraniuk. 1-bit compressive sensing. In 42nd Annual Conference on Information Sciences and Systems (CISS 2008), pages 16–21, 2008.
- [4] Cai and Wei (2021). Tony Cai and Hongji Wei. Distributed nonparametric function estimation: Optimal rate of convergence and cost of adaptation. arXiv preprint arXiv:2107.00179, 2021.
- [5] Cheng and Shang (2015). Guang Cheng and Zuofeng Shang. Joint asymptotics for semi-nonparametric regression models with partially linear structure. The Annals of Statistics, 43(3):1351–1390, 2015.
- [6] Goldstein and Rinott (1996). Larry Goldstein and Yosef Rinott. Multivariate normal approximations by Stein's method and size bias couplings. Journal of Applied Probability, 33(1):1–17, 1996.
- [7] Gopi et al. (2013). Sivakant Gopi, Praneeth Netrapalli, Prateek Jain, and Aditya Nori. One-bit compressed sensing: Provable support and vector recovery. In International Conference on Machine Learning, pages 154–162, 2013.
- [8] Gu (2013). Chong Gu. Smoothing Spline ANOVA Models, volume 297. Springer Science & Business Media, 2013.
