Monotone Least Squares and Isotonic Quantiles

Alexandre Mösching; Lutz Dümbgen

arXiv:1901.02398 · math.ST · September 9, 2020


TL;DR

This paper develops nonparametric methods for estimating isotonic distribution and quantile functions in bivariate data, establishing their convergence rates and relationships under stochastic order assumptions.

Contribution

It introduces two related monotone least squares estimators for distribution and quantile functions, analyzing their properties and convergence rates.

Findings

1. Establishes convergence rates for the estimators.

2. Shows the close relationship between distribution and quantile estimation methods.

3. Demonstrates the flexibility of the distribution-based approach over quantile-based methods.

Abstract

We consider bivariate observations $(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})$ such that, conditional on the $X_{i}$ , the $Y_{i}$ are independent random variables with distribution functions $F_{X_{i}}$ , where $(F_{x})_{x}$ is an unknown family of distribution functions. Under the sole assumption that $x \mapsto F_{x}$ is isotonic with respect to stochastic order, one can estimate $(F_{x})_{x}$ in two ways: (i) For any fixed $y$ one estimates the antitonic function $x \mapsto F_{x} (y)$ via nonparametric monotone least squares, replacing the responses $Y_{i}$ with the indicators $1_{[Y_{i} \leq y]}$ . (ii) For any fixed $β \in (0, 1)$ one estimates the isotonic quantile function $x \mapsto F_{x}^{- 1} (β)$ via a nonparametric version of regression quantiles. We show that these two approaches are closely related, with (i) being more flexible than (ii). Then, under mild regularity conditions, we establish rates of…


Equations

(X_{1},Y_{1}),(X_{2},Y_{2}),\ldots,(X_{n},Y_{n})\in\mathcal{X}\times\mathbb{R}

\mathbb{P}(Y_{i}\leq y\mid\boldsymbol{X})=F_{X_{i}}(y),

\sum_{i=1}^{n}\bigl(1_{[Y_{i}\leq y]}-h(X_{i})\bigr)^{2}

\sum_{i=1}^{n}\rho_{\beta}(Y_{i}-h(X_{i})),

\rho_{\beta}(z):=(\beta-1_{[z<0]})\,z.

\sup_{x\in I,y\in J}\,\bigl|\widehat{F}_{x}(y)-F_{x}(y)\bigr|\quad\text{and}\quad\sup_{x\in I,\beta\in B}\,\bigl|\widehat{Q}_{x}(\beta)-Q_{x}(\beta)\bigr|

\sup_{y\in J}\,\bigl|\widehat{F}_{x_{o}}(y)-F_{x_{o}}(y)\bigr|\quad\text{and}\quad\sup_{\beta\in B}\,\bigl|\widehat{Q}_{x_{o}}(\beta)-Q_{x_{o}}(\beta)\bigr|

w_{j}:=\#\{i:X_{i}=x_{j}\}.

\mathbb{P}(Y_{i}\leq y)=F_{x_{j}}(y)\quad\text{whenever}\ X_{i}=x_{j},

\widehat{\mathbb{F}}_{j}(y):=w_{j}^{-1}\sum_{i:X_{i}=x_{j}}1_{[Y_{i}\leq y]}.

\sum_{i=1}^{n}\bigl(1_{[Y_{i}\leq y]}-h(X_{i})\bigr)^{2}\ =\ \sum_{j=1}^{m}w_{j}\bigl(\widehat{\mathbb{F}}_{j}(y)-h(x_{j})\bigr)^{2}+\sum_{j=1}^{m}w_{j}\widehat{\mathbb{F}}_{j}(y)\bigl(1-\widehat{\mathbb{F}}_{j}(y)\bigr),

\mathbb{R}^{m}_{\downarrow}:=\{\boldsymbol{f}\in\mathbb{R}^{m}:f_{1}\geq f_{2}\geq\cdots\geq f_{m}\}.

\widehat{\boldsymbol{F}}(y)=\bigl(\widehat{F}_{x_{j}}(y)\bigr)_{j=1}^{m}\ :=\ \operatorname*{arg\,min}_{\boldsymbol{f}\in\mathbb{R}^{m}_{\downarrow}}\,\sum_{j=1}^{m}w_{j}\bigl(\widehat{\mathbb{F}}_{j}(y)-f_{j}\bigr)^{2}.

\widehat{F}_{x_{j}}(y)\ =\ \min_{r\leq j}\,\max_{s\geq j}\,\widehat{\mathbb{F}}_{rs}(y)\ =\ \max_{s\geq j}\,\min_{r\leq j}\,\widehat{\mathbb{F}}_{rs}(y),

\widehat{\mathbb{F}}_{rs}(y)

w_{rs}

\widehat{F}_{x_{j}}(\cdot)\ \text{is a distribution function}.

\widehat{F}_{x_{j}}^{-1}(\beta)\ :=\ \min\bigl\{y\in\mathbb{R}:\widehat{F}_{x_{j}}(y)\geq\beta\bigr\}\quad\text{and}\quad\widehat{F}_{x_{j}}^{-1}(\beta\,+)\ :=\ \inf\bigl\{y\in\mathbb{R}:\widehat{F}_{x_{j}}(y)>\beta\bigr\}.

\sum_{i=1}^{n}\rho_{\beta}(Y_{i}-h(X_{i}))\ =\ \sum_{j=1}^{m}\sum_{i:X_{i}=x_{j}}\rho_{\beta}(Y_{i}-h(x_{j})),

\widehat{\mathcal{Q}}(\beta)\ :=\ \operatorname*{arg\,min}_{\boldsymbol{q}\in\mathbb{R}^{m}_{\uparrow}}\,T_{\beta}(\boldsymbol{q}),

T_{\beta}(\boldsymbol{q})\ :=\ \sum_{j=1}^{m}\sum_{i:X_{i}=x_{j}}\rho_{\beta}(Y_{i}-q_{j}).

\widehat{\mathbb{F}}_{rs}^{-1}(\beta)

\widehat{\mathbb{F}}_{rs}^{-1}(\beta\,+)

\ell_{j}

u_{j}

\widehat{\mathbb{F}}_{12}(y)\ =\ \begin{cases}0&\text{if}\ y<Y_{2},\\ 0.5&\text{if}\ Y_{2}\leq y<Y_{1},\\ 1&\text{if}\ y\geq Y_{1}.\end{cases}

\boldsymbol{\ell}=(Y_{2},Y_{2})^{\top}\quad\text{and}\quad\boldsymbol{u}=(Y_{1},Y_{1})^{\top},

\widehat{\mathbb{F}}_{12}^{-1}(0.5)=Y_{2},\quad\widehat{\mathbb{F}}_{12}^{-1}(0.5\,+)=Y_{1}.

\widehat{\mathcal{Q}}(0.5)\ =\ \bigl\{(q,q)^{\top}:q\in[Y_{2},Y_{1}]\bigr\},

\rho_{0.5}(Y_{1}-q_{1})+\rho_{0.5}(Y_{2}-q_{2})\ =\ 0.5\,(Y_{1}-q_{1}+q_{2}-Y_{2})\ \geq\ 0.5\,(Y_{1}-Y_{2})

\widehat{\mathcal{Q}}_{\mathrm{plug-in}}(\beta)\ :=\ \bigl\{\boldsymbol{q}\in\mathbb{R}^{m}_{\uparrow}:\widehat{F}_{x_{j}}^{-1}(\beta)\leq q_{j}\leq\widehat{F}_{x_{j}}^{-1}(\beta\,+)\ \text{for}\ 1\leq j\leq m\bigr\}.

\ell_{j}=\widehat{F}_{x_{j}}^{-1}(\beta)\quad\text{and}\quad u_{j}=\widehat{F}_{x_{j}}^{-1}(\beta\,+)\quad\text{for}\ 1\leq j\leq m.


Full text

Monotone Least Squares and Isotonic Quantiles

Alexandre Mösching, Lutz Dümbgen

University of Bern

Abstract

We consider bivariate observations $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})$ such that, conditional on the $X_{i}$ , the $Y_{i}$ are independent random variables with distribution functions $F_{X_{i}}$ , where $(F_{x})_{x}$ is an unknown family of distribution functions. Under the sole assumption that $x\mapsto F_{x}$ is isotonic with respect to stochastic order, one can estimate $(F_{x})_{x}$ in two ways:

(i) For any fixed $y$ one estimates the antitonic function $x\mapsto F_{x}(y)$ via nonparametric monotone least squares, replacing the responses $Y_{i}$ with the indicators $1_{[Y_{i}\leq y]}$ .

(ii) For any fixed $\beta\in(0,1)$ one estimates the isotonic quantile function $x\mapsto F_{x}^{-1}(\beta)$ via a nonparametric version of regression quantiles.

We show that these two approaches are closely related, with (i) being more flexible than (ii). Then, under mild regularity conditions, we establish rates of convergence for the resulting estimators $\widehat{F}_{x}(y)$ and $\widehat{F}_{x}^{-1}(\beta)$ , uniformly over $(x,y)$ and $(x,\beta)$ in certain rectangles as well as uniformly in $y$ or $\beta$ for a fixed $x$ .

Keywords:

Regression quantiles, stochastic order, uniform consistency.

AMS 2000 subject classifications:

62G08, 62G20, 62G30.

1 Introduction

Suppose we observe $n\geq 1$ pairs

$$(X_{1},Y_{1}),(X_{2},Y_{2}),\ldots,(X_{n},Y_{n})\in\mathcal{X}\times\mathbb{R}$$

with random or fixed covariate values $X_{1},\ldots,X_{n}$ in a set $\mathcal{X}\subset\mathbb{R}$ such that, conditional on $\boldsymbol{X}=(X_{i})_{i=1}^{n}$ , the response values $Y_{1},\ldots,Y_{n}$ are independent with

$$\mathbb{P}(Y_{i}\leq y\mid\boldsymbol{X})=F_{X_{i}}(y),$$

for $1\leq i\leq n$ and $y\in\mathbb{R}$ . Here $(F_{x})_{x\in\mathcal{X}}$ is an unknown family of distribution functions on $\mathbb{R}$ . Note that some values $X_{i}$ could be identical, so the corresponding random variables $Y_{i}$ have the same conditional distribution, given $\boldsymbol{X}$ .

Our goal is to estimate the whole family $(F_{x})_{x\in\mathcal{X}}$ under the sole assumption that $x\mapsto F_{x}$ is isotonic (non-decreasing) with respect to stochastic order. This can be expressed in three equivalent ways:

(SO.1) For arbitrary fixed $y\in\mathbb{R}$ , $F_{x}(y)$ is antitonic (non-increasing) in $x\in\mathcal{X}$ .

(SO.2) For any fixed $\beta\in(0,1)$ , the minimal $\beta$ -quantile $F_{x}^{-1}(\beta):=\min\{y\in\mathbb{R}:F_{x}(y)\geq\beta\}$ is isotonic in $x\in\mathcal{X}$ .

(SO.3) For any fixed $\beta\in(0,1)$ , the maximal $\beta$ -quantile $F_{x}^{-1}(\beta\,+):=\inf\{y\in\mathbb{R}:F_{x}(y)>\beta\}$ is isotonic in $x\in\mathcal{X}$ .

In what follows, we denote with $Q_{x}(\beta)$ any $\beta$ -quantile of $F_{x}$ and assume that it is isotonic in $x$ .
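To make the equivalent formulations (SO.1) to (SO.3) concrete, the following minimal sketch checks them on a toy location family $F_{x}=N(x,1)$. This family, the helper names `F` and `Q`, and the chosen grid are illustrative assumptions and do not appear in the paper.

```python
from statistics import NormalDist

def F(x, y):
    """Toy family F_x = N(x, 1): distribution function evaluated at y."""
    return NormalDist(mu=x, sigma=1.0).cdf(y)

def Q(x, beta):
    """beta-quantile of F_x (unique here because F_x is continuous and strictly increasing)."""
    return NormalDist(mu=x, sigma=1.0).inv_cdf(beta)

xs = [0.0, 0.5, 1.0, 2.0]                # increasing covariate values
cdf_at_1 = [F(x, 1.0) for x in xs]       # (SO.1): non-increasing in x
medians = [Q(x, 0.5) for x in xs]        # (SO.2-3): non-decreasing in x

print(cdf_at_1)
print(medians)
```

In this family the minimal and maximal $\beta$-quantiles coincide, so (SO.2) and (SO.3) reduce to the same statement.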

Such a constraint appears natural in several settings. For instance, an employee’s income $Y$ tends to increase with his or her age $X$ . Other examples in which such a stochastic order is plausible are: The expenditures $Y$ of a household for certain goods in relation to its monthly income $X$ ; the body height or weight $Y$ of a child in relation to its age $X$ . Stochastic ordering constraints also have applications in forecasting. For example, $X_{1},\ldots,X_{n}$ and $Y_{1},\ldots,Y_{n}$ could be the predicted and actual cumulative precipitation amounts on $n$ different days, respectively, with the predictions being obtained from a numerical weather prediction model, see Henzi (2018).

With condition (SO.1) in mind, one could think about estimating the antitonic function $x\mapsto F_{x}(y)$ by means of monotone least squares regression, replacing the response values $Y_{i}$ with the indicator variables $1_{[Y_{i}\leq y]}$ . Precisely, we would set $\widehat{F}_{x}(y)=h(x)$ with an antitonic function $h:\mathcal{X}\to[0,1]$ such that

$$\sum_{i=1}^{n}\bigl(1_{[Y_{i}\leq y]}-h(X_{i})\bigr)^{2}$$

is minimal. The solution $h$ is unique on the set $\mathcal{X}_{n}:=\{X_{1},\ldots,X_{n}\}$ , and on $\mathcal{X}\setminus\mathcal{X}_{n}$ one could extrapolate it in some reasonable way. In the special case of $\mathcal{X}$ being finite this approach has been proposed and analyzed by El Barmi and Mukerjee (2005).

Conditions (SO.2-3) suggest imitating the regression quantiles of Koenker and Bassett (1978). That is, we estimate the conditional $\beta$-quantiles $Q_{x}(\beta)$ by $\widehat{Q}_{x}(\beta)=h(x)$ with an isotonic function $h:\mathcal{X}\to\mathbb{R}$ minimizing the empirical risk

$$\sum_{i=1}^{n}\rho_{\beta}(Y_{i}-h(X_{i})),$$

where $\rho_{\beta}$ denotes the loss function

$$\rho_{\beta}(z):=(\beta-1_{[z<0]})\,z.$$

This estimator has been considered, for instance, by Poiraud-Casanova and Thomas-Agnan (2000), who showed that it coincides with an estimator of Casady and Cryer (1976) given by a certain minimax formula involving sample $\beta$-quantiles. The characterization of isotonic estimators in terms of minimax formulae has also been derived by Robertson and Wright (1980) in a rather general framework including arbitrary partial orders on $\mathcal{X}$ and general loss functions $R_{i}(\cdot)$ in place of $\rho_{\beta}(Y_{i}-\cdot)$; see also Section 4.1.
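As a small numerical illustration of the loss $\rho_{\beta}(z)=(\beta-1_{[z<0]})z$, the sketch below evaluates the empirical risk of a constant fit and compares the minimal sample $\beta$-quantile with a grid search. The function names, the toy responses and the grid are hypothetical choices made only for this illustration.

```python
import math

def check_loss(z, beta):
    """rho_beta(z) = (beta - 1[z < 0]) * z."""
    return (beta - (1.0 if z < 0 else 0.0)) * z

def empirical_risk(q, ys, beta):
    """Sum of rho_beta(Y_i - q) for a constant fit q."""
    return sum(check_loss(y - q, beta) for y in ys)

def minimal_quantile(ys, beta):
    """min{y : F_hat(y) >= beta} for the empirical distribution F_hat of ys."""
    s = sorted(ys)
    k = math.ceil(beta * len(s) - 1e-12)   # smallest k with k/n >= beta
    return s[max(k, 1) - 1]

ys = [3.0, 1.0, 4.0, 1.5, 9.0]             # toy responses (illustrative)
beta = 0.5
q_star = minimal_quantile(ys, beta)

# Grid search over constant fits in [0, 10] as an independent cross-check.
grid = [i / 100 for i in range(0, 1001)]
best = min(empirical_risk(q, ys, beta) for q in grid)

print(q_star, empirical_risk(q_star, ys, beta), best)
```

Without any monotonicity constraint and with a single design point, minimizing the empirical risk thus reduces to computing a sample quantile; the isotonic constraint couples these problems across design points.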

The goals of the present paper are to clarify the connection between these two estimation paradigms and to provide new consistency results in a suitable asymptotic framework.

In Section 2, we give a detailed description of the estimator $(\widehat{F}_{x})_{x\in\mathcal{X}}$ based on monotone least squares and estimators $(\widehat{Q}_{x})_{x\in\mathcal{X}}$ based on monotone regression quantiles. Then we show that the estimators $\widehat{Q}_{x}$ are essentially quantiles of the estimators $\widehat{F}_{x}$ , but the latter allow for smoother estimated quantile curves.

In Section 3, we analyze the estimators in a suitable asymptotic framework with a triangular scheme of observations and $\mathcal{X}$ being a real interval. It turns out that under certain regularity conditions on the design points and the true distribution functions $F_{x}$ , one can prove rates of convergence for quantities such as

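$$\sup_{x\in I,\,y\in J}\,\bigl|\widehat{F}_{x}(y)-F_{x}(y)\bigr|\quad\text{and}\quad\sup_{x\in I,\,\beta\in B}\,\bigl|\widehat{Q}_{x}(\beta)-Q_{x}(\beta)\bigr|$$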

with intervals $I\subset\mathcal{X}$ , $J\subset\mathbb{R}$ and $B\subset(0,1)$ . These results generalize and improve the findings of Casady and Cryer (1976), see also Mukerjee (1993) who analyzed a slightly different estimator. In addition we investigate

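$$\sup_{y\in J}\,\bigl|\widehat{F}_{x_{o}}(y)-F_{x_{o}}(y)\bigr|\quad\text{and}\quad\sup_{\beta\in B}\,\bigl|\widehat{Q}_{x_{o}}(\beta)-Q_{x_{o}}(\beta)\bigr|$$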

for a fixed interior point $x_{o}$ of $\mathcal{X}$ . These results complement the analysis of a single quantile curve by Wright (1984).

Proofs and technical details are deferred to Section 4. We also provide some general results about isotonic regression which are of independent interest.

2 Estimation of the conditional distributions

Throughout this section, we view the observations $(X_{i},Y_{i})$ , $1\leq i\leq n$ , as fixed and focus mainly on computational aspects. Let $x_{1}<\cdots<x_{m}$ be the different elements of the set $\mathcal{X}_{n}$ of observed values $X_{i}$ , that means, $m\leq n$ . For $1\leq j\leq m$ , we set

$$w_{j}:=\#\{i:X_{i}=x_{j}\}.$$

Then

$$\mathbb{P}(Y_{i}\leq y)=F_{x_{j}}(y)\quad\text{whenever}\ X_{i}=x_{j},$$

and the unconstrained maximum likelihood estimator of $F_{x_{j}}(y)$ is given by

$$\widehat{\mathbb{F}}_{j}(y)\ :=\ w_{j}^{-1}\sum_{i:X_{i}=x_{j}}1_{[Y_{i}\leq y]}.\tag{1}$$

2.1 Estimation of $F_{x}$ via monotone least squares

The estimator $\widehat{\mathbb{F}}_{j}(y)$ in (1) is rather poor by itself, unless the corresponding subsample size $w_{j}$ is large. But in connection with our stochastic order constraint, it becomes a useful tool. Note first that, for any function $h:\mathcal{X}\to\mathbb{R}$ ,

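$$\sum_{i=1}^{n}\bigl(1_{[Y_{i}\leq y]}-h(X_{i})\bigr)^{2}\ =\ \sum_{j=1}^{m}w_{j}\bigl(\widehat{\mathbb{F}}_{j}(y)-h(x_{j})\bigr)^{2}+\sum_{j=1}^{m}w_{j}\widehat{\mathbb{F}}_{j}(y)\bigl(1-\widehat{\mathbb{F}}_{j}(y)\bigr),$$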

and the stochastic order assumption implies that the vector $\boldsymbol{F}(y)=(F_{x_{j}}(y))_{j=1}^{m}$ belongs to the cone

$$\mathbb{R}^{m}_{\downarrow}:=\{\boldsymbol{f}\in\mathbb{R}^{m}:f_{1}\geq f_{2}\geq\cdots\geq f_{m}\}.$$

Hence one can estimate $\boldsymbol{F}(y)$ by the unique least squares estimator

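$$\widehat{\boldsymbol{F}}(y)=\bigl(\widehat{F}_{x_{j}}(y)\bigr)_{j=1}^{m}\ :=\ \operatorname*{arg\,min}_{\boldsymbol{f}\in\mathbb{R}^{m}_{\downarrow}}\,\sum_{j=1}^{m}w_{j}\bigl(\widehat{\mathbb{F}}_{j}(y)-f_{j}\bigr)^{2}.$$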
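The reduction from the $n$ indicator responses to the $m$ weighted group proportions rests on the sum-of-squares decomposition stated above: the residual sum of squares equals a weighted sum of squares in $\widehat{\mathbb{F}}_{j}(y)-h(x_{j})$ plus a term that does not involve $h$. A minimal numerical check on simulated toy data follows; the data-generating choices, the threshold and the variable names are assumptions made only for this illustration.

```python
import random

random.seed(0)
n = 30
X = [random.choice([1, 2, 3, 4]) for _ in range(n)]    # covariates with ties
Y = [random.gauss(x, 1.0) for x in X]                  # toy responses
y = 2.5                                                # fixed threshold

xs = sorted(set(X))                                    # x_1 < ... < x_m
w = {x: sum(1 for xi in X if xi == x) for x in xs}     # multiplicities w_j
F_emp = {x: sum(1 for xi, yi in zip(X, Y) if xi == x and yi <= y) / w[x]
         for x in xs}                                  # group proportions

h = {x: random.random() for x in xs}                   # arbitrary h on the design points

# Left-hand side: residual sum of squares over all n observations.
lhs = sum(((1.0 if yi <= y else 0.0) - h[xi]) ** 2 for xi, yi in zip(X, Y))
# Right-hand side: weighted squares plus an h-free remainder.
rhs = (sum(w[x] * (F_emp[x] - h[x]) ** 2 for x in xs)
       + sum(w[x] * F_emp[x] * (1 - F_emp[x]) for x in xs))

print(lhs, rhs)
```

Since the second term on the right-hand side is free of $h$, minimizing over antitonic $h$ is a weighted least squares problem in the $m$ values $h(x_{j})$.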

It is well-known that $\widehat{\boldsymbol{F}}(y)$ may also be represented by the following minimax and maximin formulae, see Robertson et al. (1988): For $1\leq j\leq m$ ,

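$$\widehat{F}_{x_{j}}(y)\ =\ \min_{r\leq j}\,\max_{s\geq j}\,\widehat{\mathbb{F}}_{rs}(y)\ =\ \max_{s\geq j}\,\min_{r\leq j}\,\widehat{\mathbb{F}}_{rs}(y),$$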

where

[TABLE]

and $r,s$ stand for indices in $\{1,2,\ldots,m\}$ such that $r\leq s$ . These formulae are useful for theoretical considerations. In particular, since the pointwise maximum or minimum of finitely many distribution functions is a distribution function, too, we may conclude that for $1\leq j\leq m$ ,

$$\widehat{F}_{x_{j}}(\cdot)\ \text{is a distribution function}.$$

The computation of $\widehat{\boldsymbol{F}}(y)$ is easily accomplished via the pool-adjacent-violators algorithm (PAVA), see Robertson et al. (1988). Note also that it suffices to compute $\widehat{\boldsymbol{F}}(y)$ for at most $n-1$ different values of $y$ . Precisely, if $y_{1}<y_{2}<\cdots<y_{\ell}$ are the elements of $\{Y_{1},Y_{2},\ldots,Y_{n}\}$ , then $\widehat{\boldsymbol{F}}(y)=\boldsymbol{0}$ for $y<y_{1}$ , $\widehat{\boldsymbol{F}}(y)=\boldsymbol{1}$ for $y\geq y_{\ell}$ , and $\widehat{\boldsymbol{F}}(y)=\widehat{\boldsymbol{F}}(y_{k})$ for $1\leq k<\ell$ and $y\in[y_{k},y_{k+1})$ . Consequently, since the PAVA is known to have linear complexity, the computation of all estimators $\widehat{F}_{x_{j}}(\cdot)$ , $1\leq j\leq m$ , requires $O(n\log n+m\ell)=O(n^{2})$ steps.
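The following Python sketch illustrates the computation for one fixed $y$. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pava_antitonic` and `minmax_antitonic` and the toy inputs are hypothetical, and the pooled value over a block of indices $r,\ldots,s$ is assumed to be the usual weighted average of the $\widehat{\mathbb{F}}_{j}(y)$ with weights $w_{j}$. The brute-force min-max formula serves only as a cross-check of the PAVA output.

```python
def pava_antitonic(values, weights):
    """Weighted least squares fit of a non-increasing sequence via PAVA."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool adjacent blocks while the non-increasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, k2 = blocks.pop()
            m1, w1, k1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, k1 + k2])
    fit = []
    for mean, _, k in blocks:
        fit.extend([mean] * k)
    return fit

def minmax_antitonic(values, weights):
    """Brute force: min over r <= j of max over s >= j of pooled weighted means."""
    m = len(values)

    def pooled(r, s):
        total = sum(weights[r:s + 1])
        return sum(w * v for v, w in zip(values[r:s + 1], weights[r:s + 1])) / total

    return [min(max(pooled(r, s) for s in range(j, m)) for r in range(j + 1))
            for j in range(m)]

# Toy input: group proportions at x_1 < ... < x_6 with multiplicities w_j.
F_emp = [1.0, 0.5, 1.0, 0.0, 0.5, 0.0]
w = [1, 2, 1, 1, 2, 3]

fit = pava_antitonic(F_emp, w)
check = minmax_antitonic(F_emp, w)
print(fit)
```

For these toy inputs both routines return the fit $(1, 2/3, 2/3, 1/3, 1/3, 0)$ up to floating-point error. Repeating the call for each of the distinct response values $y_{k}$ would give the full family of estimated distribution functions at the design points.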

Finally, we extrapolate $\widehat{\boldsymbol{F}}(y)$ to an antitonic function $x\mapsto\widehat{F}_{x}(y)$ on $\mathcal{X}$ . We set $\widehat{F}_{x}(y):=\widehat{F}_{x_{1}}(y)$ for $x\leq x_{1}$ and $\widehat{F}_{x}(y):=\widehat{F}_{x_{m}}(y)$ for $x\geq x_{m}$ . For $x_{j-1}\leq x\leq x_{j}$ , $1<j\leq m$ , one could define $\widehat{F}_{x}(y)$ by linear interpolation, but other antitonic interpolations are possible without affecting our asymptotic results.
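A minimal sketch of this extension step is given below, using constant extrapolation outside $[x_{1},x_{m}]$ and linear interpolation in between. The function name `extend_antitonic` and the toy values are hypothetical; linear interpolation is only one of the admissible antitonic interpolations mentioned above.

```python
from bisect import bisect_right

def extend_antitonic(xs, fit, x):
    """Evaluate x -> F_hat_x(y) from its values `fit` at the sorted design points `xs`."""
    if x <= xs[0]:
        return fit[0]                      # constant to the left of x_1
    if x >= xs[-1]:
        return fit[-1]                     # constant to the right of x_m
    j = bisect_right(xs, x)                # xs[j-1] <= x < xs[j]
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return (1 - t) * fit[j - 1] + t * fit[j]

xs = [0.0, 1.0, 3.0]                       # distinct design points
fit = [0.9, 0.5, 0.1]                      # antitonic fitted values at xs

print(extend_antitonic(xs, fit, 2.0))
```

Because the fitted values are non-increasing along the design points, the piecewise linear extension is non-increasing in $x$ as well.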

2.2 Plug-in estimation of $Q_{x}$

Once we have estimated $(F_{x})_{x\in\mathcal{X}}$ by $(\widehat{F}_{x})_{x\in\mathcal{X}}$ as in Section 2.1, we can easily determine corresponding quantile functions. For any fixed $\beta\in(0,1)$ and $x_{j}$ , $1\leq j\leq m$ , we could determine the minimal and maximal $\beta$ -quantiles,

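$$\widehat{F}_{x_{j}}^{-1}(\beta)\ :=\ \min\bigl\{y\in\mathbb{R}:\widehat{F}_{x_{j}}(y)\geq\beta\bigr\}\quad\text{and}\quad\widehat{F}_{x_{j}}^{-1}(\beta\,+)\ :=\ \inf\bigl\{y\in\mathbb{R}:\widehat{F}_{x_{j}}(y)>\beta\bigr\}.$$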

Both vectors $(\widehat{F}_{x_{j}}^{-1}(\beta))_{j=1}^{m}$ and $(\widehat{F}_{x_{j}}^{-1}(\beta\,+))_{j=1}^{m}$ are isotonic, and any choice of an isotonic function $\mathcal{X}\ni x\mapsto\widehat{Q}_{x}(\beta)$ such that $\widehat{F}_{x_{j}}^{-1}(\beta)\leq\widehat{Q}_{x_{j}}(\beta)\leq\widehat{F}_{x_{j}}^{-1}(\beta\,+)$ , $1\leq j\leq m$ , is a plausible estimator of a $\beta$ -quantile curve.
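Since each $\widehat{F}_{x_{j}}$ is a step function that can only jump at observed response values, the minimal and maximal $\beta$-quantiles can be read off from its values at the sorted distinct responses. The sketch below assumes exactly this representation; the function name `plug_in_quantiles` and the toy values are hypothetical.

```python
def plug_in_quantiles(ys, F_vals, beta):
    """Minimal and maximal beta-quantiles of a step distribution function.

    ys     : sorted distinct jump candidates y_1 < ... < y_l
    F_vals : values F(y_k), non-decreasing in k, with F_vals[-1] == 1
    beta   : level in (0, 1)
    """
    q_min = next(y for y, f in zip(ys, F_vals) if f >= beta)   # min{y : F(y) >= beta}
    q_max = next(y for y, f in zip(ys, F_vals) if f > beta)    # inf{y : F(y) >  beta}
    return q_min, q_max

ys = [1.0, 2.0, 4.0, 7.0]
F_x1 = [0.25, 0.5, 0.5, 1.0]      # estimated distribution function at x_1
F_x2 = [0.0, 0.25, 0.5, 1.0]      # at x_2 > x_1, pointwise no larger than F_x1

q1 = plug_in_quantiles(ys, F_x1, 0.5)
q2 = plug_in_quantiles(ys, F_x2, 0.5)
print(q1, q2)
```

In this toy case the median intervals are $[2,7]$ at $x_{1}$ and $[4,7]$ at $x_{2}$, so both endpoint sequences are isotonic, and any isotonic selection from these intervals is a plug-in median curve.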

2.3 Estimation of $Q_{x}$ via monotone regression quantiles

Similarly as in Section 2.1, we focus on the vector $\boldsymbol{Q}(\beta)=(Q_{x_{j}}(\beta))_{j=1}^{m}$ . Writing

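$$\sum_{i=1}^{n}\rho_{\beta}(Y_{i}-h(X_{i}))\ =\ \sum_{j=1}^{m}\sum_{i:X_{i}=x_{j}}\rho_{\beta}(Y_{i}-h(x_{j})),$$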

one can estimate $\boldsymbol{Q}(\beta)$ by some vector in the set

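$$\widehat{\mathcal{Q}}(\beta)\ :=\ \operatorname*{arg\,min}_{\boldsymbol{q}\in\mathbb{R}^{m}_{\uparrow}}\,T_{\beta}(\boldsymbol{q}),$$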

where $\mathbb{R}^{m}_{\uparrow}:=-\mathbb{R}^{m}_{\downarrow}=\{\boldsymbol{q}\in\mathbb{R}^{m}:q_{1}\leq q_{2}\leq\cdots\leq q_{m}\}$ and

$$T_{\beta}(\boldsymbol{q})\ :=\ \sum_{j=1}^{m}\sum_{i:X_{i}=x_{j}}\rho_{\beta}(Y_{i}-q_{j}).$$

Note that the function $T_{\beta}(\cdot)$ is convex but not strictly convex on $\mathbb{R}^{m}$ . Hence it need not have a unique minimizer. The next result provides more precise information in terms of the minimal and maximal sample $\beta$-quantiles

[TABLE]

Lemma 2.1.

The set $\widehat{\mathcal{Q}}(\beta)$ is a compact and convex subset of $\mathbb{R}^{m}_{\uparrow}$ .

Two particular elements of $\widehat{\mathcal{Q}}(\beta)$ are the vectors $\boldsymbol{\ell}=(\ell_{j})_{j=1}^{m}$ and $\boldsymbol{u}=(u_{j})_{j=1}^{m}$ with components

[TABLE]

Any vector $\boldsymbol{q}\in\widehat{\mathcal{Q}}(\beta)$ satisfies $\boldsymbol{\ell}\leq\boldsymbol{q}\leq\boldsymbol{u}$ componentwise.

On the other hand, suppose that $\boldsymbol{q}\in\mathbb{R}^{m}_{\uparrow}$ satisfies $\boldsymbol{\ell}\leq\boldsymbol{q}\leq\boldsymbol{u}$ and that $\{j<m:q_{j}<q_{j+1}\}$ is a subset of $\{j<m:\ell_{j}<\ell_{j+1}\ \text{or}\ u_{j}<u_{j+1}\}$ . Then $\boldsymbol{q}\in\widehat{\mathcal{Q}}(\beta)$ .

Finally, for any $j\in\{1,\ldots,m\}$ , the set $\{x_{j}\}\times(\ell_{j},u_{j})$ contains no data point $(X_{i},Y_{i})$ .

Remark 2.2.

At first glance, one might suspect that any isotonic vector $\boldsymbol{q}\in\mathbb{R}^{m}_{\uparrow}$ satisfying $\boldsymbol{\ell}\leq\boldsymbol{q}\leq\boldsymbol{u}$ minimizes $T_{\beta}$ . But this conjecture is wrong. As a counterexample, consider the case of $n=2$ observations with $X_{1}<X_{2}$ but $Y_{1}>Y_{2}$ . Here $m=2$ , and $\widehat{\mathbb{F}}_{11}(y)=1_{[y\geq Y_{1}]}$ , $\widehat{\mathbb{F}}_{22}(y)=1_{[y\geq Y_{2}]}$ and

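$$\widehat{\mathbb{F}}_{12}(y)\ =\ \begin{cases}0&\text{if}\ y<Y_{2},\\ 0.5&\text{if}\ Y_{2}\leq y<Y_{1},\\ 1&\text{if}\ y\geq Y_{1}.\end{cases}$$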

Hence

$$\boldsymbol{\ell}=(Y_{2},Y_{2})^{\top}\quad\text{and}\quad\boldsymbol{u}=(Y_{1},Y_{1})^{\top},$$

because $\widehat{\mathbb{F}}_{11}^{-1}(0.5)=\widehat{\mathbb{F}}_{11}^{-1}(0.5\,+)=Y_{1}$ , $\widehat{\mathbb{F}}_{22}^{-1}(0.5)=\widehat{\mathbb{F}}_{22}^{-1}(0.5\,+)=Y_{2}$ and

$$\widehat{\mathbb{F}}_{12}^{-1}(0.5)=Y_{2},\quad\widehat{\mathbb{F}}_{12}^{-1}(0.5\,+)=Y_{1}.$$

But

$$\widehat{\mathcal{Q}}(0.5)\ =\ \bigl\{(q,q)^{\top}:q\in[Y_{2},Y_{1}]\bigr\},$$

because for $\boldsymbol{q}\in[Y_{2},Y_{1}]^{2}$ with $q_{1}\leq q_{2}$ ,

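$$\rho_{0.5}(Y_{1}-q_{1})+\rho_{0.5}(Y_{2}-q_{2})\ =\ 0.5\,(Y_{1}-q_{1}+q_{2}-Y_{2})\ \geq\ 0.5\,(Y_{1}-Y_{2})$$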

with equality if, and only if, $q_{1}=q_{2}$ .
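The counterexample can be checked numerically. The sketch below uses the illustrative values $Y_{1}=3$ and $Y_{2}=1$ (an assumption, as are the function names) and compares $T_{0.5}$ at a point with $q_{1}=q_{2}$ and at an isotonic point with $q_{1}<q_{2}$, both inside $[Y_{2},Y_{1}]^{2}$.

```python
def rho(z, beta=0.5):
    """Check loss rho_beta(z) = (beta - 1[z < 0]) * z."""
    return (beta - (1.0 if z < 0 else 0.0)) * z

def T(q1, q2, Y1, Y2, beta=0.5):
    """T_beta(q) for n = m = 2 observations with X_1 < X_2."""
    return rho(Y1 - q1, beta) + rho(Y2 - q2, beta)

Y1, Y2 = 3.0, 1.0                       # Y_1 > Y_2 although X_1 < X_2
lower = 0.5 * (Y1 - Y2)                 # lower bound 0.5 * (Y_1 - Y_2)
on_diagonal = T(2.0, 2.0, Y1, Y2)       # q_1 = q_2 in [Y_2, Y_1]
off_diagonal = T(1.5, 2.5, Y1, Y2)      # isotonic, within the bounds, but q_1 < q_2

print(lower, on_diagonal, off_diagonal)
```

The point $(1.5, 2.5)$ lies between $\boldsymbol{\ell}=(1,1)^{\top}$ and $\boldsymbol{u}=(3,3)^{\top}$ and is isotonic, yet its risk $1.5$ exceeds the minimal value $1$ attained on the diagonal.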

2.4 Connection between the two estimation paradigms

Restricting the plug-in quantile estimators of Section 2.2 to the set $\mathcal{X}_{n}$ of observed $X$ -values leads to the set

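$$\widehat{\mathcal{Q}}_{\mathrm{plug-in}}(\beta)\ :=\ \bigl\{\boldsymbol{q}\in\mathbb{R}^{m}_{\uparrow}:\widehat{F}_{x_{j}}^{-1}(\beta)\leq q_{j}\leq\widehat{F}_{x_{j}}^{-1}(\beta\,+)\ \text{for}\ 1\leq j\leq m\bigr\}.$$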

This set is closely related to the set $\widehat{\mathcal{Q}}(\beta)$ :

Lemma 2.3.

The vectors $\boldsymbol{\ell}$ and $\boldsymbol{u}$ in Lemma 2.1 are given by

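$$\ell_{j}=\widehat{F}_{x_{j}}^{-1}(\beta)\quad\text{and}\quad u_{j}=\widehat{F}_{x_{j}}^{-1}(\beta\,+)\quad\text{for}\ 1\leq j\leq m.$$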

In particular, $\widehat{\mathcal{Q}}(\beta)\subset\widehat{\mathcal{Q}}_{\mathrm{plug-in}}(\beta)$ .

Example 2.4.

The simple example in Remark 2.2 shows that $\widehat{\mathcal{Q}}(\beta)\neq\widehat{\mathcal{Q}}_{\rm plug-in}(\beta)$ in general. Let us illustrate this point with a more interesting numerical example. Figure 1 shows a simulated sample of size $n=100$ . In addition, it shows the minimal and maximal median curves $x\mapsto\widehat{F}_{x}^{-1}(0.5),\widehat{F}_{x}^{-1}(0.5\,+)$ obtained by linear interpolation of the points $\ell_{j}=\widehat{F}_{x_{j}}^{-1}(0.5)$ and $u_{j}=\widehat{F}_{x_{j}}^{-1}(0.5\,+)$ , respectively, as well as a piecewise linear median curve $x\mapsto\widehat{Q}_{x}(0.5)$ minimizing $\int q^{\prime}(x)^{2}\,dx$ among all isotonic functions $q:\mathbb{R}\to\mathbb{R}$ such that $\ell_{j}\leq q(x_{j})\leq u_{j}$ , $1\leq j\leq m$ . Although $\widehat{Q}_{x}(0.5)$ is a natural candidate and smoother in $x$ than $\widehat{F}_{x}^{-1}(0.5)$ or $\widehat{F}_{x}^{-1}(0.5\,+)$ , the corresponding values of $T_{0.5}(\cdot)$ are (rounded to three digits)

[TABLE]

The true medians $F_{x}^{-1}(0.5)=F_{x}^{-1}(0.5\,+)$ are depicted as well.

3 Asymptotic considerations

We provide some asymptotic properties of the estimators just introduced in case of a real interval $\mathcal{X}$ and a triangular scheme of observations: For each sample size $n\geq 2$ , consider observations $(X_{n1},Y_{n1}),\ldots,(X_{nn},Y_{nn})$ with $X_{n1},\ldots,X_{nn}\in\mathcal{X}$ such that conditional on $\boldsymbol{X}_{n}:=(X_{ni})_{i=1}^{n}$ , the random variables $Y_{n1},\ldots,Y_{nn}$ are independent with

[TABLE]

for $1\leq i\leq n$ and $y\in\mathbb{R}$ . The resulting constrained estimators of $F_{x}(y)$ and $Q_{x}(\beta)$ are denoted by $\widehat{F}_{nx}(y)$ and $\widehat{Q}_{nx}(\beta)$ , respectively. In what follows, we derive asymptotic properties of these estimators under moderate assumptions, where asymptotic statements refer to $n\to\infty$ .

El Barmi and Mukerjee (2005) derived asymptotic properties in case of a fixed finite set $\mathcal{X}$ , which is easier to handle than the present setting.

3.1 Uniform consistency in both arguments

First of all, we assume that the distribution functions $F_{x}$ are Hölder-continuous in $x$ , at least on some subinterval of $\mathcal{X}$ :

(A.1)

For given intervals $I\subset\mathcal{X}$ and $J\subset\mathbb{R}$ , there exist constants $\alpha\in(0,1]$ and $C_{1}>0$ such that

[TABLE]

Secondly, we assume that the design points are ‘asymptotically dense’ within this interval $I$ . To state this precisely, we need some notation. We write

[TABLE]

and $\lambda(\cdot)$ stands for Lebesgue measure. Moreover, the absolute frequency of the design points $X_{ni}$ is denoted by $w_{n}(\cdot)$ , that is,

[TABLE]

(A.2)

For given constants $C_{2},C_{3}>0$ , let $A_{n}$ be the event that for arbitrary intervals $I_{n}\subset I$ ,

[TABLE]

Then,

[TABLE]

Remark 3.1 (Fixed design points).

Suppose that $I=\mathcal{X}=[a,b]$ with real numbers $a<b$ , and let $X_{ni}=a+(i/n)(b-a)$ for $1\leq i\leq n$ . Then Assumption (A.2) is satisfied for any fixed $C_{2}<1$ and $C_{3}>0$ .

Remark 3.2 (Random design points).

Suppose that $X_{n1},X_{n2},\ldots,X_{nn}$ are independent random variables with density $g$ on $\mathcal{X}$ such that $\inf_{x\in I}g(x)>0$ . With standard results from empirical processes on the real line, including exponential inequalities for beta distributions, we can show that for any choice of $\alpha\in(0,1]$ , $0<C_{2}<\inf_{x\in I}g(x)$ and $C_{3}>0$ ,

[TABLE]

with asymptotic probability one as $n\to\infty$ . Hence Assumption (A.2) is satisfied.

Under the two assumptions above, the estimator $\widehat{F}_{nx}$ satisfies a uniform consistency property.

Theorem 3.3.

Suppose that Assumptions (A.1–2) are satisfied. Then there exists a $C=C(C_{1},C_{2},C_{3})>0$ such that

[TABLE]

where $I_{n}:=\{x\in\mathbb{R}:[x\pm\delta_{n}]\subset I\}$ .

Concerning estimated quantiles, we combine Assumptions (A.1–2) with a growth condition on the conditional distribution functions $F_{x}$ :

(A.3)

For some numbers $0\leq\beta_{1}<\beta_{2}\leq 1$ and $\kappa>0$ ,

[TABLE]

for arbitrary $x\in I$ and $y_{1},y_{2}\in\mathbb{R}$ such that $y_{1}<y_{2}$ and $F_{x}(y_{1}),F_{x}(y_{2}-)\in(\beta_{1},\beta_{2})$ .

For instance, if each $F_{x}$ , $x\in I$ , has a density $f_{x}$ such that

[TABLE]

then (A.3) is satisfied with the latter parameter $\kappa$ .

Theorem 3.4.

Suppose that Assumptions (A.1–3) are satisfied with $J=\mathbb{R}$ in (A.1). Then, for any plug-in estimator $(\widehat{Q}_{nx})_{x\in\mathcal{X}}$ of $(Q_{x})_{x\in\mathcal{X}}$ ,

[TABLE]

where $I_{n}\subset I$ and $C=C(C_{1},C_{2},C_{3})$ are defined as in Theorem 3.3, and $B_{n}$ denotes the interval $(\beta_{1}+C\rho_{n}^{\alpha/(2\alpha+1)},\beta_{2}-C\rho_{n}^{\alpha/(2\alpha+1)})$ .

3.2 Uniform consistency at a single point $x_{o}$

In addition to the previous uniform convergence results, one may verify uniform consistency of $\widehat{F}_{nx_{o}}$ and $\widehat{Q}_{nx_{o}}$ for a fixed interior point $x_{o}$ of $\mathcal{X}$ . These results require similar but weaker assumptions.

(A’.1 $\boldsymbol{{}_{x_{o}}}$ )

For a neighbourhood $U\subset\mathcal{X}$ of $x_{o}$ and an interval $J\subset\mathbb{R}$ , there exist constants $\alpha\in(0,1]$ and $C_{1}>0$ such that

[TABLE]

(A’.2 $\boldsymbol{{}_{x_{o}}}$ )

For given constants $C_{2},C_{3}>0$ , let $A_{n}$ be the event that

[TABLE]

Then,

[TABLE]

Under these two assumptions, the following consistency property holds.

Theorem 3.5.

Suppose that Assumptions (A’.1–2 ${}_{x_{o}}$ ) are satisfied. Then

[TABLE]

(A’.3 $\boldsymbol{{}_{x_{o}}}$ )

For some numbers $0\leq\beta_{1}<\beta_{2}\leq 1$ and $\kappa>0$ ,

[TABLE]

for arbitrary $y_{1},y_{2}\in\mathbb{R}$ such that $y_{1}<y_{2}$ and $F_{x_{o}}(y_{1}),F_{x_{o}}(y_{2}-)\in(\beta_{1},\beta_{2})$ .

Theorem 3.6.

Suppose that Assumptions (A’.1–3 ${}_{x_{o}}$ ) are satisfied with $J=\mathbb{R}$ in (A’.1 ${}_{x_{o}}$ ). Then, for any plug-in estimator $(\widehat{Q}_{nx})_{x\in\mathcal{X}}$ of $(Q_{x})_{x\in\mathcal{X}}$ ,

[TABLE]

where $B_{n}:=(\beta_{1}+\Delta_{n},\beta_{2}-\Delta_{n})$ and $\Delta_{n}=\mathcal{O}(n^{-\alpha/(2\alpha+1)})$ .

4 Proofs and technical details

4.1 Monotone regression

In this section we review isotonic regression on a totally ordered set in a rather general setting, summarizing and extending results of numerous authors. Our main goal is a thorough understanding of isotonic regression in situations with potentially non-unique solutions. For extensions to partially ordered sets we refer to Mühlemann et al. (2019).

We start with $m\geq 2$ loss functions $R_{1},\ldots,R_{m}:\mathbb{R}\to\mathbb{R}$ with the following property: For arbitrary indices $1\leq a\leq b\leq m$ , the function

[TABLE]

is minimal on a compact interval $[L_{ab},U_{ab}]\subset\mathbb{R}$ , strictly antitonic on $(-\infty,L_{ab}]$ and strictly isotonic on $[U_{ab},\infty)$ .

This property is satisfied if all functions $R_{j}$ are convex with $R_{j}(x)\to\infty$ as $|x|\to\infty$ . It implies a refined version of the so-called Cauchy-mean-value property.

Proposition 4.1.

Let $\{a,\ldots,b\}\subset\{1,\ldots,m\}$ be partitioned into $k\geq 2$ index intervals $\{a_{1},\ldots,b_{1}\},\ldots,\{a_{k},\ldots,b_{k}\}$ . Then

[TABLE]

Proof.

The smallest minimizer $L_{ab}$ of $R_{ab}$ is the largest real number $r$ such that $R_{ab}$ is strictly antitonic on $(-\infty,r]$ and the smallest real number $s$ such that $R_{ab}$ is isotonic on $[s,\infty)$ . Since $R_{ab}=\sum_{i=1}^{k}R_{a_{i}b_{i}}$ , this function is strictly antitonic on $\bigcap_{1\leq i\leq k}(-\infty,L_{a_{i}b_{i}}]=\bigl{(}-\infty,\min_{1\leq i\leq k}L_{a_{i}b_{i}}\bigr{]}$ and isotonic on $\bigcap_{1\leq i\leq k}[L_{a_{i}b_{i}},\infty)=\bigl{[}\max_{1\leq i\leq k}L_{a_{i}b_{i}},\infty\bigr{)}$ . This yields the desired inequalities for $L_{ab}$ . The largest minimizer $U_{ab}$ can be handled analogously. ∎
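For the check loss of Section 2.3, the minimizers of the summed loss over a block of design points form the interval between the minimal and maximal sample $\beta$-quantiles of the pooled responses. The sketch below uses this special case to illustrate the inequalities derived in the proof above, namely that $L_{ab}$ lies between the smallest and largest $L_{a_{i}b_{i}}$, and likewise for $U_{ab}$. The function name `quantile_interval`, the toy data, and the reading of the inequalities from the proof are assumptions of this illustration.

```python
import math

def quantile_interval(sample, beta):
    """(L, U): minimal and maximal beta-quantiles of the empirical distribution of `sample`."""
    s = sorted(sample)
    n = len(s)
    k_min = math.ceil(beta * n - 1e-12)         # smallest k with k/n >= beta
    k_max = math.floor(beta * n + 1e-12) + 1    # smallest k with k/n >  beta
    return s[max(k_min, 1) - 1], s[min(k_max, n) - 1]

# Responses observed at three consecutive design points (toy data).
groups = [[5.0, 1.0], [2.0, 8.0, 3.0], [4.0]]
beta = 0.5

parts = [quantile_interval(g, beta) for g in groups]               # (L, U) per block
pooled = quantile_interval([y for g in groups for y in g], beta)   # (L, U) for the union

L_parts = [p[0] for p in parts]
U_parts = [p[1] for p in parts]
print(parts, pooled)
```

Here the blockwise median intervals are $[1,5]$, $[3,3]$ and $[4,4]$, the pooled interval is $[3,4]$, and both pooled endpoints lie between the corresponding blockwise extremes.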

Now we consider the function $T:\mathbb{R}^{m}\to\mathbb{R}$ ,

[TABLE]

and the set

[TABLE]

The elements of $\mathcal{Q}$ can be characterized completely in terms of the minimizers of the functions $R_{ab}$ . Throughout the sequel, we set $x_{0}:=-\infty$ and $x_{m+1}:=\infty$ for a vector $\boldsymbol{x}\in\mathbb{R}^{m}_{\uparrow}$ . Moreover, the componentwise minimum and maximum of vectors $\boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^{m}$ are denoted by $\min(\boldsymbol{x},\boldsymbol{y})$ and $\max(\boldsymbol{x},\boldsymbol{y})$ , respectively.

Proposition 4.2.

For a vector $\boldsymbol{x}\in\mathbb{R}^{m}_{\uparrow}$ , the following two properties are equivalent:

(i) $\boldsymbol{x}\in\mathcal{Q}$ .

(ii) For arbitrary indices $1\leq a\leq b\leq m$ ,

[TABLE]

This characterization is a generalization of Theorem 8.1 of Dümbgen and Kovac (2009).

Proof of Proposition 4.2.

We first show that property (i) is equivalent to a seemingly weaker version of (ii):

(ii’) For arbitrary indices $1\leq a\leq b\leq m$ ,

[TABLE]

Suppose that property (ii’) is violated. Specifically, for some indices $1\leq a\leq b\leq m$ , let $x_{a-1}<x_{a}=x_{b}$ but $x_{a}>U_{ab}$ . Since $R_{ab}$ is strictly isotonic on $[U_{ab},\infty)$ ,

[TABLE]

defines a vector $\tilde{\boldsymbol{x}}\in\mathbb{R}^{m}_{\uparrow}$ such that $T(\tilde{\boldsymbol{x}})<T(\boldsymbol{x})$ . Analogously, if $x_{a}=x_{b}<x_{b+1}$ but $x_{b}<L_{ab}$ , one can find a vector $\tilde{\boldsymbol{x}}\in\mathbb{R}^{m}_{\uparrow}$ such that $T(\tilde{\boldsymbol{x}})<T(\boldsymbol{x})$ . This shows that property (i) implies property (ii’).

Suppose that property (ii’) is satisfied, and let $\boldsymbol{y}$ be an arbitrary vector in $\mathbb{R}^{m}_{\uparrow}$ . If $y_{j}>x_{j}$ for some index $j$ , let $a$ be the smallest such index, and let $c$ be the largest index with $x_{c}=x_{a}$ . Thus $x_{a}=x_{c}<x_{c+1}$ and $y_{a-1}\leq x_{a}<y_{a}\leq y_{c}$ . Now we repeat the following step until $y_{c}=x_{c}$ : We choose the smallest index $b$ such that $y_{b}=y_{c}$ . Property (ii’) implies that $x_{c}\geq L_{bc}$ , so $R_{bc}$ is isotonic on $[x_{c},\infty)$ . Consequently, if we replace $y_{b},\ldots,y_{c}$ with the smaller number $\max(x_{c},y_{b-1})$ , the value $T(\boldsymbol{y})$ does not increase. These considerations show that replacing $y_{a},\ldots,y_{c}$ with $x_{a}=x_{c}$ yields a new vector $\boldsymbol{y}\in\mathbb{R}^{m}_{\uparrow}$ with the same or a smaller value of $T(\boldsymbol{y})$ . Repeating this construction finitely often shows that replacing $\boldsymbol{y}$ with $\min(\boldsymbol{x},\boldsymbol{y})$ does not increase $T(\boldsymbol{y})$ . Analogously one can show that replacing $\boldsymbol{y}$ with $\max(\boldsymbol{x},\boldsymbol{y})$ does not increase $T(\boldsymbol{y})$ . Combining both steps shows that the original vector $\boldsymbol{y}$ satisfies the inequality $T(\boldsymbol{y})\geq T(\boldsymbol{x})$ . Hence $\boldsymbol{x}$ belongs to $\mathcal{Q}$ .

It remains to show equivalence of properties (ii) and (ii’). The latter is obviously a consequence of the former. Hence it suffices to show that a violation of property (ii) implies a violation of (ii’). Consider indices $1\leq a\leq b\leq m$ such that $x_{a-1}<x_{a}$ but $x_{a}>U_{ab}$ . In case of $x_{b}=x_{a}$ , this is a violation of property (ii’). In case of $x_{a}<x_{b}$ we partition $\{a,\ldots,b\}$ into maximal index intervals $\{a_{1},\ldots,b_{1}\},\ldots,\{a_{k},\ldots,b_{k}\}$ on which $j\mapsto x_{j}$ is constant. Then $x_{a}=\min_{1\leq i\leq k}x_{a_{i}}$ , whereas Proposition 4.1 yields the inequality $U_{ab}\geq\min_{1\leq i\leq k}U_{a_{i}b_{i}}$ . Hence for some index $i$ , $x_{a_{i}-1}<x_{a_{i}}=x_{b_{i}}$ but $x_{a_{i}}>U_{a_{i}b_{i}}$ , a violation of (ii’). The situation that $x_{b}<x_{b+1}$ but $x_{b}<L_{ab}$ can be handled analogously. ∎

Proposition 4.2 already implies an interesting property of the set $\mathcal{Q}$ .

Corollary 4.3.

If $\boldsymbol{x}^{(1)},\boldsymbol{x}^{(2)}\in\mathcal{Q}$ , then $\min(\boldsymbol{x}^{(1)},\boldsymbol{x}^{(2)})$ and $\max(\boldsymbol{x}^{(1)},\boldsymbol{x}^{(2)})$ belong to $\mathcal{Q}$ as well.

Proof.

For symmetry reasons it suffices to verify that $\boldsymbol{x}:=\min(\boldsymbol{x}^{(1)},\boldsymbol{x}^{(2)})\in\mathcal{Q}$ , and this is equivalent to $\boldsymbol{x}$ satisfying property (iii) in Proposition 4.2. Let $1\leq a\leq b\leq m$ , and suppose that $x_{a-1}<x_{a}$ . Then for some $k\in\{1,2\}$ ,

[TABLE]

so property (iii) of $\boldsymbol{x}^{(k)}$ implies that $x_{a}\leq x_{a}^{(k)}\leq U_{ab}$ . In case of $x_{b}<x_{b+1}$ , we choose $k\in\{1,2\}$ such that

[TABLE]

and then property (iii) of $\boldsymbol{x}^{(k)}$ implies that $x_{b}=x_{b}^{(k)}\geq L_{ab}$ . ∎

Now we provide the main result involving min-max and max-min formulae for the set $\mathcal{Q}$ .

Theorem 4.4.

For any index $1\leq j\leq m$ ,

[TABLE]

This defines vectors $\boldsymbol{\ell}=(\ell_{j}^{(1)})_{j=1}^{m}$ and $\boldsymbol{u}=(u_{j}^{(1)})_{j=1}^{m}$ in $\mathcal{Q}$ , and any vector $\boldsymbol{x}\in\mathcal{Q}$ satisfies $\boldsymbol{\ell}\leq\boldsymbol{x}\leq\boldsymbol{u}$ componentwise.

Proof of Theorem 4.4.

For symmetry reasons, it suffices to verify the claims about $\boldsymbol{\ell}$ . Precisely, with $\boldsymbol{\ell}^{(k)}:=(\ell_{j}^{(k)})_{j=1}^{m}$ , we show in what follows that

[TABLE]

Inequality (3) follows from

[TABLE]

for $1\leq j\leq m$ .

As to (4), for $\boldsymbol{x}\in\mathcal{Q}$ and $1\leq j\leq m$ let $\tilde{b}$ be the largest index such that $x_{\tilde{b}}=x_{j}$ . Then $x_{\tilde{b}}<x_{\tilde{b}+1}$ , so property (ii) of $\boldsymbol{x}$ in Proposition 4.2 implies that

[TABLE]

It remains to verify (5). For indices $1\leq j<k\leq m$ ,

[TABLE]

whence $\boldsymbol{\ell}^{(1)}\in\mathbb{R}^{m}_{\uparrow}$ . To show that $\boldsymbol{\ell}^{(1)}\in\mathcal{Q}$ , it suffices to show that it has property (iii) in Proposition 4.2, and this is an immediate consequence of the following two claims: For $1\leq j\leq m$ ,

[TABLE]

As to (6), suppose that the conclusion is wrong, i.e. $\ell_{j}^{(1)}>\min_{b\geq j}L_{jb}$ . Then $j>1$ , and for some index $\tilde{a}\leq j-1$ ,

[TABLE]

where we used Proposition 4.1. But then

[TABLE]

i.e. the assumption of (6) is wrong as well.

Concerning (7), suppose that the conclusion is wrong, i.e. $\ell_{j}^{(1)}<L_{\tilde{a}j}$ for some $\tilde{a}\leq j$ . Then $j<m$ , and

[TABLE]

Consequently,

[TABLE]

This is true for any index $\tilde{a}\leq j$ with $L_{\tilde{a}j}>\ell_{j}^{(1)}$ . If $a\leq j$ is such that $L_{aj}\leq\ell_{j}^{(1)}$ , then

[TABLE]

Thus $\min_{b\geq j+1}L_{ab}\leq\ell_{j}^{(1)}$ for any $a\leq j+1$ . Consequently, $\ell_{j+1}^{(1)}\leq\ell_{j}^{(1)}$ , i.e. the assumption of (7) is wrong as well. ∎

We end this subsection with two additional conclusions for the special case of convex functions $R_{j}$ .

Theorem 4.5.

Suppose in addition that all loss functions $R_{j}$ are convex. Then the set $\mathcal{Q}$ is compact and convex. If $\boldsymbol{x}\in\mathbb{R}^{m}_{\uparrow}$ is such that $\boldsymbol{\ell}\leq\boldsymbol{x}\leq\boldsymbol{u}$ and $\{j<m:x_{j}<x_{j+1}\}\subset\{j<m:\ell_{j}<\ell_{j+1}\ \text{or}\ u_{j}<u_{j+1}\}$ , then $\boldsymbol{x}\in\mathcal{Q}$ . Moreover, each function $R_{j}$ is linear on the interval $[\ell_{j},u_{j}]$ .

Proof.

The general assumptions imply that each function $R_{j}=R_{jj}$ has a compact set of minimizers. Together with convexity, this implies that $R_{j}$ is continuous with $R_{j}(x)\to\infty$ as $|x|\to\infty$ . But then, $T:\mathbb{R}^{m}\to\mathbb{R}$ is a continuous and convex function such that $T(\boldsymbol{x})\to\infty$ as $\|\boldsymbol{x}\|\to\infty$ . Moreover, $\mathbb{R}^{m}_{\uparrow}$ is a closed convex cone in $\mathbb{R}^{m}$ . This implies that $\mathcal{Q}$ is a compact and convex set.

To verify the remaining statements, consider the vectors $\boldsymbol{x}(\lambda):=(1-\lambda)\boldsymbol{\ell}+\lambda\boldsymbol{u}$ , $\lambda\in[0,1]$ . Since $\mathcal{Q}$ is a convex set, all these vectors belong to $\mathcal{Q}$ . But for $0<\lambda<1$ ,

[TABLE]

Exploiting property (ii) of $\boldsymbol{x}(\lambda)$ in Proposition 4.2 for all $\lambda\in(0,1)$ , we may conclude that for arbitrary indices $1\leq a\leq b\leq m$ ,

[TABLE]

In particular, any vector $\boldsymbol{x}\in\mathbb{R}^{m}_{\uparrow}$ such that $\boldsymbol{\ell}\leq\boldsymbol{x}\leq\boldsymbol{u}$ and $\{j<m:x_{j}<x_{j+1}\}$ is a subset of $\{j<m:\ell_{j}<\ell_{j+1}\ \text{or}\ u_{j}<u_{j+1}\}$ satisfies property (iii) in Proposition 4.2. Hence $\boldsymbol{x}\in\mathcal{Q}$ .

Finally, since

[TABLE]

is constant in $\lambda\in[0,1]$ , each summand $R_{j}\bigl{(}(1-\lambda)\ell_{j}+\lambda u_{j}\bigr{)}$ has to be linear in $\lambda\in[0,1]$ , which is equivalent to $R_{j}$ being linear on $[\ell_{j},u_{j}]$ . ∎
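The last step of this proof can be checked numerically on a toy example. The sketch below is illustrative only: it assumes the check loss $\rho_{\beta}(z)=z(\beta-1_{[z<0]})$ of Koenker and Bassett (1978) for the functions $R_{j}$ , one observation per design point, and vectors `ell` and `u` that were worked out by hand for this particular data set. The names `check_loss` and `total_loss` are hypothetical helpers, not notation from the paper.

```python
def check_loss(z, beta):
    """Check loss rho_beta(z) = z * (beta - 1[z < 0]); assumed form of the loss."""
    return z * (beta - (1.0 if z < 0 else 0.0))


def total_loss(groups, q, beta):
    """T(q) = sum_j R_j(q_j), with R_j summing the check loss over group j."""
    return sum(check_loss(y - qj, beta) for g, qj in zip(groups, q) for y in g)


# Toy data: responses 3, 1, 2 at three increasing design points (assumption).
groups = [[3.0], [1.0], [2.0]]
beta = 0.5
ell = [1.0, 1.0, 2.0]  # smallest isotonic minimizer for this toy example
u = [2.0, 2.0, 2.0]    # largest isotonic minimizer for this toy example

# Evaluate T along the segment x(lambda) = (1 - lambda) * ell + lambda * u.
path_losses = []
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    x = [(1 - lam) * lo + lam * hi for lo, hi in zip(ell, u)]
    path_losses.append(total_loss(groups, x, beta))
```

In this example every entry of `path_losses` equals 1.0, so $T$ is constant on the segment, and each $R_{j}$ is linear on $[\ell_{j},u_{j}]$ even though the individual summands are not constant there.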

4.2 Proofs of Lemmas 2.1 and 2.3

Proof of Lemma 2.1.

For $1\leq j\leq m$ , set

[TABLE]

This is a convex function of $q\in\mathbb{R}$ with $R_{j}(q)\to\infty$ as $|q|\to\infty$ . To apply the results of the previous subsection, we need to determine the sets $[L_{ab},U_{ab}]$ for $1\leq a\leq b\leq m$ . Note that $R_{j}^{\prime}(q\,+)=\sum_{i:X_{i}=x_{j}}(1_{[Y_{i}\leq q]}-\beta)$ , whence

[TABLE]

Consequently,

[TABLE]

Now all but the last statement of Lemma 2.1 follow from Theorems 4.4 and 4.5. As to the last statement, note that each $R_{j}$ is a convex and piecewise linear function with strict changes of slope at each $Y_{i}$ such that $X_{i}=x_{j}$ . Consequently, since $R_{j}$ is linear on $[\ell_{j},u_{j}]$ , there is no data point $(X_{i},Y_{i})$ such that $X_{i}=x_{j}$ and $Y_{i}\in(\ell_{j},u_{j})$ . ∎
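To make the role of the intervals $[L_{ab},U_{ab}]$ concrete, here is a minimal sketch. It assumes that $L_{ab}$ and $U_{ab}$ are the smallest and largest $\beta$-quantile of the responses pooled over the design points $x_{a},\ldots,x_{b}$ (which is what the derivative formula above suggests), and that $\boldsymbol{\ell}$ and $\boldsymbol{u}$ are given by the max-min and min-max expressions used in the proof of Theorem 4.4. The function names and the list-of-lists data layout are hypothetical.

```python
import math


def quantile_interval(values, beta):
    """Smallest and largest beta-quantile of a finite sample, 0 < beta < 1."""
    z = sorted(values)
    n = len(z)
    eps = 1e-12  # guards against rounding in n * beta
    k_lo = max(math.ceil(n * beta - eps), 1)       # smallest k with k/n >= beta
    k_hi = min(math.floor(n * beta + eps) + 1, n)  # smallest k with k/n > beta
    return z[k_lo - 1], z[k_hi - 1]


def extremal_quantile_fits(groups, beta):
    """groups[j] holds the responses Y_i with X_i = x_j, for x_1 < ... < x_m."""
    m = len(groups)
    L, U = {}, {}
    for a in range(m):
        pooled = []
        for b in range(a, m):
            pooled = pooled + list(groups[b])
            L[a, b], U[a, b] = quantile_interval(pooled, beta)
    # ell_j = max_{a <= j} min_{b >= j} L_ab,  u_j = min_{b >= j} max_{a <= j} U_ab
    ell = [max(min(L[a, b] for b in range(j, m)) for a in range(j + 1))
           for j in range(m)]
    u = [min(max(U[a, b] for a in range(j + 1)) for b in range(j, m))
         for j in range(m)]
    return ell, u
```

For the toy input `extremal_quantile_fits([[3], [1], [2]], 0.5)` this returns `([1, 1, 2], [2, 2, 2])`. The cubic-time double loop is chosen for readability; it is not meant as an efficient algorithm.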

Proof of Lemma 2.3.

For arbitrary $y\in\mathbb{R}$ ,

[TABLE]

But the min-max formula (2) for $\widehat{F}_{x_{j}}(y)$ implies that the inequality on the right hand side is equivalent to the following statements:

[TABLE]

Hence $\widehat{F}_{x_{j}}^{-1}(\beta)=\ell_{j}$ . Analogously, for any $y\in\mathbb{R}$ ,

[TABLE]

But (2) remains valid if we replace ‘ $(y)$ ’ with ‘ $(y\,-)$ ’, so the inequality on the right hand side is equivalent to the following statements:

[TABLE]

Hence $\widehat{F}_{x_{j}}^{-1}(\beta\,+)=u_{j}$ . ∎
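The identity just proved can be illustrated on a small data set. The sketch below assumes that the min-max formula (2) reads $\widehat{F}_{x_{j}}(y)=\min_{r\leq j}\max_{s\geq j}\widehat{\mathbb{F}}_{rs}(y)$ , with $\widehat{\mathbb{F}}_{rs}$ the empirical distribution function of the responses pooled over $x_{r},\ldots,x_{s}$ ; this form is the standard one for antitonic least squares but is not displayed in this section. All function names are hypothetical, and exact rational arithmetic is used to avoid rounding in the comparisons with $\beta$ .

```python
from fractions import Fraction


def pooled_ecdf(groups, r, s, y):
    """Empirical distribution function of the responses in groups r..s at y."""
    pooled = [v for g in groups[r:s + 1] for v in g]
    return Fraction(sum(v <= y for v in pooled), len(pooled))


def antitonic_cdf(groups, j, y):
    """Assumed min-max formula: min over r <= j of max over s >= j."""
    m = len(groups)
    return min(max(pooled_ecdf(groups, r, s, y) for s in range(j, m))
               for r in range(j + 1))


def quantile_pair(groups, j, beta):
    """Generalized inverses F^{-1}(beta) and F^{-1}(beta+) of the fitted cdf.

    The fitted cdf is a step function in y that jumps only at observed
    responses, so scanning those values is enough (requires 0 < beta < 1).
    """
    ys = sorted({v for g in groups for v in g})
    lower = next(y for y in ys if antitonic_cdf(groups, j, y) >= beta)
    upper = next(y for y in ys if antitonic_cdf(groups, j, y) > beta)
    return lower, upper
```

With `groups = [[3], [1], [2]]` and `beta = Fraction(1, 2)`, the pairs for `j = 0, 1, 2` are `(1, 2)`, `(1, 2)` and `(2, 2)`. These coincide with the smallest and largest isotonic median fits for the same data, in line with Lemma 2.3.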

4.3 Asymptotics

In what follows, we always work with the conditional distribution of $(Y_{ni})_{i=1}^{n}$ , given $\boldsymbol{X}_{n}$ . Moreover, we tacitly assume that $\boldsymbol{X}_{n}$ is a “good” vector in the sense that the event $A_{n}$ in Assumption (A.2) or (A’.2$_{x_{o}}$) occurs.

To lighten the notation, we do not introduce an extra subscript $n$ for the weights $w_{rs}$ or the empirical distribution functions $\widehat{\mathbb{F}}_{rs}$ . Furthermore, we define

[TABLE]

The norm $\|\cdot\|_{\infty}$ denotes the usual supremum norm of functions on the real line.

The proofs make use of the following exponential inequality which follows from Bretagnolle (1980) and Hu (1985).

Theorem 4.6.

Let $Y_{1},Y_{2},Y_{3},\ldots$ be independent random variables with respective distribution functions $F_{1},F_{2},F_{3},\ldots$ . For $k\in\mathbb{N}$ , let

[TABLE]

Then there exists a universal constant $C_{4}\leq 2^{5/2}e$ such that for all $\eta\geq 0$ ,

[TABLE]

Corollary 4.7.

Let

[TABLE]

Then for any constant $D>1$ ,

[TABLE]

Proof of Corollary 4.7.

Note that $M_{n}$ is the maximum of the $\binom{m+1}{2}$ quantities

[TABLE]

and we may apply Theorem 4.6 to each of them. Consequently,

[TABLE]

for arbitrary $\eta_{n}\geq 0$ . But the right hand side converges to zero as $n\to\infty$ if $\eta_{n}=(D\log n)^{1/2}$ for some $D>1$ . ∎
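A quick numerical look at this union bound may help. The sketch assumes that the tail bound in Theorem 4.6 has the form $C_{4}\exp(-2\eta^{2})$ , which is consistent with the threshold $D>1$ and with the value $c=2$ quoted in Section 4.4, and it uses the worst case $m=n$ . The function name `union_bound` is hypothetical.

```python
import math

# Upper bound on the universal constant from Theorem 4.6.
C4 = 2 ** 2.5 * math.e


def union_bound(n, D, m=None):
    """Bound binom(m+1, 2) * C4 * exp(-2 * eta^2) with eta^2 = D * log(n).

    Assumes the exponential tail C4 * exp(-2 * eta^2) for each of the
    binom(m+1, 2) quantities; m defaults to its largest possible value n.
    """
    m = n if m is None else m
    eta_sq = D * math.log(n)
    return math.comb(m + 1, 2) * C4 * math.exp(-2.0 * eta_sq)
```

Under these assumptions the bound behaves like $C_{4}n^{2-2D}/2$ . For $D=1.5$ it decays roughly like $1/n$ , whereas for $D=1$ it stays bounded away from zero, which is why the condition $D>1$ is needed.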

Proof of Theorem 3.3.

Recall that $\rho_{n}=\log(n)/n$ , $\delta_{n}=C_{3}\rho_{n}^{1/(2\alpha+1)}$ and $I_{n}=\{x\in I:[x\pm\delta_{n}]\subset I\}$ . Recall also that we treat $\boldsymbol{X}_{n}$ as fixed and assume that the event $A_{n}$ in Assumption (A.2) occurs. Let $n$ be sufficiently large so that $I_{n}\neq\emptyset$ . For $x\in I_{n}$ the indices

[TABLE]

are well-defined, because $[x-\delta_{n},x]$ is a subinterval of $I$ of length $\delta_{n}$ , so Assumption (A.2) guarantees that this interval contains at least one observation $x_{j}$ . Moreover,

[TABLE]

Consequently, with $M_{n}$ as in Corollary 4.7, for any $y\in J$ we obtain the inequalities

[TABLE]

In the first step we used antitonicity of $\tilde{x}\mapsto\widehat{F}_{n\tilde{x}}(y)$ , in the second-to-last step we used antitonicity of $\tilde{x}\mapsto F_{\tilde{x}}(y)$ , and the last step utilizes Assumption (A.1). But $\mathop{\rm I\!P}\nolimits(M_{n}\leq(D\log n)^{1/2})\to 1$ for any fixed $D>1$ , and on the event $\{M_{n}\leq(D\log n)^{1/2}\}$ , the previous considerations imply that

[TABLE]

with $C:=(C_{2}D/C_{3})^{1/2}+C_{1}C_{3}^{\alpha}$ .

Analogously one can show that on $\{M_{n}\leq(D\log n)^{1/2}\}$ ,

[TABLE]

with the same constant $C$ . ∎
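The origin of the constant $C$ can be traced with a short calculation. The lines below are a sketch under an assumption that is not displayed in this section: on the event $A_{n}$ , the relevant weights satisfy $w_{rs}\geq n\delta_{n}/C_{2}$ , and the stochastic term is bounded by $M_{n}w_{rs}^{-1/2}$ . With $\rho_{n}=\log(n)/n$ and $\delta_{n}=C_{3}\rho_{n}^{1/(2\alpha+1)}$ , both error terms then share the same power of $\rho_{n}$ .

```latex
% Assumption: w_{rs} \ge n\delta_n / C_2 on A_n, and M_n \le (D \log n)^{1/2}.
\frac{M_n}{\sqrt{w_{rs}}}
  \le \Bigl(\frac{C_2 D \log n}{n \delta_n}\Bigr)^{1/2}
  = \Bigl(\frac{C_2 D}{C_3}\Bigr)^{1/2}
    \rho_n^{\frac{1}{2}\bigl(1 - \frac{1}{2\alpha+1}\bigr)}
  = \Bigl(\frac{C_2 D}{C_3}\Bigr)^{1/2} \rho_n^{\alpha/(2\alpha+1)},
\qquad
C_1 \delta_n^{\alpha}
  = C_1 C_3^{\alpha} \rho_n^{\alpha/(2\alpha+1)} .
```

Adding the two right hand sides gives $C\rho_{n}^{\alpha/(2\alpha+1)}$ with $C=(C_{2}D/C_{3})^{1/2}+C_{1}C_{3}^{\alpha}$ . The exponent $1/(2\alpha+1)$ in $\delta_{n}$ is exactly the choice that balances the stochastic term against the bias term.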

The proof of Theorem 3.4 is based on Theorem 3.3 and two elementary inequalities for distribution functions:

Lemma 4.8.

Suppose that $F,G$ are distribution functions such that

[TABLE]

Then

[TABLE]

Lemma 4.9.

Suppose that $F$ is a distribution function so that, for given $0\leq\beta_{1}<\beta_{2}\leq 1$ and $\kappa>0$ ,

[TABLE]

for arbitrary $y_{1}<y_{2}$ such that $F(y_{1}),F(y_{2}-)\in(\beta_{1},\beta_{2})$ . Then $F^{-1}(\beta)=F^{-1}(\beta+)$ and

[TABLE]

for arbitrary $\beta,\beta^{\prime}\in(\beta_{1},\beta_{2})$ .

Proof of Lemma 4.8.

Let $\Delta<\beta<1$ and $y<F^{-1}(\beta-\Delta)$ . Then $F(y)<\beta-\Delta$ and thus

[TABLE]

Therefore, we have $y<G^{-1}(\beta)$ and letting $y\to F^{-1}(\beta-\Delta)$ yields the first inequality.

Next, let $0<\beta<1-\Delta$ and $y>F^{-1}((\beta+\Delta)+)$ . Then $F(y-)>\beta+\Delta$ and thus

[TABLE]

This gives $y>G^{-1}(\beta+)$ , and letting $y\to F^{-1}((\beta+\Delta)+)$ proves the second claim. ∎
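The two inequalities can be checked on empirical distribution functions. The sketch below assumes that the hypothesis of Lemma 4.8 is the sup-norm bound $\|F-G\|_{\infty}\leq\Delta$ and that the conclusions are $F^{-1}(\beta-\Delta)\leq G^{-1}(\beta)$ and $G^{-1}(\beta+)\leq F^{-1}((\beta+\Delta)+)$ , as the proof indicates. The samples and function names are made up for illustration, and exact fractions avoid rounding issues.

```python
from fractions import Fraction


def ecdf(sample, y):
    """Empirical distribution function of the sample at y."""
    return Fraction(sum(v <= y for v in sample), len(sample))


def lower_quantile(sample, beta):
    """F^{-1}(beta) = min{y : F(y) >= beta}, for 0 < beta <= 1."""
    return next(v for v in sorted(sample) if ecdf(sample, v) >= beta)


def upper_quantile(sample, beta):
    """F^{-1}(beta+) = inf{y : F(y) > beta}, for 0 <= beta < 1."""
    return next(v for v in sorted(sample) if ecdf(sample, v) > beta)


def sup_distance(f_sample, g_sample):
    """sup_y |F(y) - G(y)|; for step functions it is attained at a jump point."""
    points = set(f_sample) | set(g_sample)
    return max(abs(ecdf(f_sample, v) - ecdf(g_sample, v)) for v in points)


# Hypothetical samples: G is F shifted to the right by 1/2.
f_sample = [1, 2, 3, 4]
g_sample = [Fraction(3, 2), Fraction(5, 2), Fraction(7, 2), Fraction(9, 2)]
delta = sup_distance(f_sample, g_sample)
```

Here `delta` equals `Fraction(1, 4)`. For $\beta=1/2$ one gets $F^{-1}(1/4)=1\leq G^{-1}(1/2)=5/2$ and $G^{-1}(1/2\,+)=7/2\leq F^{-1}(3/4\,+)=4$ , as the lemma predicts.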

Proof of Lemma 4.9.

Let $\beta,\beta^{\prime}\in(\beta_{1},\beta_{2})$ be such that $\beta<\beta^{\prime}$ . Define $y_{1}:=F^{-1}(\beta)$ and $y_{2}:=F^{-1}(\beta^{\prime})$ , so that $y_{1}\leq y_{2}$ . If $y_{1}=y_{2}$ , then (8) is trivial. In case $y_{1}<y_{2}$ , we have, for all $h\in(0,y_{2}-y_{1}]$ , that

[TABLE]

so that $F(y_{1}),F(y_{2}-h)\in(\beta_{1},\beta_{2})$ . Therefore, we get

[TABLE]

∎

Proof of Theorem 3.4.

With $\Delta_{n}:=C\rho_{n}^{\alpha/(2\alpha+1)}$ , we may write $B_{n}=(\beta_{1}+\Delta_{n},\beta_{2}-\Delta_{n})$ . Let $n$ be large enough so that $I_{n}$ and $B_{n}$ are nondegenerate intervals; in particular, $\Delta_{n}<1/2$ . The proof of Theorem 3.3 reveals that $\mathop{\rm I\!P}\nolimits(A_{n}^{*})\to 1$ , where $A_{n}^{*}$ is the event that

[TABLE]

Here $\widehat{F}_{nx,1}$ and $\widehat{F}_{nx,2}$ denote two extremal ways to extrapolate $\widehat{F}_{nx}$ from $x\in\{x_{1},\ldots,x_{m}\}$ to arbitrary $x\in\mathcal{X}$ : With $x_{0}:=-\infty$ and $x_{m+1}:=\infty$ , we define

[TABLE]

Then $\widehat{F}_{nx,1}\geq\widehat{F}_{nx}\geq\widehat{F}_{nx,2}$ for any choice of $(\widehat{F}_{x})_{x\in\mathcal{X}}$ . The event $A_{n}^{*}$ implies that $\widehat{F}_{nx,k}$ is a proper distribution function for $k=1,2$ and all $x\in I_{n}$ . Moreover, for $x\in I_{n}$ and $\beta\in B_{n}$ , it follows from Lemmas 4.8 and 4.9 that

[TABLE]

Consequently,

[TABLE]

as $n\to\infty$ . ∎

We now proceed to the proof of Theorem 3.5. Theorem 4.6 and Lemma 4.11 in the next subsection imply the following exponential inequality:

Corollary 4.10.

With the same notation as in Theorem 4.6, for any $D^{\prime}\in(0,2)$ there exists a universal constant $D^{\prime\prime}=D^{\prime\prime}(D^{\prime})$ such that

[TABLE]

for all $k_{o}\in\mathbb{N}$ and $\eta\geq 0$ .

Proof of Theorem 3.5.

Let us define the indices

[TABLE]

Since we assume the event $A_{n}$ in (A’.2$_{x_{o}}$) to occur, we know that

[TABLE]

One can easily deduce from Corollary 4.10 that

[TABLE]

Consequently, for $y\in J$ ,

[TABLE]

But the right hand side does not depend on $y$ and is of order $O_{p}\bigl{(}(n\delta_{n})^{-1/2}+\delta_{n}^{\alpha}\bigr{)}=O_{p}(n^{-\alpha/(2\alpha+1)})$ . Consequently,

[TABLE]

Analogous arguments show that $\sup_{y\in J}\bigl{(}F_{x_{o}}(y)-\widehat{F}_{x_{o}}(y)\bigr{)}$ is of order $O_{p}(n^{-\alpha/(2\alpha+1)})$ , too. ∎

Proof of Theorem 3.6.

The proof uses essentially the same arguments as the proof of Theorem 3.4. The main differences are that we replace $I_{n}$ with $\{x_{o}\}$ and $\rho_{n}$ with $n^{-1}$ . ∎

4.4 An exponential inequality for the LLN

We consider stochastically independent random elements $Z_{1},Z_{2},Z_{3},\ldots$ with values in a normed vector space $(\mathcal{Z},\|\cdot\|)$ . Defining the partial sums $S_{0}:=0$ and $S_{n}:=\sum_{i=1}^{n}Z_{i}$ for $n\in\mathbb{N}$ , we assume that $\|S_{b}-S_{a}\|$ is measurable for arbitrary integers $0\leq a<b$ .

Lemma 4.11.

Suppose that there are constants $c>0$ and $C\geq 1$ such that for arbitrary integers $0\leq a<b$ and real numbers $\eta>0$ ,

[TABLE]

Then for arbitrary $c^{\prime}\in(0,c)$ there exists a constant $C^{\prime}$ such that

[TABLE]

for arbitrary numbers $n_{o},\eta\geq 0$ .

Corollary 4.10 is a consequence of this result, where $Z_{i}:=1_{[Y_{i}\leq\,\cdot\,]}-F_{i}$ is a random bounded function on the real line, and $c=2$ .

Proof of Lemma 4.11.

Note that the right hand side of (10) is continuous in $\eta\geq 0$ and $n_{o}\geq 0$ , and it is not smaller than $1$ in case of $\eta=0$ or $n_{o}=0$ . Hence it suffices to verify that

[TABLE]

for arbitrary numbers $n_{o},\eta>0$ .

The essential ingredient will be the following inequality: For arbitrary real numbers $0\leq a<b$ and $\eta>0$ ,

[TABLE]

(with the maximum over the empty set interpreted as [math]). To verify this, it suffices to consider the case of $a$ and $b$ being integers; otherwise one could replace $a$ with $\lceil a\rceil$ and $b$ with $\lfloor b\rfloor$ , and this would even decrease the term $\sqrt{b}+\sqrt{b-a}$ in (12). Define the stopping time

[TABLE]

Then, for $0<\lambda<1$ ,

[TABLE]

Here the fourth last step follows from the triangle inequality for $\|\cdot\|$ : $\|S_{n}-S_{b}\|\geq\|S_{n}\|-\|S_{b}\|>\eta-\lambda\eta$ in case of $\tau=n$ and $\|S_{b}\|\leq\lambda\eta$ . The third last step follows from independence of the $Z_{i}$ and the fact that the event $\{\tau=n\}$ depends on $Z_{a},\ldots,Z_{n}$ , whereas $\|S_{n}-S_{b}\|$ is a function of $Z_{n+1},\ldots,Z_{b}$ . If we take

[TABLE]

then the two exponents in our inequality are identical, and we obtain (12).

Since $c^{\prime}<c$ , the constant

[TABLE]

satisfies $\beta>1$ and

[TABLE]

With (12) at hand, we may argue that for arbitrary numbers $n_{o}>0$ ,

[TABLE]

where $p(\eta):=c^{\prime}n_{o}\eta^{2}>0$ . Since $\beta^{x}$ is increasing in $x\geq 0$ , we find the upper bound

[TABLE]

which yields

[TABLE]

For a number $p_{o}>0$ to be specified later, the bound above is not greater than

[TABLE]

whenever $p(\eta)\geq p_{o}$ . But in case of $p(\eta)\leq p_{o}$ , the latter bound is at least

[TABLE]

if we set $p_{o}:=\min\{(\log\beta)^{-1},\log(4C)\}$ . Consequently, with this choice of $p_{o}$ , (11) is true with $C^{\prime}:=2C\bigl{(}1+(p_{o}\log\beta)^{-1}\bigr{)}$ . ∎

Acknowledgements.

This work was supported by the Swiss National Science Foundation. The authors are grateful to Geurt Jongbloed for drawing their attention to El Barmi and Mukerjee (2005) and to Johanna Ziegel for stimulating discussions.

Bibliography

Bretagnolle, J. (1980). Statistique de Kolmogorov-Smirnov pour un échantillon nonéquiréparti. Colloques Internationaux du CNRS 307 39–44.
Casady, R. J. and Cryer, J. D. (1976). Monotone percentile regression. Ann. Statist. 4 532–541.
Dümbgen, L. and Kovac, A. (2009). Extensions of smoothing via taut strings. Electron. J. Statist. 3 41–75.
El Barmi, H. and Mukerjee, H. (2005). Inferences under a stochastic ordering constraint. J. Amer. Statist. Assoc. 100 252–261.
Henzi, A. (2018). Isotonic Distributional Regression (IDR): A powerful nonparametric calibration technique. Master’s thesis, University of Bern.
Hu, I. (1985). A uniform bound for the tail probability of Kolmogorov-Smirnov statistics. Ann. Statist. 13 821–826.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33–50.
Mühlemann, A., Jordan, A. I. and Ziegel, J. F. (2019). Optimal solutions to the isotonic regression problem. Preprint, arXiv:1904.04761 [math.ST].
