Efficient Second-Order Shape-Constrained Function Fitting

David Durfee; Yu Gao; Anup B. Rao; Sebastian Wild

arXiv:1905.02149·cs.DS·May 30, 2019

TL;DR

This paper introduces efficient algorithms for fitting one-dimensional shape-constrained functions, including monotonicity and convexity, with near-linear time complexity, advancing the computational methods for such problems.

Contribution

The paper presents the first near-linear-time algorithms for second-order shape-constrained function fitting, applicable to a broad class of shape constraints and utilizing a novel geometric interpretation.

Findings

01

Algorithms achieve $O\left(n \log \frac{U}{\text{error}}\right)$ time for general shape constraints.

02

A simple $O(n)$ greedy algorithm for unweighted convex regression.

03

Generalization to DAGs is as hard as linear programming.

Abstract

We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted-$L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error $\varepsilon$ in $O\left(n \log \frac{U}{\varepsilon}\right)$ time, where $U$ captures the range of input values. We also give a simple greedy algorithm that runs in $O(n)$ time for the special case of unweighted $L_{\infty}$ convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming…

Equations

f_{i}^{\prime} = \frac{f_{i} - f_{i-1}}{x_{i} - x_{i-1}}, \quad (i \in [2..n]), \quad \text{and} \quad f_{i}^{\prime\prime} = \frac{f_{i}^{\prime} - f_{i-1}^{\prime}}{x_{i} - x_{i-1}}, \quad (i \in [3..n]);

\boldsymbol{f}^{*} = \operatorname*{arg\,min}_{\boldsymbol{f} \in F} \left( \max_{i}\, w_{i} \cdot |f_{i} - y_{i}| \right).

\forall i \in [1..n] \quad x_{i}^{-}

\forall i \in [2..n] \quad y_{i}^{-}

\forall i \in [3..n] \quad z_{i}^{-}

x_{i}^{\pm}

z_{i}^{\pm}

\max_{i} w_{i} \left| f_{i} - y_{i} \right| \leq \min_{\boldsymbol{g} \in F} \left( \max_{i} w_{i} \left| g_{i} - y_{i} \right| \right) + \varepsilon.

P_{i} = \bigl\{ (x, y) \bigm| \exists\, \boldsymbol{b} \in \mathcal{S}_{i} : x = b_{i} \wedge y = b_{i} - b_{i-1} \bigr\}

\textsc{u-hull}(P)

\textsc{l-hull}(P)

\textsc{u-hull}(P_{i}^{g(z)})

\textsc{l-hull}(P_{i}^{g(z)})

b_{i} \geq a_{i} - \Delta

\iff a_{j} + \Delta + \frac{a_{k} - a_{j}}{k - j} (i - j) \geq a_{i} - \Delta

\iff \Delta \geq \frac{1}{2} \left( a_{i} - a_{j} - \frac{i - j}{k - j} (a_{k} - a_{j}) \right)

0 \leq \operatorname{slope}(f_{i}^{*}(v_{j}), f_{i}^{*}(v_{k})) \leq 1

\operatorname{slope}(f_{i}^{*}(v_{j}), f_{i}^{*}(v_{k})) < \operatorname{slope}(f_{i}^{*}(v_{k}), f_{i}^{*}(v_{l}))

\operatorname{slope}(f_{i}^{*}(v_{j}), f_{i}^{*}(v_{k})) = \frac{(y_{j} + z_{i}^{*}) - (y_{k} + z_{i}^{*})}{(x_{j} + y_{j} + z_{i}^{*}) - (x_{k} + y_{k} + z_{i}^{*})} = \frac{y_{j} - y_{k}}{(x_{j} - x_{k}) + (y_{j} - y_{k})}.

\operatorname{slope}(f_{i}^{*}(v_{j}), f_{i}^{*}(v_{k}))^{-1} = \operatorname{slope}(v_{j}, v_{k})^{-1} + 1

V = \bigl\{ f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}})) \bigr\} \cup \bigl\{ f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}})) \bigr\}.

\sum_{(x_{j}, y_{j}) \in V(P_{i-1}^{\alpha_{i}})} p_{j} \bigl( (x_{j} + y_{j}, y_{j}) + (z_{i}^{*}, z_{i}^{*}) \bigr),

z_{i}^{-} \leq 2 b_{q_{i}} - b_{r_{i}} - b_{p_{i}} \leq z_{i}^{+}.

2 b_{q_{i}} = b_{p_{i}} + b_{r_{i}}.

b_{i_{1}} + b_{i_{2}} + \dots + b_{i_{k}} \leq c_{i},

i_{1}

i_{3}

\dots

b_{i_{12 \dots k}} \leq x_{12 \dots k}^{+} := c_{i}

b_{i_{12}} = b_{i_{1}} + b_{i_{2}}, \dots, b_{i_{(k-1)k}} = b_{i_{k-1}} + b_{i_{k}}, \dots, b_{i_{1 \dots k}} = b_{i_{1 \dots k/2}} + b_{i_{k/2+1 \dots k}}.

b_{i_{1}} \pm \dots \pm b_{i_{k}} \leq c_{i}.

b_{pos} - b_{neg} \leq c_{i}.

0 \leq 2 b_{i} - b_{0} - b_{j} \leq 0

0 \leq b_{0} \leq 0


Full text

Efficient Second-Order Shape-Constrained Function Fitting

Funding: The first author is supported in part by National Science Foundation Grant 1718533. The last author is supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Programme.

David Durfee, Yu Gao · Georgia Institute of Technology · {ddurfee,ygao380}@gatech.edu

Anup B. Rao · Adobe Research · anuprao@adobe.com

Sebastian Wild · University of Waterloo · wild@uwaterloo.ca

Abstract

We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted- $L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error $\varepsilon$ in $O\left(n\log\frac{U}{\varepsilon}\right)$ time, where $U$ captures the range of input values. We also give a simple greedy algorithm that runs in $O(n)$ time for the special case of unweighted $L_{\infty}$ convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming.

1 Introduction

We consider the fundamental problem of finding a function $f$ that approximates a given set of data points $(x_{1},y_{1}),\ldots,(x_{n},y_{n})$ in the plane with smallest possible error, i.e., $f(x_{i})$ shall be close to $y_{i}$ (formalized below), subject to shape constraints on the allowable functions $f$, such as being increasing and/or concave. More specifically, we present a new algorithm that can handle arbitrary constraints on the (discrete) first- and second-order derivatives of $f$.

When we only require $f$ to be weakly increasing, the problem is known as isotonic regression, a classic problem in statistics (see, e.g., [13] for history and applications). It has more recently also found uses in machine learning [17, 16, 12].

In certain applications, further shape restrictions are an integral part of the model: For example, microeconomic theory suggests that production functions are weakly increasing and concave (modeling diminishing marginal returns); similar reasoning applies to utility functions. Restricting $f$ to functions with bounded derivative (Lipschitz-continuous functions) is desirable to avoid overfitting [16]. All these shape restrictions can be expressed by inequalities for first and second derivatives of $f$; their discretized equivalents are hence amenable to our new method. Shape restrictions that we cannot directly handle are studied in [28] ($f$ is piecewise constant and the number of breakpoints is to be minimized) and [26] (unimodal $f$). For a more comprehensive survey of shape-constrained function-fitting problems and their applications, see [14, §1]. Motivated by these applications, the problems have been studied in statistics (as a form of nonparametric regression), investigating, e.g., their consistency as estimators and their rate of convergence [13, 14, 4].

While fast algorithms for isotonic-regression variants have been designed [27], both [22] and [3] list shape constraints beyond monotonicity as important challenges. For example, fitting (multidimensional) convex functions is mostly done via quadratic or linear programming solvers [24]. In his PhD thesis, Balázs writes that current “methods are computationally too expensive for practical use, [so] their analysis is used for the design of a heuristic training algorithm which is empirically evaluated” [4, p. 1].

This lack of efficient algorithms motivated the present work. Despite a few limitations discussed below (implying that we do not yet solve Balázs’ problem), we give the first near-linear-time algorithms for any function-fitting problem with second-order shape constraints (such as convexity). We use dynamic programming (DP) with a novel geometric encoding for the “states”. Simpler versions of such geometric DP variants were used for isotonic regression [25] and are well-known in the competitive programming community; incorporating second-order constraints efficiently is our main innovation.

Problem definition.

Given the vectors $\boldsymbol{x}=(x_{1},\ldots,x_{n})\in\mathbb{R}^{n}$ and $\boldsymbol{y}\in\mathbb{R}^{n}$, an error norm $d$ and shape constraints (formalized below), compute $\boldsymbol{f}=(f_{1},\ldots,f_{n})$ satisfying the shape constraints with minimal $d(\boldsymbol{f},\boldsymbol{y})$, i.e., we represent $f$ via its values $f_{i}=f(x_{i})$ at the given points. The error norm $d$ is usually an $L_{p}$ norm, $d(\boldsymbol{x},\boldsymbol{y})=\bigl(\sum_{i}|x_{i}-y_{i}|^{p}\bigr)^{1/p}$; least squares ($p=2$) dominates in statistics, but more general error functions have been studied for isotonic regression [23, 19, 22, 3]. We will consider the weighted $L_{\infty}$ norm, i.e., $d(\boldsymbol{f},\boldsymbol{y})=\max_{i\in[n]}w_{i}|f_{i}-y_{i}|$, where $[n]=\{1,\ldots,n\}$ and $\boldsymbol{w}\in\mathbb{R}_{\geq 0}^{n}$ is a given vector of weights.

Since we are dealing with discretized functions (a vector $\boldsymbol{f}$), restrictions for derivatives $f^{\prime}$ and $f^{\prime\prime}$ have to be discretized as well. We define local slope and curvature as

$$f_{i}^{\prime}=\frac{f_{i}-f_{i-1}}{x_{i}-x_{i-1}},\quad(i\in[2..n]),\qquad\text{and}\qquad f_{i}^{\prime\prime}=\frac{f_{i}^{\prime}-f_{i-1}^{\prime}}{x_{i}-x_{i-1}},\quad(i\in[3..n]);$$

the shape constraints are then given in the form of vectors $\boldsymbol{f}^{\prime-},\boldsymbol{f}^{\prime+},\boldsymbol{f}^{\prime\prime-},\boldsymbol{f}^{\prime\prime+}$ of bounds for the first- and second-order differences, i.e., we define the set of feasible answers as $F=\bigl\{\boldsymbol{f}\in\mathbb{R}^{n}\bigm|\boldsymbol{f}^{\prime-}\leq\boldsymbol{f}^{\prime}\leq\boldsymbol{f}^{\prime+}\wedge\boldsymbol{f}^{\prime\prime-}\leq\boldsymbol{f}^{\prime\prime}\leq\boldsymbol{f}^{\prime\prime+}\bigr\}$, where inequalities on vectors mean the inequality on all components. The weighted-$L_{\infty}$ function-fitting problem with second-order shape constraints is then to find

$$\boldsymbol{f}^{*}=\operatorname*{arg\,min}_{\boldsymbol{f}\in F}\Bigl(\max_{i}\,w_{i}\cdot|f_{i}-y_{i}|\Bigr).\tag{1}$$

Often, we only need a lower or an upper bound; we can achieve that by allowing $-\infty$ and $+\infty$ entries in $f^{\prime\pm}_{i}$ and $f^{\prime\prime\pm}_{i}$. For example, setting $\boldsymbol{f}^{\prime\prime-}=0$, $\boldsymbol{f}^{\prime+}=\boldsymbol{f}^{\prime\prime+}=+\infty$ and $\boldsymbol{f}^{\prime-}=-\infty$, we can enforce a convex function/vector. We also consider the decision version of the problem: given a bound $L$, decide if there is an $\boldsymbol{f}\in F$ with $\max_{i}w_{i}\left|f_{i}-y_{i}\right|\leq L$, and if so, report one.
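The definitions above can be made concrete with a minimal Python sketch. It assumes strictly increasing `x` and 0-based lists (so the slope exists from index 1 and the curvature from index 2); the function names `discrete_derivatives`, `is_feasible` and `weighted_linf_error` are illustrative choices, not taken from the paper.

```python
import math

def discrete_derivatives(x, f):
    """Return (slope, curvature) lists; undefined entries are None."""
    n = len(x)
    slope = [None] * n   # defined for 0-based index i >= 1
    curv = [None] * n    # defined for 0-based index i >= 2
    for i in range(1, n):
        slope[i] = (f[i] - f[i - 1]) / (x[i] - x[i - 1])
    for i in range(2, n):
        curv[i] = (slope[i] - slope[i - 1]) / (x[i] - x[i - 1])
    return slope, curv

def is_feasible(x, f, d1_lo, d1_hi, d2_lo, d2_hi):
    """Check membership of f in F for componentwise derivative bounds."""
    slope, curv = discrete_derivatives(x, f)
    n = len(x)
    first_ok = all(d1_lo[i] <= slope[i] <= d1_hi[i] for i in range(1, n))
    second_ok = all(d2_lo[i] <= curv[i] <= d2_hi[i] for i in range(2, n))
    return first_ok and second_ok

def weighted_linf_error(f, y, w):
    """Objective of the fitting problem: max_i w_i * |f_i - y_i|."""
    return max(wi * abs(fi - yi) for wi, fi, yi in zip(w, f, y))

# Convexity: curvature >= 0, no bound on the slope.
x = [0.0, 1.0, 2.0, 3.0]
inf = math.inf
convex = ([-inf] * 4, [inf] * 4, [0.0] * 4, [inf] * 4)
print(is_feasible(x, [0.0, 1.0, 4.0, 9.0], *convex))   # convex sample
print(is_feasible(x, [0.0, 1.0, 1.0, 0.0], *convex))   # concave sample
```

In this sketch, the decision version for a bound `L` asks whether some `f` with `is_feasible(...)` true also has `weighted_linf_error(f, y, w) <= L`.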

Contributions.

Our main result is a single $O(n)$-time algorithm for the decision problem of function fitting with second-order constraints; see Theorem 1.1 for the precise statement. With binary search, this readily yields an additive $\varepsilon$-approximation for (1), and thus weighted $L_{\infty}$ isotonic regression, convex regression and Lipschitz convex regression, in $O\bigl(n\log\frac{U}{\varepsilon}\bigr)$ time (Theorem 1.3), where $U=(\max_{i}w_{i})\cdot(\max_{i}y_{i}-\min_{i}y_{i})$. In the appendix, we give a simple greedy algorithm (see Theorem A.1) for unweighted ($\boldsymbol{w}=1$) $L_{\infty}$ convex regression that runs in $O(n)$ time. Finally, we show that a generalization of the problem to DAGs (where the applied first- and second-order difference constraints are restricted by the graph) is as hard as linear programming; see Appendix D.

Related work.

Stout [27] surveys algorithms for various versions of isotonic regression; they achieve near-linear or even linear time for many error metrics. He also considers the generalization to any partial order (instead of the total order corresponding to weakly increasing functions). A related task is to fit a piecewise-constant function (with a prescribed number of jumps) to given data. [9, 10] solve this problem for $L_{\infty}$ in optimal $O(n\log n)$ time. Since the geometric constraints are much easier than in our case, a simple greedy algorithm suffices to solve the decision version.

For more restricted shapes, less is known. [26] gives an $O(n\log n)$ solution for unimodal regression. [1] gives an $O(n\log n)$ algorithm for unweighted $L_{2}$ Lipschitz isotonic regression and an $O(n\operatorname{poly}(\log n))$-time algorithm for Lipschitz unimodal regression. [24] describes (multidimensional) $L_{2}$ convex regression algorithms based on quadratic programming. Fefferman [8] studied a closely related problem of smooth interpolation of data in Euclidean space minimizing a certain norm defined on the derivatives of the function. His setup is much more general, but his algorithm cannot find arbitrarily good interpolations ($\varepsilon$ is fixed for the algorithm). All fast algorithms above consider classes defined by constraints on the first derivative only, not the second derivative as needed for convexity. To our knowledge, the fastest prior solution for any convex regression problem is solving a linear program, which implies super-linear time.

We use a geometric interpretation of dynamic-programming states and represent them implicitly. The work closest in spirit to ours is a recent article by Rote [25]; establishing the transformation of states is much more complicated in the presently studied problem, though. Implicitly representing a series of more complicated objects using data structures has been used in geometric and graph algorithms, such as multiple-source shortest paths [18] and shortest paths in polygons [5, 21, 7]. The only other work (we know of) that interprets dynamic programming geometrically is [28].

There is a rich literature on methods for speeding up dynamic programming [29, 30, 6, 11]. They involve a variety of powerful techniques such as monotonicity of transition points, quadrangle inequalities, and Monge matrix searching [2], many of which have found applications in other settings. The focus of these methods is to reduce the (average) number of transitions that a state is involved in, often from $O(n)$ to $O(1)$ . Therefore, their running times are lower bounded by the number of states in the dynamic programs.

1.1 Results

We formally state our theorem for the decision problem here; results for shape-constrained function fitting are obtained as corollaries. For our algorithm, the discrete derivatives (as defined above) are inconvenient because they involve the $x$-distance between points. We therefore normalize all $x$-distances to $1$ (so that $x_{i}=i$); for the second-order constraints, this normalization makes it necessary to introduce an additional parameter, the scaling factors $\alpha_{i}$ (see below).

Definition 1.1 (1st/2nd-diff-constrained vectors)

Let $n$-dimensional vectors $\boldsymbol{x}^{-}\leq\boldsymbol{x}^{+}$ (value bounds), $\boldsymbol{y}^{-}\leq\boldsymbol{y}^{+}$ (difference bounds), $\boldsymbol{z}^{-}\leq\boldsymbol{z}^{+}$ (second-order difference bounds), and $\boldsymbol{\alpha}>0$ be given. We define $\mathcal{S}\subset\mathbb{R}^{n}$ to be the set of all $\boldsymbol{b}\in\mathbb{R}^{n}$ that satisfy the following constraints:

$$\begin{aligned}
\forall i\in[1..n]&:\quad x_{i}^{-}\leq b_{i}\leq x_{i}^{+} &&\text{(value constraints)}\\
\forall i\in[2..n]&:\quad y_{i}^{-}\leq b_{i}-b_{i-1}\leq y_{i}^{+} &&\text{(first-order constraints)}\\
\forall i\in[3..n]&:\quad z_{i}^{-}\leq(b_{i}-b_{i-1})-\alpha_{i}(b_{i-1}-b_{i-2})\leq z_{i}^{+} &&\text{(second-order constraints)}
\end{aligned}$$

Moreover, we consider the “truncated problems” $\mathcal{S}_{k}$, where $\mathcal{S}_{k}$ is the set of all $\boldsymbol{b}\in\mathbb{R}^{n}$ that satisfy the constraints up to $k$ (instead of $n$).

A visualization of an example is shown in Figure 1. We can encode an instance $(\boldsymbol{x},\boldsymbol{y},\boldsymbol{f}^{\prime\pm},\boldsymbol{f}^{\prime\prime\pm})$ of the decision version of the weighted-$L_{\infty}$ function-fitting problem with second-order constraints as 1st/2nd-diff-constrained vectors by setting

[TABLE]
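One way to obtain such an encoding follows directly from the definitions of $f_{i}^{\prime}$ and $f_{i}^{\prime\prime}$: multiplying $f_{i}^{\prime}$ by $x_{i}-x_{i-1}$ gives $b_{i}-b_{i-1}$, and multiplying $f_{i}^{\prime\prime}$ by $(x_{i}-x_{i-1})^{2}$ gives $(b_{i}-b_{i-1})-\alpha_{i}(b_{i-1}-b_{i-2})$ with $\alpha_{i}=(x_{i}-x_{i-1})/(x_{i-1}-x_{i-2})$. The Python sketch below implements this derivation under stated assumptions: strictly increasing `x`, 0-based lists, a second-order constraint of the form $z_{i}^{-}\leq(b_{i}-b_{i-1})-\alpha_{i}(b_{i-1}-b_{i-2})\leq z_{i}^{+}$, and weights of zero treated as "no value constraint". The names `encode_instance` and `satisfies` are hypothetical.

```python
import math

def encode_instance(x, y, w, L, d1_lo, d1_hi, d2_lo, d2_hi):
    """Map a decision instance with error bound L to bounds on b.

    Entries that do not exist (first difference at index 0, second
    difference and alpha at indices 0 and 1) are left as None.
    """
    n = len(x)
    # w_i * |b_i - y_i| <= L  <=>  y_i - L/w_i <= b_i <= y_i + L/w_i  (w_i > 0)
    slack = [L / wi if wi > 0 else math.inf for wi in w]
    enc = {
        "x_lo": [y[i] - slack[i] for i in range(n)],
        "x_hi": [y[i] + slack[i] for i in range(n)],
        "y_lo": [None] * n, "y_hi": [None] * n,
        "z_lo": [None] * n, "z_hi": [None] * n,
        "alpha": [None] * n,
    }
    for i in range(1, n):
        dx = x[i] - x[i - 1]
        enc["y_lo"][i] = d1_lo[i] * dx          # slope bound times x-distance
        enc["y_hi"][i] = d1_hi[i] * dx
    for i in range(2, n):
        dx = x[i] - x[i - 1]
        enc["z_lo"][i] = d2_lo[i] * dx * dx     # curvature bound times dx^2
        enc["z_hi"][i] = d2_hi[i] * dx * dx
        enc["alpha"][i] = dx / (x[i - 1] - x[i - 2])
    return enc

def satisfies(b, enc):
    """Check a vector b against the encoded value and difference bounds."""
    n = len(b)
    ok = all(enc["x_lo"][i] <= b[i] <= enc["x_hi"][i] for i in range(n))
    ok = ok and all(
        enc["y_lo"][i] <= b[i] - b[i - 1] <= enc["y_hi"][i] for i in range(1, n)
    )
    ok = ok and all(
        enc["z_lo"][i]
        <= (b[i] - b[i - 1]) - enc["alpha"][i] * (b[i - 1] - b[i - 2])
        <= enc["z_hi"][i]
        for i in range(2, n)
    )
    return ok
```

With unevenly spaced `x`, the factor `alpha[i]` is what compensates for normalizing all $x$-distances to 1.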

So, our goal is to efficiently compute some $\boldsymbol{b}\in\mathcal{S}$ or determine that $\mathcal{S}=\emptyset$. Our core technical result is a linear-time algorithm for this task:

Theorem 1.1 (1st/2nd-diff-constrained decision)

With the notation of Definition 1.1, in $O(n)$ time, we can compute $\boldsymbol{b}\in\mathcal{S}$ or determine that $\mathcal{S}=\emptyset$. $\Box$

Section 2 will be devoted to the proof. To simplify the presentation, we will assume throughout that $\boldsymbol{x}^{+}$, $\boldsymbol{x}^{-}$, $\boldsymbol{y}^{+}$, $\boldsymbol{y}^{-}$, $\boldsymbol{z}^{+}$, $\boldsymbol{z}^{-}$ are bounded. (Footnote 1: Some problems are stated with $\pm\infty$ values, but we can always replace unbounded values in the algorithms with an (input-specific) sufficiently large finite number.)

For the optimization version of the problem, Equation (1), we consider approximate solutions in the following sense.

Definition 1.2 ( $\varepsilon$ -approximation)

We call $\boldsymbol{f}\in F$ an $\varepsilon$-approximate solution to the weighted $L_{\infty}$ function-fitting problem if it satisfies

$$\max_{i}w_{i}\left|f_{i}-y_{i}\right|\leq\min_{\boldsymbol{g}\in F}\Bigl(\max_{i}w_{i}\left|g_{i}-y_{i}\right|\Bigr)+\varepsilon.$$

$\Box$

By a simple binary search on $L$ , we can find approximate solutions.

Theorem 1.3 (Main result)

There exists an algorithm that computes an $\varepsilon$-approximate solution to the weighted-$L_{\infty}$ convex regression problem that runs in $O(n\log\frac{U}{\varepsilon})$ time, where $U=(\max_{i}w_{i})(\max_{i}y_{i}-\min_{i}y_{i})$. The same holds true for isotonic regression, Lipschitz isotonic regression, and convex isotonic regression. $\Box$

Proof 1.4

We will argue for the case of convex regression here; other cases are similar. Abbreviate $L(\boldsymbol{f})=\max_{i}w_{i}|f_{i}-y_{i}|$. For a given $L$, the decision version of convex regression can be solved in $O(n)$ time using Theorem 1.1. That is, in $O(n)$ time, we can either find $\boldsymbol{f}\in F$ such that $L(\boldsymbol{f})\leq L$ or conclude that $L(\boldsymbol{f})>L$ for all $\boldsymbol{f}\in F$. If we know an $L_{0}$ for which there exists $\boldsymbol{f}\in F$ with $L(\boldsymbol{f})\leq L_{0}$, then we can do a binary search for $L_{c}$ in $[0,L_{0}]$. We can easily find such an $L_{0}$ for the convex case: Let $\boldsymbol{f}=\min_{j}y_{j}$ be constant (hence convex). For this $\boldsymbol{f}$, we have $L(\boldsymbol{f})\leq(\max_{j}w_{j})(\max_{j}y_{j}-\min_{j}y_{j})$. Therefore, we can take $L_{0}=(\max_{j}w_{j})(\max_{j}y_{j}-\min_{j}y_{j})=U$. Each binary-search step halves the search interval and costs $O(n)$ time, so after $O(\log\frac{U}{\varepsilon})$ steps the interval has length at most $\varepsilon$, and the result follows. $\Box$
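The binary search in this proof can be sketched in Python as follows. The driver `approx_fit` works with any decision oracle that returns a feasible vector or `None`. Because the linear-time oracle of Theorem 1.1 is not reproduced here, the sketch plugs in a simple stand-in, `isotonic_decision`, which only handles unweighted isotonic (first-order) constraints. Both names are illustrative assumptions, not the paper's code.

```python
def isotonic_decision(y, L):
    """Stand-in oracle: unweighted L_inf isotonic regression with bound L.

    Greedily picks the smallest admissible value at each position and
    returns a weakly increasing vector within distance L of y, or None.
    """
    fit = []
    prev = -float("inf")
    for yi in y:
        cur = max(prev, yi - L)   # stay monotone and within L below y_i
        if cur > yi + L:          # cannot stay within L above y_i
            return None
        fit.append(cur)
        prev = cur
    return fit

def approx_fit(decide, L0, eps):
    """Binary search on L in [0, L0]; decide(L) returns a fit or None.

    Invariant: decide(hi) is feasible, and the optimum is at least lo.
    """
    lo, hi = 0.0, L0
    best = decide(hi)
    if best is None:
        raise ValueError("L0 must be a feasible error bound")
    while hi - lo > eps:
        mid = (lo + hi) / 2
        candidate = decide(mid)
        if candidate is None:
            lo = mid
        else:
            hi, best = mid, candidate
    return hi, best

y = [1.0, 3.0, 2.0, 4.0, 3.0]
L0 = max(y) - min(y)              # a constant fit achieves this bound
bound, fit = approx_fit(lambda L: isotonic_decision(y, L), L0, 1e-6)
print(bound, fit)
```

The loop runs about $\log_2(L_0/\varepsilon)$ times, so with an $O(n)$ oracle the total cost matches the $O(n\log\frac{U}{\varepsilon})$ bound of Theorem 1.3.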

We note that for the specific case of unweighted convex function fitting, there is a simpler linear-time greedy algorithm; we give more details on that in Appendix A. This algorithm was the initial motivation for studying this problem and for the geometric approach we use. For more general settings, in particular second-order differences that are allowed to be both positive and negative, the greedy approach does not work; our generic algorithm, however, is almost as simple and efficient.

2 First- and second-order difference-constrained vectors

In this section, we present our main algorithm and prove Theorem 1.1. In Section 2.1, we give an overview and introduce the feasibility polygons $P_{i}$ . Section 2.2 shows how $P_{i}$ can be inductively computed from $P_{i-1}$ via a geometric transformation. We finally show how this transformation can be computed efficiently, culminating in the proof of Theorem 1.1, in Section 2.3. Two proofs are deferred to Appendix B and C.

2.1 Overview of the algorithm

Recall that the problem we want to solve, in order to prove Theorem 1.1, is finding a feasible point $\boldsymbol{b}$ in $\mathcal{S}$ from Definition 1.1. Our algorithm will use dynamic programming (DP) where each state is associated with the feasible $b_{i}$ in the truncated problem. We will iteratively determine all $b_{i}$ such that $b_{i}$ is the $i$ th entry of some $\boldsymbol{b}\in\mathcal{S}_{i}$ .

Feasible $b_{i}$ have to respect the first- and second-order difference constraints. To check those, we also need to know the possible pairs $(b_{i-1},b_{i-2})$ of $(i-1)$ th and $(i-2)$ th entries for some $\boldsymbol{b}\in\mathcal{S}_{i-1}$ , so the states have to maintain more information than the $b_{i}$ alone. It will be instrumental to rewrite this pair as $(b_{i-1},b_{i-1}-b_{i-2})$ , the combination of valid values $b_{i-1}$ and valid slopes at which we entered $b_{i-1}$ for a solution in $\mathcal{S}_{i-1}$ . From that, we can determine the valid slopes at which we can leave $b_{i-1}$ using our shape constraints. We thus define the feasibility polygons

$P_{i}=\bigl\{(b_{i},\,b_{i}-b_{i-1})\mid\boldsymbol{b}\in\mathcal{S}_{i}\bigr\}$

for $i=2,\ldots,n$ . See Figure 1 for an example. We view each point in $P_{i}$ as a “state” in our DP algorithm, and our goal becomes to efficiently compute $P_{i}$ from $P_{i-1}$ . The key observation is that each $P_{i}$ is indeed an $O(n)$ -vertex convex polygon, and we only need an efficient way to compute the vertices of $P_{i}$ from those of $P_{i-1}$ . This needs a clever representation, though, since all vertices can change when going from $P_{i-1}$ to $P_{i}$ . A closer look reveals that we can represent the vertex transformations implicitly, without actually updating each vertex, and we can combine subsequent transformations into a single one. More specifically, if we consider the boundary of $P_{i-1}$ , the transformation to $P_{i}$ consists of two steps: (1) a linear transformation for the upper and lower hull of $P_{i-1}$ , and (2) a truncation of the resulting polygon by vertical and horizontal lines (i. e., an intersection of the polygon and a half-plane).

The first step requires a more involved proof and uses that all line segments of $P_{i}$ have weakly positive slope (“ $+\textsc{sloped}$ ”, formally defined below). Implicitly computing the first transformation as we move between $P_{i}$ is straightforward, only requiring a composition of linear operations (a different one, though, for upper and lower hull). We can apply the cumulative transformation whenever we need to access a vertex.

The second step is conceptually simpler, but more difficult to implement efficiently, as we have to determine where a line cuts the polygon in amortized constant time. For this operation, we separately store the vertices of the upper and lower hull of $P_{i}$ in two arrays, sorted by increasing $x$ -coordinate; since $P_{i}$ is $+\textsc{sloped}$ , $y$ -values are also increasing. A linear search for intersections has overall $O(n)$ cost since we can charge individual searches to deleted vertices.

Finally, if $P_{n}\neq\emptyset$ , we compute a feasible vector $\boldsymbol{b}$ backwards, starting from any point in $P_{n}$ . Since we do not explicitly store the $P_{i}$ , this requires successively “undoing” all operations (going from $P_{i}$ back to $P_{i-1}$ ); see Appendix C for details.

2.2 Transformation from state $P_{i-1}$ to $P_{i}$

We first define the structural property “ $+\textsc{sloped}$ ” that our method relies on.

Definition 2.1 ( $+\textsc{sloped}$ )

We say a polygon $P\subseteq\mathbb{R}^{2}$ with vertices $v_{1},\ldots,v_{k}$ is $+\textsc{sloped}$ if $\mathrm{slope}(v_{i},v_{j})\geq 0$ for all edges $(v_{i},v_{j})$ of $P$ . Here, the slope between two points $v_{1}=(x_{1},y_{1})$ , $v_{2}=(x_{2},y_{2})\in\mathbb{R}^{2}$ is defined as $\mathrm{slope}(v_{1},v_{2})=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}$ , when $x_{1}\neq x_{2}$ , and $\mathrm{slope}(v_{1},v_{2})=\infty$ , otherwise. $\Box$
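Definition 2.1 can be checked mechanically on an explicit vertex list. The following is a small sketch under the assumption that the vertices are given in boundary order (so consecutive entries, cyclically, form the edges); the function name `is_plus_sloped` is chosen here for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_plus_sloped(vertices: List[Point]) -> bool:
    """Return True if every edge of the polygon has slope >= 0 or infinity.

    `vertices` must list the polygon's vertices in boundary order; edges
    connect consecutive vertices, and the last vertex connects to the first.
    """
    k = len(vertices)
    for idx in range(k):
        (x1, y1), (x2, y2) = vertices[idx], vertices[(idx + 1) % k]
        dx, dy = x2 - x1, y2 - y1
        if dx == 0:
            continue              # vertical edge: slope is infinity, allowed
        if dy * dx < 0:           # same sign test as dy / dx < 0
            return False
    return True
```

For instance, the parallelogram with vertices $(0,0),(1,0),(3,2),(2,2)$ passes the check, while the triangle $(0,0),(1,0),(0,1)$ fails because of its edge of slope $-1$.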

We will now show that $P_{i}$ can be computed by applying a simple geometric transformation to $P_{i-1}$ . In passing, we will prove (by induction on $i$ ) that all $P_{i}$ are $+\textsc{sloped}$ . For the base case, note that $P_{2}=\{(b_{2},b_{2}-b_{1})\mid x_{1}^{-}\leq b_{1}\leq x_{1}^{+}\wedge x_{2}^{-}\leq b_{2}\leq x_{2}^{+}\wedge y_{2}^{-}\leq b_{2}-b_{1}\leq y_{2}^{+}\}$ , which is an intersection of $6$ half-planes. The slopes of the defining inequalities are all non-negative or infinite, so $P_{2}$ is $+\textsc{sloped}$ .

Let us now assume that $P_{i-1}$ , $i\geq 3$ , is $+\textsc{sloped}$ ; we will consider the transformation from $P_{i-1}$ to $P_{i}$ and show that it preserves this property. We begin by separating the transformation from $P_{i-1}$ to $P_{i}$ into two main steps.

Step 1: Second-order constraint only.

For the first step, we ignore the value and first-order constraints at index $i$ . This will yield a convex polygon, $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ , that contains $P_{i}$ ; in Step 2, we will add the other constraints at $i$ to obtain $P_{i}$ itself.

Definition 2.2 ( $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ : 2nd-order-only polygons)

For a fixed $i$ , consider the modified problem with $x_{i}^{-},y_{i}^{-}=-\infty$ and $x_{i}^{+},y_{i}^{+}=\infty$ . Define the second-order-only polygon, $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ , as the polygon $P_{i}$ of this modified problem (considering only the $z_{i}$ constraints at $i$ ). $\Box$

The statement of the following lemma is a very simple observation, but it allows us to compute $P_{i}^{(z)}$ from $P_{i-1}$ with an explicit geometric construction (whereas no such construction seemed obvious for the original feasibility polygons).

Lemma 2.3 ( $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ : scaled, sheared and shifted $P_{i-1}$ )

$P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}=\bigl{\{}(x+\alpha_{i}y+z,\alpha_{i}y+z)\mid(x,y)\in P_{i-1},z\in[z_{i}^{-},z_{i}^{+}]\bigr{\}}$ . $\Box$

Proof 2.4

The only constraint at $i$ is $z_{i}^{-}\leq(b_{i}-b_{i-1})-\alpha_{i}(b_{i-1}-b_{i-2})\leq z_{i}^{+}$ . We rewrite this as (a) a constraint for $b_{i}-b_{i-1}$ , using that $b_{i-1}-b_{i-2}$ is the $y$ -coordinate in $P_{i-1}$ , and (b) a constraint for $b_{i}$ , using that, additionally, $b_{i-1}$ is the $x$ -coordinate in $P_{i-1}$ . $\Box$
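The rewriting in this proof can be spelled out in one line. As a sketch, write $x=b_{i-1}$ and $y=b_{i-1}-b_{i-2}$ for a point $(x,y)\in P_{i-1}$ , and let $z$ denote the value of the second-order expression (the name $z$ is chosen to match the statement of Lemma 2.3):

$$z=(b_{i}-b_{i-1})-\alpha_{i}y\in[z_{i}^{-},z_{i}^{+}]\quad\Longrightarrow\quad b_{i}-b_{i-1}=\alpha_{i}y+z,\qquad b_{i}=b_{i-1}+(b_{i}-b_{i-1})=x+\alpha_{i}y+z.$$

Hence the new state $(b_{i},b_{i}-b_{i-1})$ equals $(x+\alpha_{i}y+z,\alpha_{i}y+z)$ , and letting $(x,y)$ range over $P_{i-1}$ and $z$ over $[z_{i}^{-},z_{i}^{+}]$ gives exactly the set in Lemma 2.3.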

Once we have computed this polygon $P_{i}^{(z)}$ , computing $P_{i}$ is easy: adding the constraints $x_{i}^{-}\leq x\leq x_{i}^{+}$ and $y_{i}^{-}\leq y\leq y_{i}^{+}$ requires only cutting $P_{i}^{(z)}$ with two vertical and two horizontal lines. We give a visual representation of the mapping on an example in Figure 2. We break the above mapping into two simpler stages:

Corollary 2.5 ( $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ : sheared and shifted $P_{i-1}^{\alpha_{i}}$ )

Setting $P_{i-1}^{\alpha_{i}}=\{(x,\alpha_{i}y)\mid(x,y)\in P_{i-1}\}$ , we have

$P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}=\bigl{\{}(x+y+z,y+z)\mid(x,y)\in P_{i-1}^{\alpha_{i}},z\in[z_{i}^{-},z_{i}^{+}]\bigr{\}}$ . $\Box$

We note that scaling the $y$ -coordinate by $\alpha_{i}$ preserves the $+\textsc{sloped}$ -property:

Lemma 2.6

Let $\alpha\geq 0$ . If $P$ is $+\textsc{sloped}$ , so is $P^{\alpha}=\{(x,\alpha y)\mid(x,y)\in P\}$ . $\Box$

Proof 2.7

Scaling the $y$ -coordinates will preserve all of the vertices of $P$ , and also scale the slope of each vertex pair by $\alpha\geq 0$ . So, $P^{\alpha}$ is $+\textsc{sloped}$ . $\Box$

That leaves us with the core of the transformation, from $P_{i-1}^{\alpha_{i}}$ to $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ . Intuitively, it can be viewed as sliding $P_{i-1}^{\alpha_{i}}$ along the line $x=y$ by any amount $z\in[z_{i}^{-},z_{i}^{+}]$ and taking the union thereof, (see Figure 2). To compute the result of this operation, we split the boundary into upper and lower hull.

Definition 2.8 (Upper/lower hull)

Let $P$ be a convex polygon with vertex set $V$ . We define the upper hull (vertices) resp. lower hull (vertices) of $P$ as

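$\textsc{u-hull}(P)=\{(x,y)\in V\mid y\geq y^{\prime}\text{ for all }(x,y^{\prime})\in P\}$ and $\textsc{l-hull}(P)=\{(x,y)\in V\mid y\leq y^{\prime}\text{ for all }(x,y^{\prime})\in P\}.$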

Unless specified otherwise, hull vertices are ordered by increasing $x$ -coordinate. $\Box$

Note that a vertex can be in both hulls. Moreover, the leftmost vertices in $\textsc{u-hull}(P)$ and $\textsc{l-hull}(P)$ always have the same $x$ -coordinate, similarly for the rightmost vertices. As proved in Lemma 2.3, each point in $P_{i-1}^{\alpha_{i}}$ is mapped to a line-segment with slope $1$ ; we give this mapping a name.

Definition 2.9 (2nd-order $P$ transform)

Let $f_{i}((x,y))$ be the line-segment $\{(x+y+z,y+z)\mid z\in[z_{i}^{-},z_{i}^{+}]\}$ and denote by $f_{i}^{-}((x,y))=(x+y+z_{i}^{-},y+z_{i}^{-})$ and $f_{i}^{+}((x,y))=(x+y+z_{i}^{+},y+z_{i}^{+})$ the two endpoints of $f_{i}((x,y))$ .

We write $f(S)=\bigcup_{(x,y)\in S}f((x,y))$ for the element-wise application of $f$ to a set $S$ of points. $\Box$

The vertices of $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ result from transforming the upper hull of $P_{i-1}^{\alpha_{i}}$ by $f_{i}^{+}$ and the lower hull by $f_{i}^{-}$ . The next lemma formally establishes that applying $f_{i}^{+}$ resp. $f_{i}^{-}$ to the hulls of $P_{i-1}^{\alpha_{i}}$ correctly computes $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ , (again, compare Figure 2).

Lemma 2.10 (From $P_{i-1}^{\alpha_{i}}$ to $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ via hulls)

If $P_{i-1}^{\alpha_{i}}$ is $+\textsc{sloped}$ , then $P_{i}^{(z)}$ is $+\textsc{sloped}$ and $\textsc{u-hull}(P_{i}^{(z)})=\{f_{i}^{-}(v_{\mathit{ll}})\}\cup f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))$ and $\textsc{l-hull}(P_{i}^{(z)})=f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}}))\cup\{f_{i}^{+}(v_{\mathit{ur}})\}$ , where $v_{\mathit{ll}}$ (lower-left) and $v_{\mathit{ur}}$ (upper-right) are the first vertex of $\textsc{l-hull}(P_{i-1}^{\alpha_{i}})$ and the last vertex of $\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ , respectively. $\Box$

We defer the formal proof to Appendix B. Intuitively, since each point in $P_{i-1}^{\alpha_{i}}$ is mapped to a line-segment with slope $1$ in $P_{i}^{(z)}$ , $P_{i}^{(z)}$ is obtained by sliding $P_{i-1}^{\alpha_{i}}$ along the line $x=y$ . Note here that we could allow $z_{i}^{-}=-\infty$ and/or $z_{i}^{+}=\infty$ , where the functions $f_{i}^{-},f_{i}^{+}$ would instead map to the ray centered at $(x+y,y)$ and either pointed upwards or downwards with slope $1$ . The full transformation from $P_{i-1}$ to $P_{i}^{(z)}$ can now be stated as:

Lemma 2.11 ( $P_{i-1}$ to $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ )

Let $f_{i}^{*,\alpha_{i}}$ be the function $f_{i}^{*,\alpha_{i}}(x,y)=(x+\alpha_{i}y+z_{i}^{*},\alpha_{i}y+z_{i}^{*})$ for $*\in\{-,+\}$ . If $P_{i-1}$ is $+\textsc{sloped}$ , then $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ is $+\textsc{sloped}$ with

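$\textsc{u-hull}(P_{i}^{(z)})=\{f_{i}^{-,\alpha_{i}}(v_{\mathit{ll}})\}\cup f_{i}^{+,\alpha_{i}}(\textsc{u-hull}(P_{i-1}))$ and $\textsc{l-hull}(P_{i}^{(z)})=f_{i}^{-,\alpha_{i}}(\textsc{l-hull}(P_{i-1}))\cup\{f_{i}^{+,\alpha_{i}}(v_{\mathit{ur}})\}$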

with $v_{\mathit{ll}}$ and $v_{\mathit{ur}}$ the lower-left resp. upper-right vertex of $P_{i-1}$ . $\Box$

Proof 2.12

This follows immediately from Corollary 2.5 and Lemmas 2.6 and 2.10. $\Box$
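To illustrate Lemma 2.11 on explicit vertex lists, here is a minimal sketch. It assumes that both hulls are given as lists sorted by increasing $x$ -coordinate, that $\alpha_{i}>0$ , and that $z_{i}^{-}<z_{i}^{+}$ are finite; the function name `second_order_step` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def second_order_step(u_hull: List[Point], l_hull: List[Point],
                      alpha: float, z_lo: float, z_hi: float
                      ) -> Tuple[List[Point], List[Point]]:
    """Map the hulls of P_{i-1} to the hulls of the second-order-only polygon.

    Applies (x, y) -> (x + alpha*y + z, alpha*y + z) with z = z_hi on the
    upper hull and z = z_lo on the lower hull, then adds the two extra
    vertices named in Lemma 2.11.
    """
    def f(p: Point, z: float) -> Point:
        x, y = p
        return (x + alpha * y + z, alpha * y + z)

    v_ll = l_hull[0]     # lower-left vertex: first vertex of the lower hull
    v_ur = u_hull[-1]    # upper-right vertex: last vertex of the upper hull

    new_u = [f(v_ll, z_lo)] + [f(p, z_hi) for p in u_hull]
    new_l = [f(p, z_lo) for p in l_hull] + [f(v_ur, z_hi)]
    return new_u, new_l
```

As a sanity check, take the unit square with $\alpha_{i}=1$ and $z\in[0,1]$ : the image is $\{(s,t)\mid 0\leq t\leq 2,\ t\leq s\leq t+1\}$ , the parallelogram with vertices $(0,0),(1,0),(3,2),(2,2)$ , which is what the sketch returns.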

Step 2: Truncating by value and slope.

To complete the transformation, we need to add the constraints $x_{i}^{-}\leq b_{i}\leq x_{i}^{+}$ and $y_{i}^{-}\leq b_{i}-b_{i-1}\leq y_{i}^{+}$ to $P_{i}^{(z)}$ . This is equivalent to cutting our polygon with two vertical and two horizontal lines. The following lemma shows that this preserves the $+\textsc{sloped}$ -property.

Lemma 2.13 (# new vertices)

If $P_{i-1}$ is $+\textsc{sloped}$ with $k$ vertices, then $P_{i}$ is either empty or $+\textsc{sloped}$ with at most $k+6$ vertices. $\Box$

It follows that over the course of the algorithm, only $O(n)$ vertices are added in total. This will be instrumental for analyzing the running time.

Proof 2.14

We know that $P_{i}^{(z)}$ is $+\textsc{sloped}$ , and it follows easily from the definition that cutting by horizontal and vertical lines will preserve this property. Furthermore, note that cutting a convex polygon by a line will increase the total number of vertices by at most one. We added at most 2 vertices to $P_{i-1}$ to obtain $P_{i}^{(z)}$ . We then cut $P_{i}^{(z)}$ by the inequalities $x\leq x_{i}^{+}$ , $x\geq x_{i}^{-}$ , $y\leq y_{i}^{+}$ , and $y\geq y_{i}^{-}$ , i.e., two vertical and two horizontal lines. Each adds at most one vertex, giving the desired upper bound. $\Box$

2.3 Algorithm

A direct implementation of the transformation of Lemma 2.11 yields a “brute force” algorithm that maintains all vertices of $P_{i}$ and checks if $P_{n}$ is empty; (the running time would be quadratic). It works as follows:

1. Compute the vertices of $P_{2}$ .

2. For $i=3,\ldots,n$ , do the following:

2.1. At step $i$ , scale the $y$ -coordinate of each vertex by $\alpha_{i}$ .

2.2. Apply $f_{i}^{+}$ resp. $f_{i}^{-}$ to each vertex, depending on which hull it is in.

2.3. Add the new vertex to u-hull and l-hull, as per Lemma 2.11.

2.4. Delete all the vertices outside $[x_{i}^{-},x_{i}^{+}]\times[y_{i}^{-},y_{i}^{+}]$ and add the vertices created by intersecting with $[x_{i}^{-},x_{i}^{+}]\times[y_{i}^{-},y_{i}^{+}]$ .

3. If $P_{n}\neq\emptyset$ , compute $(b_{1},\ldots,b_{n})$ by backtracing.

Observe that Lemma 2.11 applies the same linear function (multiplication of $y$ -coordinate by $\alpha_{i}$ and $f_{i}^{+}$ or $f_{i}^{-}$ ) to all vertices in u-hull resp. l-hull. So, we do not need to modify every vertex each time; instead, we can store – separately for u-hull and l-hull – the composition of the linear transformations as a matrix. Whenever we access a vertex, we take the unmodified vertex and apply the cumulative transformation in $O(1)$ time.

At each step, after applying the linear transformations, by Lemma 2.11 we also need to copy the leftmost vertex of l-hull, add it to the left of u-hull and copy the rightmost vertex of u-hull and add it to the right of l-hull. To add these vertices, we simply apply the inverse of each respective cumulative transformation such that all stored vertices require the same transformation. This will also take $O(1)$ time.
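The lazy bookkeeping described in the last two paragraphs can be sketched with $3\times 3$ matrices acting on homogeneous coordinates. This is an illustrative sketch, not the paper's pseudocode: all function names are hypothetical, and it assumes every $\alpha_{i}$ is nonzero so that the cumulative map is invertible.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def step_matrix(alpha: float, z: float) -> Matrix:
    """Affine map (x, y) -> (x + alpha*y + z, alpha*y + z) as a 3x3 matrix."""
    return [[1.0, alpha, z], [0.0, alpha, z], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product a*b, i.e. the map 'first b, then a'."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def apply(m: Matrix, p: Point) -> Point:
    """Apply an affine 3x3 matrix to the point (x, y, 1)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def invert_affine(m: Matrix) -> Matrix:
    """Inverse of an affine map [[a, b, tx], [c, d, ty], [0, 0, 1]]."""
    a, b, tx = m[0]
    c, d, ty = m[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("map is not invertible (some alpha was zero)")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def to_stored_coordinates(cumulative: Matrix, actual: Point) -> Point:
    """Coordinates to store so that `cumulative` maps them to `actual`."""
    return apply(invert_affine(cumulative), actual)
```

In this sketch, each iteration updates the cumulative matrix of a hull by one multiplication, `S = matmul(step_matrix(alpha_i, z_i), S)`, and a vertex newly copied from the other hull is stored via `to_stored_coordinates`, so every stored vertex of that hull shares the same pending transformation.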

Since all the slopes of $P_{i}^{(z)}$ are non-negative ( $+\textsc{sloped}$ ) and we keep vertices sorted by $x$ -coordinate, the truncation by a horizontal or vertical line can only remove a prefix or suffix from u-hull and l-hull of $P_{i}^{(z)}$ . Depending on the constraint we are adding ( $x\leq x_{i}^{+}$ , $x\geq x_{i}^{-}$ , $y\leq y_{i}^{+}$ , or $y\geq y_{i}^{-}$ ), we start at the rightmost or leftmost vertex of the u-hull and l-hull, and continue until we find the intersection with the cutting line. We remove all visited vertices.

This could take $O(n)$ time in any single iteration, but the total cost over all iterations is $O(n)$ since we start with $O(1)$ vertices and add $O(n)$ vertices throughout the entire procedure (by Lemma 2.13). This allows us to use two deques (double-ended queues), represented as arrays, to store the vertices of u-hull and l-hull. Putting this all together gives the linear time algorithm for the decision problem “ $\mathcal{S}=\emptyset$ ?”.
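One of the four cuts, the constraint $x\leq x_{i}^{+}$ applied to a single hull, might look as follows. This is a simplified sketch on explicit (already transformed) coordinates, with the hypothetical name `cut_right`; the other three cuts are analogous, and the implicit transformation is omitted here.

```python
from collections import deque
from typing import Deque, Tuple

Point = Tuple[float, float]


def cut_right(hull: Deque[Point], x_max: float) -> Deque[Point]:
    """Truncate an x-sorted hull by the half-plane x <= x_max, in place.

    Pops vertices from the right end while they violate the constraint,
    then adds the intersection of the last cut segment with x = x_max.
    An empty result means the whole hull lies to the right of x_max.
    """
    removed = None
    while hull and hull[-1][0] > x_max:
        removed = hull.pop()          # each vertex is popped at most once
    if removed is not None and hull:
        (x0, y0), (x1, y1) = hull[-1], removed
        if x0 < x_max:                # otherwise hull[-1] lies on the line
            t = (x_max - x0) / (x1 - x0)
            hull.append((x_max, y0 + t * (y1 - y0)))
    return hull
```

Every loop iteration deletes a vertex for good, so the total work over all calls is bounded by the number of vertices ever inserted plus one constant per call, which is the charging argument used above.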

To compute an actual solution when $\mathcal{S}\neq\emptyset$ , we compute $b_{n},\ldots,b_{1}$ , in this order. From the last $P_{n}$ , we can find a feasible $b_{n}$ (the $x$ -coordinate of any point in $P_{n}$ ). Then, we retrace the steps of our algorithm through specific points in each $P_{i}$ . Since intermediate $P_{i}$ were only implicitly represented, we have to recover $P_{i}$ by “undoing” the algorithm’s operations in reverse order; this is possible in overall time $O(n)$ by remembering the operations from the forward phase. The details on the backtracing step are deferred to Appendix C, where we also present the final algorithm.

3 Conclusion

In this article, we presented a linear-time dynamic-programming algorithm to decide whether there is a vector $\boldsymbol{b}$ that lies (componentwise) between given upper and lower bounds and additionally satisfies inequalities on its first- and second-order (successive) differences. This method can be used to approximate weighted- $L_{\infty}$ shape-restricted function-fitting problems, where the shape restrictions are given as bounds on first- and/or second-order differences (local slope and curvature).

This is a first step towards much sought-after efficient methods for more general convex regression tasks. A main limitation of our approach is the restriction to one-dimensional problems. We show in Appendix D that a natural extension of the problem studied here to directed acyclic graphs is already as hard as linear programming, leaving little hope for an efficient generic solution. This is in sharp contrast to isotonic regression, where similar extensions to arbitrary partial orders do have efficient algorithms (for $L_{\infty}$ ) [27]. This might also be bad news for multidimensional regression with second-order constraints, since higher dimensions entail, among other complications, a non-total order over the inputs.

A second limitation is the $L_{\infty}$ error metric, which might not be adequate for all applications. We leave the question whether similarly efficient methods are also possible for other metrics for future work. A further extension to study is convex unimodal regression; here, finding the maximum is part of the fitting problem, and so not directly possible with our presented method.

Acknowledgments

We thank Richard Peng, Sushant Sachdeva, and Danny Sleator for insightful discussions, and our anonymous referees for further relevant references and insightful comments that significantly improved the presentation.

Appendix

Appendix A Simple greedy algorithm for convex regression

In this appendix, we give details on a simpler algorithm for the special case of unweighted convex function fitting.

Theorem A.1

There exists an algorithm for the unweighted $L_{\infty}$ convex regression that runs in $O(n)$ time. $\Box$

Proof A.2

We consider the following problem. Given an $n$ -dimensional vector $\boldsymbol{a}$ , and parameter $\Delta\geq 0$ , find a convex vector $\boldsymbol{b}$ such that $\left\|\boldsymbol{b}-\boldsymbol{a}\right\|_{\infty}\leq\Delta$ , if such a vector exists.

This clearly fits under our parameters of Definition 1.1 by setting $\boldsymbol{x^{-}}=\boldsymbol{a}-\Delta,\boldsymbol{x^{+}}=\boldsymbol{a}+\Delta$ , both $\boldsymbol{y^{-}}$ and $\boldsymbol{y^{+}}$ to be unbounded, and $\boldsymbol{z^{-}}=0,\boldsymbol{z^{+}}=\infty$ , along with $\boldsymbol{\alpha}=1$ . A binary search on $\Delta$ gives an $O(n\log\frac{U}{\varepsilon})$ algorithm.

However, this can also be solved by considering the set of points $(i,a_{i}+\Delta)$ for all $i$ , and taking the lower hull, $H(\Delta)$ , such that for each point $(i,h_{i})$ in this lower hull we set $b_{i}=h_{i}$ . (The lower hull of a set of points is the subset of vertices $(x_{i},y_{i})$ of the convex hull, where $y_{i}$ is the minimal $y$ -coordinate of all points with the $x$ -coordinate $x_{i}$ in the convex hull; see also Definition 2.8.) We claim that the minimum possible $\Delta$ such that $b_{i}\geq a_{i}-\Delta$ is exactly the answer to this problem. If $(i,a_{i}+\Delta)$ is a vertex of the convex hull, $b_{i}=a_{i}+\Delta$ is always at least $a_{i}-\Delta$ . Otherwise, let $(j,a_{j}+\Delta)$ , $(k,a_{k}+\Delta)$ be two vertices of $H$ such that $j<i<k$ . We have

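$b_{i}=\frac{(k-i)(a_{j}+\Delta)+(i-j)(a_{k}+\Delta)}{k-j},\quad\text{so}\quad b_{i}\geq a_{i}-\Delta\iff\Delta\geq\frac{1}{2}\Bigl(a_{i}-\frac{(k-i)\,a_{j}+(i-j)\,a_{k}}{k-j}\Bigr).$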

If $\Delta$ violates this for some $i,j,k$ , then it is impossible to fit a convex function through the intervals $[(j,a_{j}-\Delta),(j,a_{j}+\Delta)]$ , $[(i,a_{i}-\Delta),(i,a_{i}+\Delta)]$ , and $[(k,a_{k}-\Delta),(k,a_{k}+\Delta)]$ .

Conversely, if $\Delta$ satisfies all of such constraints, $b_{i}\geq a_{i}-\Delta$ for all $1\leq i\leq n$ , then $b_{i}$ cannot be greater than $a_{i}+\Delta$ as that would violate $H$ being the convex lower hull of $(i,a_{i}+\Delta)$ . Thus, $(b_{1},\ldots,b_{n})$ is a possible solution.

It takes $O(n)$ time to compute the lower convex hull and $O(n)$ time to calculate the minimum $\Delta$ . Thus, this algorithm solves $L_{\infty}$ convex regression in $O(n)$ time. $\Box$
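The greedy procedure of this proof can be sketched in a few lines. The sketch relies on one observation that is not spelled out above: shifting all points up by $\Delta$ shifts the lower hull by $\Delta$ , so it suffices to compute the lower hull $h$ of the points $(i,a_{i})$ once and take $\Delta=\max_{i}(a_{i}-h_{i})/2$ . The function name `convex_fit_linf` is hypothetical, and uniformly spaced inputs with $n\geq 1$ are assumed.

```python
from typing import List, Sequence, Tuple


def convex_fit_linf(a: Sequence[float]) -> Tuple[float, List[float]]:
    """Unweighted L_inf convex regression for uniformly spaced points.

    Returns (delta, b) with b convex and max_i |b[i] - a[i]| = delta.
    """
    n = len(a)
    hull: List[int] = []          # indices of lower-hull vertices of (i, a[i])
    for i in range(n):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # Cross product of (k - j) and (i - j); <= 0 means vertex k lies
            # on or above the segment from j to i, so it is not a hull vertex.
            if (k - j) * (a[i] - a[j]) - (a[k] - a[j]) * (i - j) <= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    # Evaluate the hull at every index by linear interpolation.
    h = [float(v) for v in a]
    for j, k in zip(hull, hull[1:]):
        for i in range(j + 1, k):
            h[i] = a[j] + (a[k] - a[j]) * (i - j) / (k - j)

    # Smallest delta with h[i] + delta >= a[i] - delta for all i.
    delta = max((a[i] - h[i]) / 2 for i in range(n))
    return delta, [h[i] + delta for i in range(n)]
```

Both the hull construction (each index is pushed and popped at most once) and the interpolation pass take linear time, in line with the bound stated in the proof.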

The above method can also be adapted for inputs with $x$ -values that are non-uniformly spaced. However, it does not directly generalize to weighted $L_{\infty}$ regression: moving points up by $w_{i}\cdot\Delta$ can lead to different lower hulls for different values of $\Delta$ .

Appendix B Proof of Lemma 2.10

The proof of Lemma 2.10 will be separated into two stages. First, we show that the polygon defined by $\{f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))\}\cup\{f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}}))\}$ has upper-hull $\{f_{i}^{-}(v_{\mathit{ll}}),f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))\}$ and lower-hull $\{f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}})),f_{i}^{+}(v_{\mathit{ur}})\}$ , where $v_{\mathit{ll}}$ is the first vertex of $\textsc{l-hull}(P_{i-1}^{\alpha_{i}})$ and $v_{\mathit{ur}}$ is the last vertex of $\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ . Furthermore, this polygon will have slopes between vertices in $[0,1]$ . This property will then allow us to show that $P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ is equivalent to the convex hull of the vertices, which implies the claim.

In order to show that $P_{i}^{(z)}$ has all slopes between $0$ and $1$ , we consider how $f_{i}^{-}$ and $f_{i}^{+}$ affect slopes.

Lemma B.1 (Bounded slopes)

If $P$ is $+\textsc{sloped}$ , then for any connected vertices $v_{j},v_{k}\in V$ , any $i$ , and $*\in\{-,+\}$ , we have

$0\leq\text{slope}(f_{i}^{*}(v_{j}),f_{i}^{*}(v_{k}))\leq 1,$

and for any connected vertices $v_{j},v_{k},v_{l}\in V$ , if $\text{slope}(v_{j},v_{k})<\text{slope}(v_{k},v_{l})$ , then

$\text{slope}(f_{i}^{*}(v_{j}),f_{i}^{*}(v_{k}))<\text{slope}(f_{i}^{*}(v_{k}),f_{i}^{*}(v_{l})).$

$\Box$

Proof B.2

We first write the slope function explicitly to obtain

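$\text{slope}(f_{i}^{*}(v_{j}),f_{i}^{*}(v_{k}))=\frac{(y_{k}+z_{i}^{*})-(y_{j}+z_{i}^{*})}{(x_{k}+y_{k}+z_{i}^{*})-(x_{j}+y_{j}+z_{i}^{*})}=\frac{y_{k}-y_{j}}{(x_{k}-x_{j})+(y_{k}-y_{j})},\quad\text{where }v_{j}=(x_{j},y_{j})\text{ and }v_{k}=(x_{k},y_{k}).$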

This implies that if $\text{slope}(v_{j},v_{k})=\infty$ then $\text{slope}(f_{i}^{*}(v_{j}),f_{i}^{*}(v_{k}))=1$ , and if $\text{slope}(v_{j},v_{k})=0$ then $\text{slope}(f_{i}^{*}(v_{j}),f_{i}^{*}(v_{k}))=0$ . Furthermore, this gives the identity

$\text{slope}(f_{i}^{*}(v_{j}),f_{i}^{*}(v_{k}))=\frac{1}{\frac{1}{\text{slope}(v_{j},v_{k})}+1}$

when $\text{slope}(v_{j},v_{k})\in(0,\infty).$ Combined with the fact that all slopes are non-negative, this gives both of our desired inequalities. $\Box$

The first inequality of the lemma above will allow us to show that all of the slopes between vertices are bounded, and the second implies that each of the vertices remains a vertex, giving the following corollary.

Corollary B.3 (Hulls by elementwise transformation)

If $P_{i-1}^{\alpha_{i}}$ is $+\textsc{sloped}$ , then the convex hull $P$ of $V=f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))\cup f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}}))$ has $\textsc{u-hull}(P)=\{f_{i}^{-}(v_{\mathit{ll}}),f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))\}$ and $\textsc{l-hull}(P)=\{f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}})),f_{i}^{+}(v_{\mathit{ur}})\}$ , where $v_{\mathit{ll}}$ is the first (lower-left) vertex of $\textsc{l-hull}(P_{i-1}^{\alpha_{i}})$ and $v_{\mathit{ur}}$ is the last (upper-right) vertex of $\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ . Furthermore, for any connected vertices $v_{j},v_{k}$ in $P$ , we have $0\leq\text{slope}(v_{j},v_{k})\leq 1$ . $\Box$

Proof B.4

By construction, the first and last vertices of $\textsc{u-hull}(P)$ and $\textsc{l-hull}(P)$ are the same. Let $v_{u1}$ be the first vertex of $\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ , which gives two possibilities, either (1): $v_{u1}=v_{\mathit{ll}}$ , or (2) $\text{slope}(v_{u1},v_{\mathit{ll}})=\infty$ . For case (1) it is easy to see that $\text{slope}(f_{i}^{+}(v_{u1}),f_{i}^{-}(v_{\mathit{ll}}))=1$ , and for case (2), we showed in the proof of Lemma B.1 that $\text{slope}(v_{u1},v_{\mathit{ll}})=\infty$ implies $\text{slope}(f_{i}^{+}(v_{u1}),f_{i}^{+}(v_{\mathit{ll}}))=1$ , which combined with $\text{slope}(f_{i}^{+}(v_{\mathit{ll}}),f_{i}^{-}(v_{\mathit{ll}}))=1$ gives $\text{slope}(f_{i}^{+}(v_{u1}),f_{i}^{-}(v_{\mathit{ll}}))=1$ . Furthermore the slopes between all vertices in $\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ are less than $\infty$ by Definition 2.8, and therefore less than $1$ under the transformation by Lemma B.1. Along with the second inequality of Lemma B.1, this implies that $\textsc{u-hull}(P)$ makes up a concave function from $f_{i}^{-}(v_{\mathit{ll}})$ to $f_{i}^{+}(v_{\mathit{ur}})$ .

By symmetric reasoning we see that $\textsc{l-hull}(P)$ makes up a convex function from $f_{i}^{-}(v_{\mathit{ll}})$ to $f_{i}^{+}(v_{\mathit{ur}})$ . Additionally, the second inequality states that every element in $\{f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}})),f_{i}^{+}(v_{\mathit{ur}})\}$ and $\{f_{i}^{-}(v_{\mathit{ll}}),f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))\}$ must be a vertex. Accordingly, $P$ must be a convex polygon with all slopes between $0$ and $1$ . $\Box$

We now have fixed upper and lower hulls of a polygon, and we use the representation as the convex hull of its vertices, along with the bounded-slope property, to show that this polygon is in fact equal to $P_{i}^{(z)}$ . In particular, all the slopes being bounded by $1$ will be critical here because each point $(x,y)\in P_{i-1}^{\alpha_{i}}$ maps to a line segment from $(x+y+z_{i}^{-},y+z_{i}^{-})$ to $(x+y+z_{i}^{+},y+z_{i}^{+})$ , which has slope $1$ . If we then consider $(x,y)$ to be in the upper hull and the slopes of our new upper hull for $P_{i}^{(z)}$ were greater than $1$ , the point $(x+y+z_{i}^{-},y+z_{i}^{-})$ would lie outside of this hull. Our bounded slopes prevent this, though, and lead to the following lemma.

Lemma B.5

Let $P_{i-1}^{\alpha_{i}}$ be $+\textsc{sloped}$ and let $P$ be the convex hull of

$V=f_{i}^{+}(\textsc{u-hull}(P_{i-1}^{\alpha_{i}}))\cup f_{i}^{-}(\textsc{l-hull}(P_{i-1}^{\alpha_{i}})).$

Then $P=P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ . $\Box$

Proof B.6

We show both inclusions.

•

$P\subseteq P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ .

By definition of $P$ , any point $u\in P$ , can be written as a convex combination

$u=\sum_{j}p_{j}\,(x_{j}+y_{j}+z_{i}^{*},\;y_{j}+z_{i}^{*}),$

where the sum is over the vertices $(x_{j},y_{j})$ of $P_{i-1}^{\alpha_{i}}$ , $*\in\{-,+\}$ , and $\sum p_{j}=1$ . We set $z=\sum p_{j}z_{i}^{*}$ ; clearly, $z\in[z_{i}^{-},z_{i}^{+}]$ . Furthermore set $x=\sum p_{j}x_{j}$ and $y=\sum p_{j}y_{j}$ . We know each $(x_{j},y_{j})$ is a vertex in $P_{i-1}^{\alpha_{i}}$ , so by convexity $(x,y)$ must be in $P_{i-1}^{\alpha_{i}}$ , implying $(x+y+z,y+z)\in P_{i}^{(z)}$ by Corollary 2.5.

•

$P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}\subseteq P$ .

Assume towards a contradiction there were $(x+y+z,y+z)\in P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}$ with $(x,y)\in P_{i-1}^{\alpha_{i}}$ and $z\in[z_{i}^{-},z_{i}^{+}]$ , but $(x+y+z,y+z)\notin P$ . By definition and assumption, both $P$ and $P_{i-1}^{\alpha_{i}}$ are convex, so there must be a vertex $(x_{v},y_{v})$ of $P_{i-1}^{\alpha_{i}}$ such that $(x_{v}+y_{v}+z,y_{v}+z)\notin P$ . Furthermore, by convexity of $P$ , there must also exist $z\in\{z_{i}^{-},z_{i}^{+}\}$ such that $(x_{v}+y_{v}+z,y_{v}+z)\notin P$ . Assume w. l. o. g. that $(x_{v},y_{v})\in\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ . By definition of $P$ , we have $(x_{v}+y_{v}+z_{i}^{+},y_{v}+z_{i}^{+})\in P$ , so we must have $z=z_{i}^{-}$ .

Since $P_{i-1}^{\alpha_{i}}$ is $+\textsc{sloped}$ and $f_{i}$ is monotone, $f_{i}^{-}(v_{\mathit{ll}})$ is dominated by $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})$ , and similarly, $f_{i}^{+}(v_{\mathit{ur}})$ dominates $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})$ . (Here, $(x_{1},y_{1})$ is said to dominate $(x_{2},y_{2})$ if $x_{1}\geq x_{2}$ and $y_{1}\geq y_{2}$ .) Furthermore, by Corollary B.3 the upper hull lies above the line segment from $f_{i}^{-}(v_{\mathit{ll}})$ to $(x_{v}+y_{v}+z_{i}^{+},y_{v}+z_{i}^{+})$ and has slope at most 1. But the slope between $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})$ and $(x_{v}+y_{v}+z_{i}^{+},y_{v}+z_{i}^{+})$ is exactly $1$ , so $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})$ cannot lie above the upper hull.

Finally, $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})$ also cannot lie below $\textsc{l-hull}(P)$ because otherwise there would exist $(x_{v},y)\in P_{i-1}^{\alpha_{i}}$ that lies above $(x_{v},y_{v})$ , contradicting $(x_{v},y_{v})$ being in $\textsc{u-hull}(P_{i-1}^{\alpha_{i}})$ . Because the upper hull and lower hull combine to the convex polygon $P$ and because the $x$ -coordinate of $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})$ is within the range of $x$ -coordinate of $P$ , we have $(x_{v}+y_{v}+z_{i}^{-},y_{v}+z_{i}^{-})\in P$ , a contradiction.

$\Box$

With this, we finish the proof of our lemma.

Proof B.7 (of Lemma 2.10)

Follows directly from Corollary B.3 and Lemma B.5. $\Box$

Appendix C Complete algorithm

In this appendix, we give detailed pseudocode for our entire algorithm. We also discuss the details of the backtracing step, i.e., computing an actual solution $\boldsymbol{b}\in\mathcal{S}$ from the (implicitly represented) feasibility polygons $P_{2},\ldots,P_{n}$. The final procedure is shown in Algorithm 1.

C.1 Implicitly computing the $P_{i}$

The main ideas have been described in Section 2.3. We represent points in homogeneous coordinates, i.e., $(x,y)$ becomes the column vector $(x,y,1)^{T}$. This allows each transformation to be represented as a single matrix, and we can compose transformations by multiplying their matrices. In Algorithm 1, we store the current matrix in $S_{u}$ (for the upper hull) and $S_{v}$ (for the lower hull). $u$ and $v$ denote the deques storing the (untransformed) points of u-hull and l-hull in homogeneous coordinates and in sorted order.
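The composition of transformations can be illustrated with a minimal sketch. It assumes that one step maps a point $(x,y)$ of $P_{i-1}$ to $(x+\alpha y+z,\ \alpha y+z)$, which is how the scaling by $\alpha_{i}$ followed by the shear and shift reads in the proofs above; the function names `step_matrix`, `matmul` and `apply` are hypothetical and not taken from Algorithm 1.

```python
# Hypothetical helpers; plain nested lists stand in for 3x3 matrices.

def step_matrix(alpha, z):
    """Matrix of the affine map (x, y) -> (x + alpha*y + z, alpha*y + z)."""
    return [[1.0, alpha, z],
            [0.0, alpha, z],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product a*b of two 3x3 matrices (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    """Apply matrix m to a point (x, y) via homogeneous coordinates."""
    x, y = point
    col = (x, y, 1.0)
    tx, ty, w = (sum(m[i][k] * col[k] for k in range(3)) for i in range(3))
    return (tx / w, ty / w)

# Two consecutive steps collapse into one accumulated matrix, so the stored
# (untransformed) points never have to be rewritten.
S = matmul(step_matrix(2.0, 0.5), step_matrix(1.0, -1.0))
```

Applying `S` to a stored point gives the same result as applying the two steps one after the other, which is the property the implicit representation relies on.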

To compute $P_{i}$ from $P_{i-1}$ (Step 2), we update the transformation matrices and add the new points to the hull (following Lemma 2.11). After that (line 9), $u$ and $v$ represent $P_{i}^{(z)}$. To implement the intersection with the half planes corresponding to the value and first-order constraints at $i$, we separately cut upper and lower hull with all four boundaries. Since we store upper and lower hull separately, vertical line segments are not explicitly represented in either hull, which requires some care in cutting with horizontal lines. We therefore use the following strategy, illustrated on an example in Figure 3: We first cut with the left and right boundaries (the value constraints), then transform our representation temporarily to left and right hulls (lines 17–18), which can easily handle cutting by horizontal line segments. Cutting is always implemented as a linear scan of $u$ resp. $v$, during which all vertices outside the constraint halfplane are removed. Then we add a new vertex at the intersection of the last segment with the constraint. (We remember the last removed vertex $r$ for doing so.)
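As an illustration of the cutting step, the following sketch cuts a hull with a left boundary $x\geq x_{\min}$ by a linear scan. It is a simplification under stated assumptions: points are stored with explicit (already transformed) coordinates rather than implicitly, only one of the four boundaries is handled, and the name `cut_left` is hypothetical.

```python
from collections import deque

def cut_left(hull, x_min):
    """Keep the part of a hull (deque of (x, y), sorted by x) with x >= x_min."""
    removed = None                       # last removed vertex r
    while hull and hull[0][0] < x_min:
        removed = hull.popleft()         # vertex outside the half plane
    if removed is not None and hull and hull[0][0] > x_min:
        (x0, y0), (x1, y1) = removed, hull[0]
        t = (x_min - x0) / (x1 - x0)     # where the boundary hits the cut segment
        hull.appendleft((x_min, y0 + t * (y1 - y0)))
    return hull
```

Every vertex is removed at most once over the whole run, which is what makes the total cost of all scans linear even though a single cut may remove many vertices.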

C.2 Backtracing

Suppose we have computed $P_{n}$ as described above, and then partially backtraced through a sequence of feasible points. We are now at $(b_{i+1},b_{i+1}-b_{i})$ in $P_{i+1}$. Since $(b_{i+1},b_{i+1}-b_{i})=(x+y+z,y+z)$, $z\in[z_{i+1}^{-},z_{i+1}^{+}]$ for some (unknown) $(x,y)=(b_{i},\alpha_{i+1}(b_{i}-b_{i-1}))\in P_{i}$, we can recover $x=b_{i}$ from $(b_{i+1},b_{i+1}-b_{i})$ by subtracting the two coordinates of $(b_{i+1},b_{i+1}-b_{i})$. To recover $y$, suppose we can find $y_{\max}=\max\{y\mid(b_{i},y)\in P_{i}^{\alpha_{i+1}}\}$ efficiently. Since $\{y\mid(b_{i},y)\in P_{i}^{\alpha_{i+1}}\}$ is an interval, the following lemma allows us to find $b_{i}-b_{i-1}$.

Lemma C.1 (back 1 step)

Let $f_{i+1}(x,y)=\{(x+y+z,y+z)\mid z\in[z_{i+1}^{-},z_{i+1}^{+}]\}$ . Either $(b_{i+1},b_{i+1}-b_{i})\in f_{i+1}((b_{i},y_{\max}))$ or $(b_{i+1},b_{i+1}-b_{i})=(b_{i}+y+z_{i+1}^{-},y+z_{i+1}^{-})$ for some $y<y_{\max}$ . $\Box$

Intuitively, a vertical line segment $L$ inside $P_{i}$ is mapped to a line segment with slope $1$ in $P_{i+1}$, because the line segments to which the points of $L$ are mapped all lie on the same line (overlapping with each other).

Proof C.2

If $(b_{i+1},b_{i+1}-b_{i})\not\in f_{i+1}(b_{i},y_{\max})$, then by the maximality of $y_{\max}$, $b_{i+1}-b_{i}<y_{\max}+z_{i+1}^{-}$. Since there exists $(b_{i},y^{\prime})$ such that $(b_{i+1},b_{i+1}-b_{i})\in f_{i+1}(b_{i},y^{\prime})$, we have $(b_{i}+y^{\prime}+z,y^{\prime}+z)=(b_{i+1},b_{i+1}-b_{i})$ for some $z\in[z_{i+1}^{-},z_{i+1}^{+}]$. Consider $f_{i+1}(b_{i},y^{\prime}+z-z_{i+1}^{-})$. Then $(b_{i+1},b_{i+1}-b_{i})=(b_{i}+(y^{\prime}+z-z_{i+1}^{-})+z_{i+1}^{-},(y^{\prime}+z-z_{i+1}^{-})+z_{i+1}^{-})$. Since $b_{i+1}-b_{i}<y_{\max}+z_{i+1}^{-}$, we get $y^{\prime}+z-z_{i+1}^{-}<y_{\max}$. The lemma is proven by letting $y$ be $y^{\prime}+z-z_{i+1}^{-}$. $\Box$

In the former case of Lemma C.1, we can take $(x,y_{\max})$ as $(b_{i},\alpha_{i+1}(b_{i}-b_{i-1}))$ . In the latter case, we can take $(b_{i},(b_{i+1}-b_{i})-z_{i+1}^{-})$ as $(b_{i},\alpha_{i+1}(b_{i}-b_{i-1}))$ .
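The two cases can be sketched as a single backtracing step. This is a minimal illustration, assuming the input point is feasible (so that $b_{i+1}-b_{i}\leq y_{\max}+z_{i+1}^{+}$ holds automatically) and that $y_{\max}$ has already been computed; the function name `back_one_step` and its parameter names are hypothetical.

```python
def back_one_step(b_next, d_next, y_max, z_lo):
    """Recover (b_i, y) from the point (b_{i+1}, b_{i+1} - b_i), following Lemma C.1.

    b_next is b_{i+1}, d_next is b_{i+1} - b_i, z_lo is z_{i+1}^-.
    The returned y plays the role of alpha_{i+1} * (b_i - b_{i-1}).
    """
    b_i = b_next - d_next                # subtract the two coordinates
    if d_next >= y_max + z_lo:
        # Former case: the point lies in f_{i+1}((b_i, y_max)).
        return b_i, y_max
    # Latter case: the point is reached from some y < y_max with z = z_{i+1}^-.
    return b_i, d_next - z_lo
```

When $\alpha_{i+1}>0$, dividing the returned $y$ by $\alpha_{i+1}$ gives $b_{i}-b_{i-1}$, and the step can be repeated at index $i$.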

Since $y_{\max}$ is the $y$ -coordinate of the intersection of $\textsc{u-hull}(P_{i})$ and the vertical line $(b_{i},\cdot)$ , to compute $y_{\max}$ , we want to find two vertices in $\textsc{u-hull}(P_{i})$ , $(x_{l},y_{l})$ and $(x_{r},y_{r})$ , such that $x_{l}\leq b_{i}\leq x_{r}$ . $(b_{i},y_{\max})$ is just the intersection of the line segment between $(x_{l},y_{l})$ and $(x_{r},y_{r})$ and the vertical line $(b_{i},\cdot)$ . The following lemma shows how to find $(x_{l},y_{l})$ and $(x_{r},y_{r})$ efficiently using an amortized constant-time algorithm.

Lemma C.3 (Computing $y_{\max}$ )

Suppose $(x_{l},y_{l})$ and $(x_{r},y_{r})$ are two vertices in $\textsc{u-hull}(P_{i})$ , and some point $(b_{i},y)\in P_{i}$ satisfies $x_{l}\leq b_{i}\leq x_{r}$ . Let $(x^{\prime},y^{\prime})$ be some point in $P_{i+1}$ with $(x^{\prime},y^{\prime})\in f_{i+1}(b_{i},\alpha_{i+1}y)$ . Then $x^{\prime}\leq(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{x}$ , where $(\cdot)_{x}$ means taking the $x$ -coordinate of a point and $(\cdot)_{y}$ takes the $y$ -coordinate. $\Box$

Proof C.4

Assume towards a contradiction that $x^{\prime}>(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{x}$ . Since $x^{\prime}-y^{\prime}=b_{i}\leq x_{r}=(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{x}-(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{y}$ , we have $y^{\prime}>(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{y}$ . But $y^{\prime}=\alpha_{i+1}y+k\leq\alpha_{i+1}y+z_{i+1}^{+}\leq\alpha_{i+1}y_{r}+z_{i+1}^{+}=(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{y}$ . Contradiction. $\Box$
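Once two bracketing vertices $(x_{l},y_{l})$ and $(x_{r},y_{r})$ of the upper hull are known, $y_{\max}$ is obtained by linear interpolation. The sketch below assumes explicit coordinates and that both vertices are adjacent on the hull; `y_max_at` is a hypothetical name.

```python
def y_max_at(b_i, left, right):
    """Intersect the segment left-right with the vertical line x = b_i."""
    (xl, yl), (xr, yr) = left, right
    if xl == xr:                         # degenerate segment: both vertices share x
        return max(yl, yr)
    t = (b_i - xl) / (xr - xl)           # relative position of b_i in [xl, xr]
    return yl + t * (yr - yl)
```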

The amortized constant-time algorithm to retrieve $(b_{n},\ldots,b_{1})$ depends on the implementation of the deques. Since we will add $n$ vertices to the deques during the whole algorithm, the (textbook) fixed-size array-based implementation suffices; we recall it to fix notation. A deque $d$ is represented by array $A$ and two indices $p_{l}$ , $p_{r}$ . $p_{l}$ is the index of the first element of $d$ and $p_{r}$ is the index of the last element. If we want to add an element $e$ to the left of the deque, the two operations $p_{l}\leftarrow p_{l}-1$ , $A[p_{l}]=e$ suffice. Similarly, we can add/pop elements from left/right. During our algorithm, $p_{l}$ (resp. $p_{r}$ ) can move to the left (resp. right) by at most $n$ positions, so $A$ can be an array of length $2n+O(1)$ . If we store the vertices of $P_{2}$ in the middle of $A$ initially, we never exceed the boundaries of $A$ when running the algorithm.
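For concreteness, the array-based deque described above can be sketched as follows. The class name `ArrayDeque` and the exact array length `2n + 2` are illustrative choices for the stated $2n+O(1)$ bound.

```python
class ArrayDeque:
    """Fixed-capacity deque on an array A with indices p_l and p_r."""

    def __init__(self, n):
        self.A = [None] * (2 * n + 2)    # length 2n + O(1)
        self.p_l = n + 1                 # index of the first element
        self.p_r = n                     # index of the last element (empty: p_r < p_l)

    def push_left(self, e):
        self.p_l -= 1
        self.A[self.p_l] = e

    def push_right(self, e):
        self.p_r += 1
        self.A[self.p_r] = e

    def pop_left(self):
        e = self.A[self.p_l]
        self.p_l += 1
        return e

    def pop_right(self):
        e = self.A[self.p_r]
        self.p_r -= 1
        return e

    def __len__(self):
        return self.p_r - self.p_l + 1
```

Because elements never move inside `A`, an array index keeps referring to the same vertex for as long as that vertex is stored, which is the property that the position argument below relies on.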

Definition C.5 (Position)

We define $\mathit{pos}_{i}(x^{\prime})$ as the smallest index (in the array representing deque $u$ ) of a vertex of $\textsc{u-hull}(P_{i}(\cdot))$ with $x$ -coordinate at least $x^{\prime}$ . $\Box$

Note that adding or removing elements does not change the vertex at a given index (unless that vertex itself is removed).

Lemma C.6 (Monotonicity of positions)

$\mathit{pos}_{i}(b_{i})\geq\mathit{pos}_{i+1}(x^{\prime})$ for some $(x^{\prime},y^{\prime})\in f_{i+1}(b_{i},\alpha_{i+1}y)$ . $\Box$

Proof C.7

By Lemma C.3, $x^{\prime}\leq(f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r}))_{x}$. So $f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r})$ is stored after $\mathit{pos}_{i+1}(x^{\prime})$. And since our algorithm stores $f_{i+1}^{+}(x_{r},\alpha_{i+1}y_{r})$ at the same place as $(x_{r},y_{r})$, $\mathit{pos}_{i+1}(x^{\prime})\leq\mathit{pos}_{i}(b_{i})$. $\Box$

Lemma C.6 allows us to find $\mathit{pos}_{i}(z)$ by moving a pointer monotonically to the right. Thus, we can retrieve $b_{n},\ldots,b_{1}$ in order by unrolling our linear algorithm for the decision problem and moving the pointer $\mathit{pos}_{i}(z)$ . This process takes $O(n)$ time overall.

C.3 Analysis

We conclude with the proof of our main theorem.

Proof C.8 (of Theorem 1.1)

The correctness of Algorithm 1 follows from the preceding discussions: By Lemma 2.11, the iterative transformations compute the $P_{i}$ as defined in (2), and $\mathcal{S}\neq\emptyset$ iff $P_{n}\neq\emptyset$. Moreover, Lemma C.1 shows that, when $\mathcal{S}\neq\emptyset$, Step 3 computes a valid $\boldsymbol{b}\in\mathcal{S}$. It remains to analyze the running time.

• Step 1 takes $O(1)$ time since the vertices of $P_{2}$ are a subset of the (at most) 12 intersection points of the defining lines. ($P_{2}$ is the trapezoid spanned by $(x_{2}^{-},x_{2}^{-}-x_{1}^{+}),(x_{2}^{-},x_{2}^{-}-x_{1}^{-}),(x_{2}^{+},x_{2}^{+}-x_{1}^{+}),(x_{2}^{+},x_{2}^{+}-x_{1}^{-})$, intersected with the halfspaces $y\geq y_{2}^{-}$ and $y\leq y_{2}^{+}$.)

• Step 2. The operations inside the loops are all constant-time and the outer loop runs $O(n)$ times. Moreover, the inner while-loops all remove a node from a deque, so their total cost over all iterations of the for-loop is $O(n)$, too: We start with $O(1)$ vertices and add at most $O(n)$ vertices throughout the entire procedure (Lemma 2.13), so we cannot remove more than $O(n)$ vertices.

• Step 3. All operations except for the first line inside the for-loop take constant time. The inner while-loop runs for overall $O(n)$ iterations, since $p$ only moves right and we add $O(n)$ vertices in total.

It remains to implement the first line of the loop body in $O(n)$ overall time. To be able to undo the changes to $u$ , $v$ , $S_{u}$ , $S_{v}$ , we keep a log for each instruction executed in Step 2, so that we can undo their changes here (in the opposite order). Since Step 2 runs in $O(n)$ total time, the rollback also runs in $O(n)$ time.

Since all three steps run in linear time, so does the whole algorithm. $\Box$
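The undo log used for Step 3 can be illustrated on a single stack. This is a minimal sketch, assuming a Python list as a stand-in for one end of a deque and only two instruction kinds; the actual log would also have to record changes to $S_{u}$ and $S_{v}$, and the names `run_with_log` and `rollback` are hypothetical.

```python
def run_with_log(d, operations):
    """Apply (kind, value) operations to list d and return an undo log."""
    log = []
    for kind, value in operations:
        if kind == "push":
            d.append(value)
            log.append(("pop", None))        # undo a push by popping
        elif kind == "pop":
            log.append(("push", d.pop()))    # undo a pop by pushing the value back
    return log

def rollback(d, log, steps):
    """Undo the last `steps` logged instructions, most recent first."""
    for _ in range(steps):
        kind, value = log.pop()
        if kind == "pop":
            d.pop()
        else:
            d.append(value)
```

Each logged entry is undone in constant time, so rolling back the whole log costs no more than producing it.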

Appendix D Generalization to DAGs is hard

In this appendix, we will give a natural generalization of Definition 1.1 to arbitrary DAGs and investigate its complexity. Our original setting with differences of adjacent indices only corresponds to a directed-path graph.

In light of rather general results for isotonic regression, the path setting might appear quite restrictive; we will argue here why these conditions probably cannot be relaxed much further if we want an $O(n)$ time algorithm.

Definition D.1

Suppose we are given a directed acyclic graph $G=(V,E)$ with $m=|E|$ edges and $m_{\boldsymbol{\text{p}}}$ directed paths of length two in $G$, $n$-dimensional vectors $x^{-}\leq x^{+}$, $m$-dimensional vectors $y^{-}\leq y^{+}$, and $m_{\boldsymbol{\text{p}}}$-dimensional vectors $z^{-}\leq z^{+}$ and $\alpha\geq 0$. We define $\mathcal{S}_{G}$ to be the set of all $n$-dimensional vectors $b$ such that $x_{i}^{-}\leq b_{i}\leq x_{i}^{+}$ for all $i$, $y_{ij}^{-}\leq b_{j}-b_{i}\leq y_{ij}^{+}$ for all edges $(i,j)\in E$, and $z_{ijk}^{-}\leq(b_{k}-b_{j})-\alpha_{ijk}(b_{j}-b_{i})\leq z_{ijk}^{+}$ for all pairs of edges $(i,j),(j,k)\in E$. $\Box$
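To make the definition concrete, the following sketch checks whether a given vector $b$ belongs to $\mathcal{S}_{G}$. It only verifies a candidate and does not decide emptiness; the dictionary-based encoding of the bounds and the name `in_S_G` are assumptions made for illustration, and the double loop over edges is deliberately naive.

```python
def in_S_G(b, x_lo, x_hi, edges, y_lo, y_hi, z_lo, z_hi, alpha):
    """Check whether vector b satisfies all constraints of Definition D.1.

    edges: list of (i, j); y_lo / y_hi: dicts keyed by (i, j);
    z_lo / z_hi / alpha: dicts keyed by (i, j, k) for paths i -> j -> k.
    """
    # Value constraints.
    if any(not (x_lo[i] <= b[i] <= x_hi[i]) for i in range(len(b))):
        return False
    # First-order constraints, one per edge.
    for (i, j) in edges:
        if not (y_lo[i, j] <= b[j] - b[i] <= y_hi[i, j]):
            return False
    # Second-order constraints, one per directed path of length two.
    for (i, j) in edges:
        for (j2, k) in edges:
            if j2 != j:
                continue
            second = (b[k] - b[j]) - alpha[i, j, k] * (b[j] - b[i])
            if not (z_lo[i, j, k] <= second <= z_hi[i, j, k]):
                return False
    return True
```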

In contrast to Theorem 1.1, we show that determining whether $\mathcal{S}_{G}$ is empty or not is as hard as solving linear programs.

Theorem D.2

With notation as in Definition D.1, if we can determine whether $\mathcal{S}_{G}$ is empty or not in time $f(n+m+m_{\boldsymbol{\text{p}}})$, then we can determine feasibility of any set of linear constraints defined by $s$ bounded integer coefficients in $c_{1}f(c_{2}s\log M)$ time, where $c_{1}$ and $c_{2}$ are two constants and the absolute value of each coefficient in the linear constraints is no more than $M$. $\Box$

Our reduction to prove Theorem D.2 is closely motivated by the hardness of isotropic total variation from [20], as well as subsequent works on extending such hardness results to positive linear programs. Compared to these results though, it sidesteps linear systems, and is a more direct invocation of the completeness of 2-commodity flow linear programs from [15].

We first consider a more restricted class of problems than Definition D.1 allows (where all the $\alpha$ ’s in Definition D.1 are set to be $1$ ). Formally we define the problem as:

Definition D.3

A generalized second-order constrained feasibility problem is defined by variables $b_{1},\ldots,b_{n}$, combined with a set of $m$ constraints parameterized by

1. Upper and lower bounds on the variables, $x_{i}^{-}$ and $x_{i}^{+}$.

2. Upper and lower bounds on the first-order differences, $y_{i}^{-}$ and $y_{i}^{+}$, and corresponding indices $p_{i}<q_{i}$.

3. Upper and lower bounds on the second-order differences, $z_{i}^{-}$ and $z_{i}^{+}$, and corresponding indices $r_{i}<s_{i}<t_{i}$

and constraints

Value Constraints:

$x_{i}^{-}\leq b_{i}\leq x_{i}^{+}$

First Order Constraints:

$y_{i}^{-}\leq b_{q_{i}}-b_{p_{i}}\leq y_{i}^{+}$

Second Order Constraints:

$z_{i}^{-}\leq\left(b_{t_{i}}-b_{s_{i}}\right)-\left(b_{s_{i}}-b_{r_{i}}\right)\leq z_{i}^{+}.$

The goal is to decide whether there exists $b_{1},\ldots,b_{n}$ that satisfy all these constraints simultaneously. $\Box$

Proof D.4 (of Theorem D.2)

It is easy to see that the problem defined in Definition D.3 is a special case of the problem in Definition D.1. This is obtained by forming a DAG with edges $(p_{i},q_{i})$, $(r_{i},s_{i})$, $(s_{i},t_{i})$ for all $p_{i}$, $q_{i}$, $r_{i}$, $s_{i}$, $t_{i}$. We will prove that a general linear programming feasibility problem with $s$ polynomially-bounded integer coefficients can be expressed as a second-order-constrained feasibility problem (Definition D.3). In particular, we will show that the feasibility of a set of linear constraints containing at most $s$ non-zero coefficients whose absolute values are integers no more than $M$ can be reduced to $O(s\log M)$ value, first-order and second-order constraints as in Definition D.3.

Note that the second constraint in Definition D.3 is the same as

[TABLE]

In particular, it allows us to create constraints of the form

[TABLE]

We will now show how the feasibility of a set of general linear constraints can be expressed as a second-order constrained feasibility problem as in Definition D.1. The main idea will be clear when we consider a linear constraint of the form

$b_{i_{1}}+b_{i_{2}}+\ldots+b_{i_{k}}\leq c_{i}$

with $k$ a power of $2$ , and $i_{1}<i_{2}<\ldots<i_{k}$ in increasing order. To express this in terms of second order constraints, we can introduce new variables

[TABLE]

and use $b_{i_{12}}$ to represent the sum of $b_{i_{1}}$ and $b_{i_{2}}$ and so on. Repeating this halves the value of $k$ , but aggregates the whole sum into a single variable. Therefore, we can express the above linear constraint as one value constraint

[TABLE]

and $k-1$ second order constraints

[TABLE]

In case $k$ is not a power of $2$, we can add dummy variables whose values we restrict to zero using the value constraints. This process uses at most $k$ value constraints. So we have shown that we can express any linear constraint of the form $b_{i_{1}}+b_{i_{2}}+\ldots+b_{i_{k}}\leq c_{i}$ in terms of $O(k)$ second-order constraints and $O(k)$ value constraints.

Now consider the case with both positive and negative values in the linear constraint

[TABLE]

We can aggregate the sums of the variables with positive coefficients and negative coefficients separately, and let us denote the resulting variables by $b_{\text{pos}},b_{\text{neg}}.$ We can now bound the difference using a first order constraint of the form

[TABLE]

This results in additional $O(1)$ first order constraints for each linear constraint.

Finally, when the coefficients are arbitrary integers, we can do pairing based on the binary representation. The second-order constraints and value constraints allow us to create constraints of the form

[TABLE]

which are equivalent to

[TABLE]

So we can introduce new variables $d_{kj}$ representing $2^{k}b_{j}$ for any $1\leq k\leq c$ where $c$ is a constant. Thus, given any linear constraint in $k$ variables with integer coefficients that are bounded by $M$, we first represent each coefficient by its binary representation, increasing the number of non-zero coefficients by a factor of $O(\log M)$ and creating $O(k\log M)$ second-order constraints and value constraints. Then all the coefficients in the linear constraints are $+1$ or $-1$ and we can use the reduction above. In summary, we can solve any linear programming feasibility problem with $O(s)$ non-zero coefficients which are integers bounded by $M$ by a generalized second-order constrained feasibility problem of $O(s\log M)$ constraints. This together with our assumption of an algorithm solving the generalized second-order constrained feasibility problem in $f(\cdot)$ time proves the theorem. $\Box$
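The binary-representation step of the reduction can be sketched as follows. The code only performs the bookkeeping that turns integer coefficients into $\pm 1$ coefficients on auxiliary variables $d_{kj}=2^{k}b_{j}$ (with $k=0$ standing for $b_{j}$ itself); it does not generate the second-order constraints that enforce the doubling. The names `binary_terms` and `expand_constraint` are hypothetical.

```python
def binary_terms(coef):
    """Return (sign, ks) with coef == sign * sum(2**k for k in ks)."""
    sign = -1 if coef < 0 else 1
    magnitude = abs(coef)
    ks = [k for k in range(magnitude.bit_length()) if (magnitude >> k) & 1]
    return sign, ks

def expand_constraint(coefs):
    """Rewrite sum_j coefs[j] * b_j as a list of (sign, k, j) terms.

    Each term stands for sign * d_{kj}, where d_{kj} represents 2^k * b_j.
    """
    terms = []
    for j, c in coefs.items():
        sign, ks = binary_terms(c)
        terms.extend((sign, k, j) for k in ks)
    return terms
```

A coefficient of absolute value at most $M$ contributes at most $\lfloor\log_{2}M\rfloor+1$ terms, which matches the $O(\log M)$ blow-up stated in the proof.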

Bibliography

[1] Pankaj K. Agarwal, Jeff M. Phillips, and Bardia Sadri. Lipschitz unimodal and isotonic regression on paths and trees. In LATIN 2010: Theoretical Informatics, pages 384–396. Springer Berlin Heidelberg, 2010. doi:10.1007/978-3-642-12200-2_34.
[2] Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter Shor, and Robert Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2(1-4):195–208, November 1987. doi:10.1007/bf01840359.
[3] Francis Bach. Efficient algorithms for non-convex isotonic regression through submodular optimization. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1–10. Curran Associates, Inc., 2018.
[4] Gábor Balázs. Convex Regression: Theory, Practice, and Applications. PhD thesis, 2016. doi:10.7939/R3T43J98B.
[5] Bernard Chazelle. A theorem on polygon cutting with applications. In Symposium on Foundations of Computer Science (SFCS), pages 339–349. IEEE, 1982. doi:10.1109/SFCS.1982.58.
[6] D. Eppstein, Z. Galil, and R. Giancarlo. Speeding up dynamic programming. In Symposium on Foundations of Computer Science (SFCS). IEEE, 1988. doi:10.1109/sfcs.1988.21965.
[7] Jeff Erickson. Shortest homotopic paths, 2009. Lecture notes for computational topology. URL: http://jeffe.cs.illinois.edu/teaching/comptop/2009/notes/shortest-homotopic-paths.pdf
[8] C. Fefferman. Smooth interpolation of data by efficient algorithms. In Excursions in Harmonic Analysis, Volume 1, pages 71–84. Birkhäuser Boston, November 2012. doi:10.1007/978-0-8176-8376-4_4.
