Efficient Second-Order Shape-Constrained Function Fitting
David Durfee, Yu Gao, Anup B. Rao, Sebastian Wild

TL;DR
This paper introduces efficient algorithms for fitting one-dimensional shape-constrained functions, including monotonicity and convexity, with near-linear time complexity, advancing the computational methods for such problems.
Contribution
The paper presents the first near-linear-time algorithms for second-order shape-constrained function fitting, applicable to a broad class of shape constraints and utilizing a novel geometric interpretation.
Findings
Algorithms achieve $O(n ext{ log } rac{U}{ ext{error}})$ time for general shape constraints.
A simple $O(n)$ greedy algorithm for unweighted convex regression.
Generalization to DAGs is as hard as linear programming.
Abstract
We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted- norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error in time, where captures the range of input values. We also give a simple greedy algorithm that runs in time for the special case of unweighted convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\clearscrheadfoot\ohead\pagemark\ihead\headmark\addtokomafont
caption \addtokomafontcaptionlabel \setcapmargin2em
Theorem[section]
\manualmark\markleftEfficient Second-Order Shape-Constrained Function Fitting\automark*[section]
Efficient Second-Order Shape-Constrained Function Fitting††thanks: The first author is supported in part by
National Science Foundation Grant 1718533. The last author is supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Programme.
David Durfee Georgia Institute of Technology {ddurfee,ygao380} @ gatech.edu
Yu Gao
Anup B. Rao Adobe Research anuprao @ adobe.com
Sebastian Wild University of Waterloo wild @ uwaterloo.ca
Abstract
We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted- norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error in time, where captures the range of input values. We also give a simple greedy algorithm that runs in time for the special case of unweighted convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming.
1 Introduction
We consider the fundamental problem of finding a function that approximates a given set of data points in the plane with smallest possible error, i. e., shall be close to (formalized below), subject to shape constraints on the allowable functions , such as being increasing and/or concave. More specifically, we present a new algorithm that can handle arbitrary constraints on the (discrete) first- and second-order derivatives of .
When we only require to be weakly increasing, the problem is known as isotonic regression, a classic problem in statistics; (see, e. g., [13] for history and applications). It has more recently also found uses in machine learning [17, 16, 12].
In certain applications, further shape restrictions are integral part of the model: For example, microeconomic theory suggests that production functions are weakly increasing and concave (modeling diminishing marginal returns); similar reasoning applies to utility functions. Restricting to functions with bounded derivative (Lipschitz-continuous functions) is desirable to avoid overfitting [16]. All these shape restrictions can be expressed by inequalities for first and second derivatives of ; their discretized equivalents are hence amenable to our new method. Shape restrictions that we cannot directly handle are studied in [28] ( is piecewise constant and the number of breakpoints is to be minimized) and [26] (unimodal ). For a more comprehensive survey of shape-constrained function-fitting problems and their applications, see [14, §1]. Motivated by these applications, the problems have been studied in statistics (as a form of nonparametric regression), investigating, e. g., their consistency as estimators and their rate of convergence [13, 14, 4].
While fast algorithms for isotonic-regression variants have been designed [27], both [22] and [3] list shape constraints beyond monotonicity as important challenges. For example, fitting (multidimensional) convex functions is mostly done via quadratic or linear programming solvers [24]. In his PhD thesis, Balázs writes that current “methods are computationally too expensive for practical use, [so] their analysis is used for the design of a heuristic training algorithm which is empirically evaluated” [4, p. 1].
This lack of efficient algorithms motivated the present work. Despite a few limitations discussed below (implying that we do not yet solve Balázs’ problem), we give the first near-linear-time algorithms for any function-fitting problem with second-order shape constraints (such as convexity). We use dynamic programming (DP) with a novel geometric encoding for the “states”. Simpler versions of such geometric DP variants were used for isotonic regression [25] and are well-known in the competitive programming community; incorporating second-order constraints efficiently is our main innovation.
Problem definition.
Given the vectors \mathchoice{\mbox{\boldmath\displaystyle x}}{\mbox{\boldmath\textstyle x}}{\mbox{\boldmath\scriptstyle x}}{\mbox{\boldmath\scriptscriptstyle x}}=(x_{1},\ldots,x_{n})\in\mathbb{R}^{n} and \mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}}\in\mathbb{R}^{n}, an error norm and shape constraints (formalized below), compute \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}=(f_{1},\ldots,f_{n}) satisfying the shape constraints with minimal d(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}},\mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}}), i. e., we represent via its values at the given points. is usually an norm, d(\mathchoice{\mbox{\boldmath\displaystyle x}}{\mbox{\boldmath\textstyle x}}{\mbox{\boldmath\scriptstyle x}}{\mbox{\boldmath\scriptscriptstyle x}},\mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}})=\bigl{(}\sum_{i}|x_{i}-y_{i}|^{p}\bigr{)}{}^{1/p}; least squares () dominate in statistics, but more general error functions have been studied for isotonic regression [23, 19, 22, 3]. We will consider the weighted norm, i. e., d(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}},\mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}})=\max_{i\in[n]}w_{i}\mathopen{|}f_{i}-y_{i}\mathclose{|}, where and \mathchoice{\mbox{\boldmath\displaystyle w}}{\mbox{\boldmath\textstyle w}}{\mbox{\boldmath\scriptstyle w}}{\mbox{\boldmath\scriptscriptstyle w}}\in\mathbb{R}_{\geq 0}^{n} is a given vector of weights.
Since we are dealing with discretized functions (a vector ), restrictions for derivatives and have to be discretized, as well. We define local slope and curvature as
[TABLE]
the shape constraints are then given in the form of vectors \mathchoice{\mbox{\boldmath\displaystyle f^{\prime-}}}{\mbox{\boldmath\textstyle f^{\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime-}}},\mathchoice{\mbox{\boldmath\displaystyle f^{\prime+}}}{\mbox{\boldmath\textstyle f^{\prime+}}}{\mbox{\boldmath\scriptstyle f^{\prime+}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime+}}},\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime-}}}{\mbox{\boldmath\textstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime-}}},\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime+}}}{\mbox{\boldmath\textstyle f^{\prime\prime+}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime+}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime+}}} of bounds for the first- and second-order differences, i. e., we define the set of feasible answers as F=\bigl{\{}\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}\in\mathbb{R}^{n}\bigm{|}\mathchoice{\mbox{\boldmath\displaystyle f^{\prime-}}}{\mbox{\boldmath\textstyle f^{\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime-}}}\leq\mathchoice{\mbox{\boldmath\displaystyle f^{\prime}}}{\mbox{\boldmath\textstyle f^{\prime}}}{\mbox{\boldmath\scriptstyle f^{\prime}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime}}}\leq\mathchoice{\mbox{\boldmath\displaystyle f^{\prime+}}}{\mbox{\boldmath\textstyle f^{\prime+}}}{\mbox{\boldmath\scriptstyle f^{\prime+}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime+}}}\mathchoice{\mathbin{\>{\wedge}\>}}{\mathbin{\wedge}}{\mathbin{\wedge}}{\mathbin{\wedge}}\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime-}}}{\mbox{\boldmath\textstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime-}}}\leq\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime}}}{\mbox{\boldmath\textstyle f^{\prime\prime}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime}}}\leq\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime+}}}{\mbox{\boldmath\textstyle f^{\prime\prime+}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime+}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime+}}}\bigr{\}} where inequalities on vectors mean the inequality on all components. The weighted- function-fitting problem with second-order shape constraints is then to find
[TABLE]
Often, we only need a lower resp. upper bound; we can achieve that by allowing and entries in and . For example, setting \mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime-}}}{\mbox{\boldmath\textstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime-}}}=0, \mathchoice{\mbox{\boldmath\displaystyle f^{\prime-}}}{\mbox{\boldmath\textstyle f^{\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime-}}}=\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime-}}}{\mbox{\boldmath\textstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime-}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime-}}}=+\infty and \mathchoice{\mbox{\boldmath\displaystyle f^{\prime}-}}{\mbox{\boldmath\textstyle f^{\prime}-}}{\mbox{\boldmath\scriptstyle f^{\prime}-}}{\mbox{\boldmath\scriptscriptstyle f^{\prime}-}}=-\infty, we can enforce a convex function/vector. We also consider the decision-version of the problem: given a bound , decide if there is an \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}\in F with , and if so, report one.
Contributions.
Our main result is a single -time algorithm for the decision problem of function fitting with second-order constraints; see Theorem 1.1 for the precise statement. With binary search, this readily yields an additive -approximation for (1), and thus weighted isotonic regression, convex regression and Lipschitz convex regression, in O\bigl{(}n\log\frac{U}{\varepsilon}\bigr{)} time (Theorem 1.3), where . In the appendix, we give a simple greedy algorithm (see Theorem A.1) for unweighted (\mathchoice{\mbox{\boldmath\displaystyle w}}{\mbox{\boldmath\textstyle w}}{\mbox{\boldmath\scriptstyle w}}{\mbox{\boldmath\scriptscriptstyle w}}=1) convex regression that runs in time. Finally, we show that a generalization of the problem to DAGs (where the applied first- and second-order difference constraints are restricted by the graph), is as hard as linear programming, see Appendix D.
Related work.
Stout [27] surveys algorithms for various versions of isotonic regression; they achieve near-linear or even linear time for many error metrics. He also considers the generalization to any partial order (instead of the total order corresponding to weakly increasing functions). A related task is to fit a piecewise-constant function (with a prescribed number of jumps) to given data. [9, 10] solve this problem for in optimal time. Since the geometric constraints are much easier than in our case, a simple greedy algorithm suffices to solve the decision version.
For more restricted shapes, less is known. [26] gives a solution for unimodal regression. [1] gives an algorithm for unweighted Lipschitz isotonic regression and a time algorithm for Lipschitz unimodal regression. [24] describes (multidimensional) convex regression algorithms based quadratic programming. Fefferman [8] studied a closely related problem of smooth interpolation of data in Euclidean space minimizing a certain norm defined on the derivatives of the function. His setup is much more general, but his algorithm cannot find arbitrarily good interpolations ( is fixed for the algorithm). All fast algorithms above consider classes defined by constraints on the first derivative only, not the second derivative as needed for convexity. To our knowledge, the fastest prior solution for any convex regression problem is solving a linear program, which will imply super-linear time.
We use a geometric interpretation of dynamic-programming states and represent them implicitly. The work closest in spirit to ours is a recent article by Rote [25]; establishing the transformation of states is much more complicated in the presently studied problem, though. Implicitly representing a series of more complicated objects using data structures has been used in geometric and graph algorithms, such as multiple-source shortest paths [18] and shortest paths in polygons [5, 21, 7]. The only other work (we know of) that interprets dynamic programming geometrically is [28].
There is a rich literature on methods for speeding up dynamic programming [29, 30, 6, 11]. They involve a variety of powerful techniques such as monotonicity of transition points, quadrangle inequalities, and Monge matrix searching [2], many of which have found applications in other settings. The focus of these methods is to reduce the (average) number of transitions that a state is involved in, often from to . Therefore, their running times are lower bounded by the number of states in the dynamic programs.
1.1 Results
We formally state our theorem for the decision problem here; results for shape-constrained function fitting are obtained as corollaries. For our algorithm, the discrete derivatives (as defined above) are inconvenient because they involve the -distance between points. We therefore normalize all -distances to (s. t. ); for the second-order constraints, this normalization makes the introduction of an additional parameter necessary, the scaling factors (see below).
Definition \thetheorem (1st/2nd-diff-constrained vectors):
Let -dimensional vectors \mathchoice{\mbox{\boldmath\displaystyle x}}{\mbox{\boldmath\textstyle x}}{\mbox{\boldmath\scriptstyle x}}{\mbox{\boldmath\scriptscriptstyle x}}^{-}\leq\mathchoice{\mbox{\boldmath\displaystyle x}}{\mbox{\boldmath\textstyle x}}{\mbox{\boldmath\scriptstyle x}}{\mbox{\boldmath\scriptscriptstyle x}}^{+} (value bounds), \mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}}^{-}\leq\mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}}^{+} (difference bounds), \mathchoice{\mbox{\boldmath\displaystyle z}}{\mbox{\boldmath\textstyle z}}{\mbox{\boldmath\scriptstyle z}}{\mbox{\boldmath\scriptscriptstyle z}}^{-}\leq\mathchoice{\mbox{\boldmath\displaystyle z}}{\mbox{\boldmath\textstyle z}}{\mbox{\boldmath\scriptstyle z}}{\mbox{\boldmath\scriptscriptstyle z}}^{+} (second-order difference bounds), and \mathchoice{\mbox{\boldmath\displaystyle\alpha}}{\mbox{\boldmath\textstyle\alpha}}{\mbox{\boldmath\scriptstyle\alpha}}{\mbox{\boldmath\scriptscriptstyle\alpha}}>0 be given. We define to be the set of all \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathbb{R}^{n} that satisfy the following constraints:
[TABLE]
*Moreover, we consider the “truncated problems” , where is the set of all \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathbb{R}^{n} that satisfy the constraints up to (instead of ). *
A visualization of an example is shown in Figure 1. We can encode an instance (\mathchoice{\mbox{\boldmath\displaystyle x}}{\mbox{\boldmath\textstyle x}}{\mbox{\boldmath\scriptstyle x}}{\mbox{\boldmath\scriptscriptstyle x}},\mathchoice{\mbox{\boldmath\displaystyle y}}{\mbox{\boldmath\textstyle y}}{\mbox{\boldmath\scriptstyle y}}{\mbox{\boldmath\scriptscriptstyle y}},\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\pm}}}{\mbox{\boldmath\textstyle f^{\prime\pm}}}{\mbox{\boldmath\scriptstyle f^{\prime\pm}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\pm}}},\mathchoice{\mbox{\boldmath\displaystyle f^{\prime\prime\pm}}}{\mbox{\boldmath\textstyle f^{\prime\prime\pm}}}{\mbox{\boldmath\scriptstyle f^{\prime\prime\pm}}}{\mbox{\boldmath\scriptscriptstyle f^{\prime\prime\pm}}}) of the decision version of the weighted- function-fitting problem with second-order constraints as 1st/2nd-diff-constrained vectors by setting
[TABLE]
So, our goal is to efficiently compute some \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathcal{S} or determine that . Our core technical result is a linear-time algorithm for this task:
Theorem 1.1 (1st/2nd-diff-constrained decision)
With the notation of Definition 1.1, in time, we can compute \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathcal{S} or determine that .
Section 2 will be devoted to the proof. To simplify the presentation, we will assume throughout that , , , , , are bounded.111Some problems are stated with values, but we can always replace unbounded values in the algorithms with an (input-specific) sufficiently large finite number.
For the optimization version of the problem, Equation (1), we consider approximate solutions in the following sense.
Definition 1.2 (-approximation)
We call \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}\in F an -approximate solution to the weighted function-fitting problem if it satisfies
[TABLE]
By a simple binary search on , we can find approximate solutions.
Theorem 1.3 (Main result)
There exists an algorithm that computes an -approximate solution to the weighted- convex regression problem that runs in time, where The same holds true for isotonic regression, Lipschitz isotonic regression, convex isotonic regression.
Proof 1.4
We will argue for the case of convex regression here, other cases are similar. Abbreviate L(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}})=\max_{i}w_{i}|f_{i}-y_{i}|. For a given , the decision version of convex regression can be solved in time using Theorem 1.1. That is, in time, we can either find \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}\in F such that L(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}})\leq L or conclude that for all \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}\in F, L(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}})>L. If we know an for which there exists \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}\in F with L(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}})\leq L_{0}, then we can do a binary search for in We can easily find such an for the convex case: Let \mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}}=\min y_{j} be constant (hence convex). For this , we have L(\mathchoice{\mbox{\boldmath\displaystyle f}}{\mbox{\boldmath\textstyle f}}{\mbox{\boldmath\scriptstyle f}}{\mbox{\boldmath\scriptscriptstyle f}})\leq(\max_{j}w_{j})(\max_{j}y_{j}-\min_{j}y_{j}). Therefore, we can take and the result immediately follows.
We note that for the specific case of unweighted convex function fitting, there is a simpler linear-time greedy algorithm; we give more details on that in Appendix A. This algorithm was the initial motivation for studying this problem and for the geometric approach we use. For more general settings, in particular second-order differences that are allowed to be both positive and negative, the greedy approach does not work; our generic algorithm, however, is almost as simple and efficient.
2 First- and second-order difference-constrained vectors
In this section, we present our main algorithm and prove Theorem 1.1. In Section 2.1, we give an overview and introduce the feasibility polygons . Section 2.2 shows how can be inductively computed from via a geometric transformation. We finally show how this transformation can be computed efficiently, culminating in the proof of Theorem 1.1, in Section 2.3. Two proofs are deferred to Appendix B and C.
2.1 Overview of the algorithm
Recall that the problem we want to solve, in order to prove Theorem 1.1, is finding a feasible point in from Definition 1.1. Our algorithm will use dynamic programming (DP) where each state is associated with the feasible in the truncated problem. We will iteratively determine all such that is the th entry of some \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathcal{S}_{i}.
Feasible have to respect the first- and second-order difference constraints. To check those, we also need to know the possible pairs of th and th entries for some \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathcal{S}_{i-1}, so the states have to maintain more information than the alone. It will be instrumental to rewrite this pair as , the combination of valid values and valid slopes at which we entered for a solution in . From that, we can determine the valid slopes at which we can leave using our shape constraints. We thus define the feasibility polygons
[TABLE]
for . See Figure 1 for an example. We view each point in as a “state” in our DP algorithm, and our goal becomes to efficiently compute from . The key observation is that each is indeed an -vertex convex polygon, and we only need an efficient way to compute the vertices of from those of . This needs a clever representation, though, since all vertices can change when going from to . A closer look reveals that we can represent the vertex transformations implicitly, without actually updating each vertex, and we can combine subsequent transformations into a single one. More specifically, if we consider the boundary of , the transformation to consists of two steps: (1) a linear transformation for the upper and lower hull of , and (2) a truncation of the resulting polygon by vertical and horizontal lines (i. e., an intersection of the polygon and a half-plane).
The first step requires a more involved proof and uses that all line segments of have weakly positive slope (“”, formally defined below). Implicitly computing the first transformation as we move between is straightforward, only requiring a composition of linear operations (a different one, though, for upper and lower hull). We can apply the cumulative transformation whenever we need to access a vertex.
The second step is conceptually simpler, but more difficult to implement efficiently, as we have to determine where a line cuts the polygon in amortized constant time. For this operation, we separately store the vertices of the upper and lower hull of in two arrays, sorted by increasing -coordinate; since is , -values are also increasing. A linear search for intersections has overall cost since we can charge individual searches to deleted vertices.
Finally, if , we compute a feasible vector backwards, starting from any point in . Since we do not explicitly store the , this requires successively “undoing” all operations (going from back to ); see Appendix C for details.
2.2 Transformation from state to
We first define the structural property “” that our method relies on.
Definition 2.1 ()
We say a polygon with vertices is if for all edges of . Here, the slope between two points , is defined as , when , and , otherwise.
We will now show that can be computed by applying a simple geometric transformation to . In passing, we will prove (by induction on ) that all are . For the base case, note that , which is an intersection of half-planes. The slopes of the defining inequalities are all non-negative or infinite, so is .
Let us now assume that , , is ; we will consider the transformation from to and show that it preserves this property. We begin by separating the transformation from to into two main steps.
Step 1: Second-order constraint only.
For the first step, we ignore the value and first-order constraints at index . This will yield a convex polygon, , that contains ; in Step 2, we will add the other constraints at to obtain itself.
Definition 2.2 (: 2nd-order-only polygons)
For a fixed , consider the modified problem with and . Define the second-order-only polygon, , as the polygon of this modified problem (considering only the constraints at ).
The statement of the following lemma is very simple observation, but allows us to compute from with an explicit geometric construction, (whereas such seemed not obvious for the original feasibility polygons).
Lemma 2.3 (: scaled, sheared and shifted )
P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}=\bigl{\{}(x+\alpha_{i}y+z,\alpha_{i}y+z)\mid(x,y)\in P_{i-1},z\in[z_{i}^{-},z_{i}^{+}]\bigr{\}}.
Proof 2.4
The only constraint at is . We rewrite this as (a) a constraint for , using that is the -coordinate in , and (b) a constraint for , using that, additionally, is the -coordinate in .
Once we have computed this polygon , computing is easy: adding the constraints and requires only cutting with two horizontal and vertical lines. We give a visual representation of the mapping on an example in Figure 2. We break the above mapping into two simpler stages:
Corollary 2.5 (: sheared and shifted )
Setting , we have
P_{i}^{\vphantom{\scriptscriptstyle g}\smash{\scriptscriptstyle(}z\smash{\scriptscriptstyle)}}=\bigl{\{}(x+y+z,y+z)\mid(x,y)\in P_{i-1}^{\alpha_{i}},z\in[z_{i}^{-},z_{i}^{+}]\bigr{\}}.
We note that scaling the -coordinate by preserves the -property:
Lemma 2.6
Let . If is , so is .
Proof 2.7
Scaling the -coordinates will preserve all of the vertices of , and also scale the slope of each vertex pair by . So, is .
That leaves us with the core of the transformation, from to . Intuitively, it can be viewed as sliding along the line by any amount and taking the union thereof, (see Figure 2). To compute the result of this operation, we split the boundary into upper and lower hull.
Definition 2.8 (Upper/lower hull)
Let be a convex polygon with vertex set . We define the upper hull (vertices) resp. lower hull (vertices) of as
[TABLE]
Unless specified otherwise, hull vertices are ordered by increasing -coordinate.
Note that a vertex can be in both hulls. Moreover, the leftmost vertices in and always have the same -coordinate, similarly for the rightmost vertices. As proved in Lemma 2.3, each point in is mapped to a line-segment with slope ; we give this mapping a name.
Definition 2.9 (2nd-order transform)
Let be the line-segment and denote by and the two endpoints of .
We write for the element-wise application of to a set of points.
The vertices of result from transforming the upper hull of by and the lower hull by . The next lemma formally establishes that applying resp. to the hulls of correctly computes , (again, compare Figure 2).
Lemma 2.10 (From to via hulls)
If is , then is and and , where (lower-left) and (upper-right) are the first vertex of and the last vertex of , respectively.
We defer the formal proof to Appendix B. Intuitively, since each point in is mapped to a line-segment with slope in , is obtained by sliding along the line . Note here that we could allow and/or , where the functions would instead map to the ray centered at and either pointed upwards or downwards with slope . The full transformation from to can now be stated as:
Lemma 2.11 ( to )
Let be the function for . If is , then is with
[TABLE]
with and the lower-left resp. upper-right vertex of .
Proof 2.12
This follows immediately from Corollary 2.5 and Lemmas 2.6 and 2.10.
Step 2: Truncating by value and slope.
To complete the transformation, we need to add the constraints and to . This is equivalent to cutting our polygon with two vertical and horizontal planes. The following lemma shows that this preserves the -property.
Lemma 2.13 (# new vertices)
If is with vertices, then is either empty or with at most vertices.
It follows that over the course of the algorithm, only vertices are added in total. This will be instrumental for analyzing the running time.
Proof 2.14
We know that is , and it follows easily from the definition that cutting by horizontal and vertical planes will preserve this property. Furthermore, note that cutting a convex polygon will increase the total number of vertices by at most one. We added at most 2 vertices to to obtain . We then cut by the inequalities , , , and , i. e., two horizontal and vertical planes. Each adds at most one vertex, giving the desired upper bound.
2.3 Algorithm
A direct implementation of the transformation of Lemma 2.11 yields a “brute force” algorithm that maintains all vertices of and checks if is empty; (the running time would be quadratic). It works as follows:
Compute the vertices of . 2. 2.
For , do the following:
- 2.1.
At step , scale the -coordinate of each vertex by . 2. 2.2.
Apply resp. to each vertex, depending on which hull it is in. 3. 2.3.
Add the new vertex to u-hull and l-hull, as per Lemma 2.11. 4. 2.4.
Delete all the vertices outside and
add the vertices created by intersecting with . 3. 3.
If , compute by backtracing.
Observe that Lemma 2.11 applies the same linear function (multiplication of -coordinate by and or ) to all vertices in u-hull resp. l-hull. So, we do not need to modify every vertex each time; instead, we can store – separately for u-hull and l-hull – the composition of the linear transformations as a matrix. Whenever we access a vertex, we take the unmodified vertex and apply the cumulative transformation in time.
At each step, after applying the linear transformations, by Lemma 2.11 we also need to copy the leftmost vertex of l-hull, add it to the left of u-hull and copy the rightmost vertex of u-hull and add it to the right of l-hull. To add these vertices, we simply apply the inverse of each respective cumulative transformation such that all stored vertices require the same transformation. This will also take time.
Since all the slopes of are non-negative () and we keep vertices sorted by -coordinate, the truncation by a horizontal or vertical plane can only remove a prefix or suffix from u-hull and l-hull of . Depending on the constraint we are adding, (, , , or ), we start at the rightmost or leftmost vertex of the u-hull and l-hull, and continue until we find the intersection with the cutting plane. We remove all visited vertices.
This could take time in any single iteration, but the total cost over all iterations is since we start with vertices and add vertices throughout the entire procedure (by Lemma 2.13). This allows us to use two deques (double-ended queues), represented as arrays, to store the vertices of u-hull and l-hull. Putting this all together gives the linear time algorithm for the decision problem “?”.
To compute an actual solution when , we compute , in this order. From the last , we can find a feasible (the -coordinate of any point in ). Then, we retrace the steps of our algorithm through specific points in each . Since intermediate were only implicitly represented, we have to recover by “undoing” the algorithm’s operations in reverse order; this is possible in overall time by remembering the operations from the forward phase. The details on the backtracing step are deferred to Appendix C, where we also present the final algorithm.
3 Conclusion
In this article, we presented a linear-time dynamic-programming algorithm to decide whether there is a vector that lies (componentwise) between given upper and lower bounds and additionally satisfies inequalities on its first- and second-order (successive) differences. This method can be used to approximate weighted- shape-restricted function-fitting problems, where the shape restrictions are given as bounds on first- and/or second-order differences (local slope and curvature).
This is a first step towards much sought-after efficient methods for more general convex regression tasks. A main limitation of our approach is the restriction to one-dimensional problems. We show in Appendix D that a natural extension of the problem studied here to directed acyclic graphs is already as hard as linear programming, leaving little hope for an efficient generic solution. This is in sharp contrast to isotonic regression, where similar extensions to arbitrary partial orders do have efficient algorithms (for ) [27]. This might also be bad news for multidimensional regression with second-order constraints, since higher dimensions entail, among other complications, a non-total order over the inputs.
A second limitation is the error metric, which might not be adequate for all applications. We leave the question whether similarly efficient methods are also possible for other metrics for future work. A further extension to study is convex unimodal regression; here, finding the maximum is part of the fitting problem, and so not directly possible with our presented method.
Acknowledgments
We thank Richard Peng, Sushant Sachdeva, and Danny Sleator for insightful discussions, and our anonymous referees for further relevant references and insightful comments that significantly improved the presentation.
Appendix
Appendix A Simple greedy algorithm for convex regression
In this appendix, we give details on a simpler algorithm for the special case of unweighted convex function fitting.
Theorem A.1
There exists an algorithm for the unweighted convex regression that runs in time.
Proof A.2
We consider the following problem. Given an -dimensional vector , and parameter , find a convex vector such that , if such a vector exists.
This clearly fits under our parameters of Definition 1.1 by setting \mathchoice{\mbox{\boldmath\displaystyle x^{-}}}{\mbox{\boldmath\textstyle x^{-}}}{\mbox{\boldmath\scriptstyle x^{-}}}{\mbox{\boldmath\scriptscriptstyle x^{-}}}=\mathchoice{\mbox{\boldmath\displaystyle a}}{\mbox{\boldmath\textstyle a}}{\mbox{\boldmath\scriptstyle a}}{\mbox{\boldmath\scriptscriptstyle a}}-\Delta,\mathchoice{\mbox{\boldmath\displaystyle x^{+}}}{\mbox{\boldmath\textstyle x^{+}}}{\mbox{\boldmath\scriptstyle x^{+}}}{\mbox{\boldmath\scriptscriptstyle x^{+}}}=\mathchoice{\mbox{\boldmath\displaystyle a}}{\mbox{\boldmath\textstyle a}}{\mbox{\boldmath\scriptstyle a}}{\mbox{\boldmath\scriptscriptstyle a}}+\Delta, both and to be unbounded, and \mathchoice{\mbox{\boldmath\displaystyle z^{-}}}{\mbox{\boldmath\textstyle z^{-}}}{\mbox{\boldmath\scriptstyle z^{-}}}{\mbox{\boldmath\scriptscriptstyle z^{-}}}=0,\mathchoice{\mbox{\boldmath\displaystyle z^{+}}}{\mbox{\boldmath\textstyle z^{+}}}{\mbox{\boldmath\scriptstyle z^{+}}}{\mbox{\boldmath\scriptscriptstyle z^{+}}}=\infty, along with \mathchoice{\mbox{\boldmath\displaystyle\alpha}}{\mbox{\boldmath\textstyle\alpha}}{\mbox{\boldmath\scriptstyle\alpha}}{\mbox{\boldmath\scriptscriptstyle\alpha}}=1. A binary search on gives a algorithm.
However, this can also be solved by considering the set of points for all , and taking the lower hull,222The lower hull of a set of points is the subset of vertices of the convex hull, where is the minimal -coordinate of all points with the -coordinate in the convex hull; see also Definition 2.8.
, such that for each point in this lower hull we set . We claim that the minimum possible such that is exactly the answer to this problem. If is a vertex of the convex hull, is always at least . Otherwise, let , be two vertices of such that . We have
[TABLE]
[TABLE]
[TABLE]
If violates this for some , then it is impossible to fit a convex function through the intervals , , and .
Conversely, if satisfies all of such constraints, for all , then cannot be greater than as that would violate being the convex lower hull of . Thus, is a possible solution.
It takes time to compute the lower convex hull and time to calculate the minimum . Thus, this algorithm solves convex regression in time.
The above method can also be adapted for inputs with -values that are non-uniformly spaced. However, it does not directly generalize to weighted regression: moving points up by can lead to different lower hulls for different values of .
Appendix B Proof of Lemma 2.10
The proof of Lemma 2.10 will be separated into two stages. First, we show that the polygon defined by has upper-hull and lower-hull , where is the first vertex of and is the last vertex of . Furthermore, this polygon will have slopes between vertices in . This property will then allow us to show that is equivalent to the convex hull of the vertices, which implies the claim.
In order to show that the has all slopes between [math] and , we consider how and affect slopes.
Lemma B.1 (Bounded slopes)
If is , then for any connected vertices , any , and , we have
[TABLE]
and for any connected vertices , if , then
[TABLE]
Proof B.2
We first write the slope function explicitly to obtain
[TABLE]
This implies that if then , and if then . Furthermore, this gives the identity
[TABLE]
when Combined with the fact that all slopes are non-negative, this gives both of our desired inequalities.
The first inequality of the lemma above will allow us to show that all of the slopes between vertices are bounded, and the second implies that each of the vertices remains a vertex, giving the following corollary.
Corollary B.3 (Hulls by elementwise transformation)
If is , then the convex hull of has and , where is the first (lower-left) vertex of and is the last (upper-right) vertex of . Furthermore, for any connected vertices in , we have .
Proof B.4
By construction, the first and last vertices of and are the same. Let be the first vertex of , which gives two possibilities, either (1): , or (2) . For case (1) it is easy to see that , and for case (2), we showed in the proof of Lemma B.1 that implies , which combined with gives . Furthermore the slopes between all vertices in are less than by Definition 2.8, and therefore less than under the transformation by Lemma B.1. Along with the second inequality of Lemma B.1, this implies that makes up a concave function from to .
By symmetric reasoning we see that makes up a convex function from to . Additionally, the second inequality states that every element in and must be a vertex. Accordingly, must be a convex polygon with all slopes between [math] and .
We now have fixed upper and lower hulls of a polygon, and we use the representation as the convex hull its vertices, along with the bounded-slope property, to show that this polygon is in fact equal to . In particular, all the slopes being bounded by will be critical here because each point maps to a line segment from to , which has slope . If we then consider to be in the upper hull, if the slopes of our new upper-hull for were greater than , the point would lie outside of this hull. Our bounded slopes prevent this, though, and lead to the following lemma.
Lemma B.5
Let be and let be the convex hull of
[TABLE]
Then .
Proof B.6
We show both inclusions.
- •
.
By definition of , any point , can be written as a convex combination
[TABLE]
where the sum is over the vertices of , , and . We set ; clearly, . Furthermore set and . We know each is a vertex in , so by convexity must be in , implying by Corollary 2.5.
- •
.
Assume towards a contradiction there were with and , but . By definition and assumption, both and are convex, so there must be a vertex of such that . Furthermore, by convexity of , there must also exist such that . Assume w. l. o. g. that . By definition of , we have , so we must have .
Since is and is monotone, is dominated333 is said to dominate if and .
by , and similarly, dominates . Furthermore, by Corollary B.3 the upper hull lies above the line segment from from to and has slope at most 1. But the slope between and is exactly , so cannot lie above the upper hull.
Finally, also cannot lie below because otherwise there would exist that lies above , contradicting being in . Because the upper hull and lower hull combine to the convex polygon and because the -coordinate of is within the range of -coordinate of , we have , a contradiction.
With this, we finish the proof of our lemma.
Proof B.7 (of Lemma 2.10)
Follows directly from Corollary B.3 and Lemma B.5.
Appendix C Complete algorithm
In this appendix, we give detailed pseudocode for our entire algorithm. We also discuss the details on the backtracing step, i. e., computing an actual solution \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathcal{S} from the (implicitly represented) feasibility polygons . The final procedure is shown in Algorithm 1.
C.1 Implicitly computing the
The main ideas have been described in Section 2.3. We represent points in homogeneous coordinates, i. e., becomes the column vector . That allows our transformation to be represented as a single matrix, and we can compose them by multiplying the matrices. We store the current matrix in Algorithm 1 in (for the upper hull) and for the lower hull. and denote the deques storing the (untransformed) points of u-hull and l-hull in homogeneous coordinates and in sorted order.
To compute from (Step 2), we update the transformation matrices and add the new points to the hull (following Lemma 2.11). After that (line 9), and represent . To implement the intersection with the half planes corresponding to the value and first-order constraints at , we separately cut upper and lower hull with all four boundaries. Since we store upper and lower hull separately, vertical line segments are not explicitly represented in either hull, which requires some care in cutting with horizontal lines. We therefore use the following strategy –it is illustrated on an example in Figure 3: We first cut with the left and right boundaries (the value constraints), then transform our representation temporarily to left and right hulls (lines 17–18), which can easily handle cutting by horizontal line segments. Cutting is always implemented as a linear scan of resp. , during which all vertices outside the constraint halfplane are removed. Then we add a new vertex at the intersection of the last segment with the constraint. (We remember the last removed vertex for doing so.)
C.2 Backtracing
Suppose we have computed as described above, and then partially backtraced through a sequence of feasible points. We are now at in . Since , for some (unknown) , we can recover from by subtracting the two coordinates of . To recover , suppose we can find efficiently. Since is an interval, the following lemma allow us to find .
Lemma C.1 (back 1 step)
Let . Either or for some .
Intuitively, a vertical line segment inside is mapped to a line-segment with slope in , because the line segments the points in are mapped to lie all on the same line (overlapping with each other).
Proof C.2
If , by the maximality of , . Since there exists such that , for some . Consider . Then . Since , . The lemma is proven by letting be .
In the former case of Lemma C.1, we can take as . In the latter case, we can take as .
Since is the -coordinate of the intersection of and the vertical line , to compute , we want to find two vertices in , and , such that . is just the intersection of the line segment between and and the vertical line . The following lemma shows how to find and efficiently using an amortized constant-time algorithm.
Lemma C.3 (Computing )
Suppose and are two vertices in , and some point satisfies . Let be some point in with . Then , where means taking the -coordinate of a point and takes the -coordinate.
Proof C.4
Assume towards a contradiction that . Since , we have . But . Contradiction.
The amortized constant-time algorithm to retrieve depends on the implementation of the deques. Since we will add vertices to the deques during the whole algorithm, the (textbook) fixed-size array-based implementation suffices; we recall it to fix notation. A deque is represented by array and two indices , . is the index of the first element of and is the index of the last element. If we want to add an element to the left of the deque, the two operations , suffice. Similarly, we can add/pop elements from left/right. During our algorithm, (resp. ) can move to the left (resp. right) by at most positions, so can be an array of length . If we store the vertices of in the middle of initially, we never exceed the boundaries of when running the algorithm.
Definition C.5 (Position)
We define as the smallest index (in the array representing deque ) of a vertex of with -coordinate at least .
Note that adding or removing elements does not change the vertex at a given index (unless that vertex itself is removed).
Lemma C.6 (Monotonicity of positions)
for some .
Proof C.7
By Lemma C.3, . So is stored after . And since our algorithm stores at the same place as , .
Lemma C.6 allows us to find by moving a pointer monotonically to the right. Thus, we can retrieve in order by unrolling our linear algorithm for the decision problem and moving the pointer . This process takes time overall.
C.3 Analysis
We conclude with the proof of our main theorem.
Proof C.8 (of Theorem 1.1)
The correctness of Algorithm 1 follows from the preceding discussions: By Lemma 2.11, the iterative transformations compute the as defined in (2), and iff . Moreover, Lemma C.1 shows that, when , Step 3 computes a valid \mathchoice{\mbox{\boldmath\displaystyle b}}{\mbox{\boldmath\textstyle b}}{\mbox{\boldmath\scriptstyle b}}{\mbox{\boldmath\scriptscriptstyle b}}\in\mathcal{S}. It remains to analyze the running time.
- •
Step 1 takes time since the vertices of are a subset of the (at most) 12 intersection points of the defining lines. ( is the trapezoid spanned by , intersected with the halfspaces and .)
- •
Step 2. The operations inside the loops are all constant-time and the outer loop runs times. Moreover, the inner while-loops all remove a node from a deque, so their total cost over all iterations of the for-loop is , too: We start with vertices and adding at most vertices throughout the entire procedure (Lemma 2.13), so we cannot remove more than vertices.
- •
Step 3. All operations except for the first line inside the for-loop take constant time. The inner while-loop runs for overall iterations, since only moves right and we add vertices in total.
It remains to implement the first line of the loop body in overall time. To be able to undo the changes to , , , , we keep a log for each instruction executed in Step 2, so that we can undo their changes here (in the opposite order). Since Step 2 runs in total time, the rollback also runs in time.
Since all three steps run in linear time, so does the whole algorithm.
Appendix D Generalization to DAGs is hard
In this appendix, we will give a natural generalization of Definition 1.1 to arbitrary DAGs and investigate its complexity. Our original setting with differences of adjacent indices only corresponds to a directed-path graph.
In light of rather general results for isotonic regression, the path setting might appear quite restrictive; we will argue here why these conditions probably cannot be relaxed much further if we want an time algorithm.
Definition D.1
Suppose we are given a directed acyclic graph with edges and number of length two directed paths in , -dimensional vectors , dimensional vector , and dimensional vectors and . We define to be the set of all -dimensional vectors such that for all , for all edges , and for all pairs of edges
In contrast to Theorem 1.1, we show that determining if if empty or not is as hard as solving linear programs.
Theorem D.2
With notation as in Definition D.1, if we can determine is empty or not in time , then we can determine feasibility of any set of linear constraints defined by bounded integer coefficients in time, where and are two constants and the absolute value of each coefficient in the linear constraints is no more than .
Our reduction to prove Theorem D.2 is closely motivated by the hardness of isotropic total variation from [20], as well as subsequent works on extending such hardness results to positive linear programs. Compared to these results though, it sidesteps linear systems, and is a more direct invocation of the completeness of 2-commodity flow linear programs from [15].
We first consider a more restricted class of problems than Definition D.1 allows (where all the ’s in Definition D.1 are set to be ). Formally we define the problem as:
Definition D.3
A generalized second-order constrained feasibility problem is defined by variables , combined with a set of constraints parameterized by
Upper and lower bounds on the variables and 2. 2.
Upper and lower bounds on the first order differences and and corresponding indices 3. 3.
Upper and lower bounds on the second order differences and and corresponding indices
and constraints
Value Constraints:
First Order Constraints:
Second Order Constraints:
The goal is to decide whether there exists that satisfy all these constraints simultaneously.
Proof D.4 (of Theorem D.2)
It is easy to see that the problem defined in Definition D.3 is a special case of the problem in Definition D.1. This is obtained by forming a DAG with edges , , for all , , , , . We will prove that a general linear programming feasibility problem with polynomially-bounded integer coefficients can be expressed as a second-order-constrained feasibility problem (Definition D.3). In particular, we will show that a feasibility of a set of linear constraints containing at most non-zero coefficients whose absolute values are integers no more than can be reduced to value, first order and second order constraints as in Definition D.3.
Note that the second constraint in Definition D.3 is the same as
[TABLE]
In particular, it allows us to create constraints of the form
[TABLE]
We will now show how we can restate a feasibility of a set of general linear constraints can be expressed as a second order constrained feasibility problem as in Definition D.1. The main idea will be clear when we consider a linear constraint of the form
[TABLE]
with a power of , and in increasing order. To express this in terms of second order constraints, we can introduce new variables
[TABLE]
and use to represent the sum of and and so on. Repeating this halves the value of , but aggregates the whole sum into a single variable. Therefore, we can express the above linear constraint as one value constraint
[TABLE]
and second order constraints
[TABLE]
In case is not a power of , we can add dummy variables whose values we restrict to zero using the value constraints. This process uses at most value constraints. So we have shown that we can express any linear constraint of the form in terms of second order constraints and value constraints.
Now consider the case with both positive and negative values in the linear constraint
[TABLE]
We can aggregate the sums of the variables with positive coefficients and negative coefficients separately, and let us denote the resulting variables by We can now bound the difference using a first order constraint of the form
[TABLE]
This results in additional first order constraints for each linear constraint.
Finally, when the coefficients are arbitrary integers, we can do pairing based on the binary representation. The second order constraint and value constraint allows us to create constrains of the form
[TABLE]
[TABLE]
which are equivalent to
[TABLE]
So we can introduce new variables representing for any where is a constant. Thus, given any linear constraint in variables with integer coefficients that are bounded by , we first represent each coefficient by its binary representation, increasing the number of non-zero coefficients by times and creating second order constraints and value constraints. Then all the coefficients in the linear constraints are or and we can use the reduction above. In summary, we can solve any linear programming feasibility problem with non-zero coefficients which are integers bounded by by a generalized second-order constrained feasibility problem of constraints. This together with our assumption of an algorithm solving generalized second-order constrained feasibility problem in time prove the theorem.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Pankaj K. Agarwal, Jeff M. Phillips, and Bardia Sadri. Lipschitz unimodal and isotonic regression on paths and trees. In LATIN 2010: Theoretical Informatics , pages 384–396. Springer Berlin Heidelberg, 2010. doi:10.1007/978-3-642-12200-2\_34 . · doi ↗
- 2[2] Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter Shor, and Robert Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica , 2(1-4):195–208, November 1987. doi:10.1007/bf 01840359 . · doi ↗
- 3[3] Francis Bach. Efficient algorithms for non-convex isotonic regression through submodular optimization. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31 , pages 1–10. Curran Associates, Inc., 2018.
- 4[4] Gábor Balázs. Convex Regression: Theory, Practice, and Applications . Ph D thesis, 2016. doi:10.7939/R 3T 43J 98B . · doi ↗
- 5[5] Bernard Chazelle. A theorem on polygon cutting with applications. In Symposium on Foundations of Computer Science (SFCS) , pages 339–349. IEEE, 1982. doi:10.1109/SFCS.1982.58 . · doi ↗
- 6[6] D. Eppstein, Z. Galil, and R. Giancarlo. Speeding up dynamic programming. In Symposium on Foundations of Computer Science (SFCS) . IEEE, 1988. doi:10.1109/sfcs.1988.21965 . · doi ↗
- 7[7] Jeff Erickson. Shortest homotopic paths, 2009. Lecture notes for computational topology. URL: http://jeffe.cs.illinois.edu/teaching/comptop/2009/notes/shortest-homotopic-paths.pdf .
- 8[8] C. Fefferman. Smooth interpolation of data by efficient algorithms. In Excursions in Harmonic Analysis, Volume 1 , pages 71–84. Birkhäuser Boston, November 2012. doi:10.1007/978-0-8176-8376-4\_4 . · doi ↗
