Calculus, constrained minimization and Lagrange multipliers: Is the optimal critical point a local minimizer?

Ademir Alves Ribeiro; Jose Renato Ramos Barbosa

arXiv:1904.05222·math.HO·April 11, 2019

TL;DR

This paper examines the conditions under which critical points in constrained optimization are local minimizers, especially in low-dimensional cases, and discusses common misconceptions and counterexamples in undergraduate calculus.

Contribution

It clarifies the sufficient conditions for local minimality and presents counterexamples to prevalent undergraduate misconceptions about Lagrange multipliers.

Findings

01 Counterexamples to undergraduate statements about critical points and minimizers

02 Conditions under which critical points are local minimizers in low dimensions

03 Discussion of misconceptions in undergraduate calculus related to constrained optimization

Abstract

In this short note, we discuss how the optimality conditions for the problem of minimizing a multivariate function subject to equality constraints have been dealt with in undergraduate Calculus. We are particularly interested in the 2 or 3-dimensional cases, which are the most common cases in Calculus courses. Besides giving sufficient conditions for a critical point to be a local minimizer, we also present and discuss counterexamples to some statements encountered in the undergraduate literature on Lagrange Multipliers, such as “among the critical points, the ones which have the smallest image (under the function) are minimizers” or “a single critical point (which is a local minimizer) is a global minimizer”.

Equations

\begin{array}{cl}{\rm minimize}&f(x)\\ {\rm subject\ to}&g(x)=0,\end{array}

\Omega=\{x\in D\mid g(x)=0\}

\begin{array}{cl}{\rm minimize}&f(x)\end{array}\quad\mbox{is equivalent to}\quad\begin{array}{cl}{\rm minimize}&\varphi(x,y):=f(x)\\ {\rm subject\ to}&g(x,y):=y=0,\end{array}

\nabla f(x^{*})+\sum_{i=1}^{m}\lambda_{i}^{*}\nabla g_{i}(x^{*})=0,

g(x^{*})=0

f(x)=x_{1}x_{2}+2x_{1}x_{3}+\frac{2}{x_{1}}>\frac{1}{x_{1}}>5\quad\mbox{or}\quad f(x)=x_{1}x_{2}+\frac{2}{x_{2}}+2x_{2}x_{3}>\frac{1}{x_{2}}>5,

f(x)=x_{1}x_{2}+\frac{2(x_{1}+x_{2})}{x_{1}x_{2}}>x_{1}x_{2}+\frac{10}{x_{1}x_{2}}=\frac{1}{x_{1}x_{2}}\left(\left(x_{1}x_{2}-\frac{5}{2}\right)^{2}+\frac{15}{4}\right)+5>5.

f(x)=\frac{x_{2}+2x_{3}}{x_{2}x_{3}}+2x_{2}x_{3}>\frac{10}{x_{2}x_{3}}+2x_{2}x_{3}=\frac{2}{x_{2}x_{3}}\left(\left(x_{2}x_{3}-\frac{5}{4}\right)^{2}+\frac{55}{16}\right)+5>5,

\left(\begin{array}{c}x_{2}+2x_{3}\\ x_{1}+2x_{3}\\ 2x_{1}+2x_{2}\end{array}\right)+\lambda\left(\begin{array}{c}x_{2}x_{3}\\ x_{1}x_{3}\\ x_{1}x_{2}\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ 0\end{array}\right)\quad\mbox{and}\quad x_{1}x_{2}x_{3}=1.

\frac{\partial^{2}f}{\partial x_{1}^{2}}(x^{*})>0\quad\mbox{and}\quad\frac{\partial^{2}f}{\partial x_{1}^{2}}(x^{*})\frac{\partial^{2}f}{\partial x_{2}^{2}}(x^{*})-\left(\frac{\partial^{2}f}{\partial x_{1}\partial x_{2}}(x^{*})\right)^{2}>0.

f(x)=\frac{1}{7}x_{1}^{7}-\frac{17}{12}x_{1}^{6}+\frac{51}{10}x_{1}^{5}-\frac{63}{8}x_{1}^{4}+\frac{9}{2}x_{1}^{3}

\dfrac{1}{2}\left(\begin{array}{c}x_{1}^{2}(x_{1}-3)^{2}(x_{1}-1)(2x_{1}-3)\\ 0\end{array}\right)+\lambda\left(\begin{array}{c}0\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\end{array}\right),

\bar{x}=\left(\begin{array}{r}0\\ 0\end{array}\right),\;\;\hat{x}=\left(\begin{array}{r}1\\ 0\end{array}\right),\;\;x^{*}=\dfrac{3}{2}\left(\begin{array}{r}1\\ 0\end{array}\right)\quad\mbox{and}\quad\tilde{x}=\left(\begin{array}{r}3\\ 0\end{array}\right).

\varphi(t)=\frac{1}{7}t^{7}-\frac{17}{12}t^{6}+\frac{51}{10}t^{5}-\frac{63}{8}t^{4}+\frac{9}{2}t^{3}.

f(\bar{x})=0,\quad f(\hat{x})\approx 0.4511,\quad f(x^{*})\approx 0.3525\quad\mbox{and}\quad f(\tilde{x})\approx 2.6035.

(x,\lambda)\in\mathbb{R}^{n}\times\mathbb{R}^{m}\mapsto\ell(x,\lambda)=f(x)+\lambda^{T}g(x).

\nabla_{xx}^{2}\ell(x,\lambda)=\nabla^{2}f(x)+\sum_{i=1}^{m}\lambda_{i}\nabla^{2}g_{i}(x).

d^{T}\nabla_{xx}^{2}\ell(x^{*},\lambda^{*})d>0

f(x)-f(x^{*})\geq\delta\|x-x^{*}\|^{2}

\nabla^{2}\varphi=\left(\begin{array}{cc}\dfrac{\partial^{2}\varphi}{\partial x_{1}^{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{1}\partial x_{2}}\vspace{5pt}\\ \dfrac{\partial^{2}\varphi}{\partial x_{2}\partial x_{1}}&\dfrac{\partial^{2}\varphi}{\partial x_{2}^{2}}\end{array}\right)\quad\mbox{or}\quad\nabla^{2}\varphi=\left(\begin{array}{ccc}\dfrac{\partial^{2}\varphi}{\partial x_{1}^{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{1}\partial x_{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{1}\partial x_{3}}\vspace{5pt}\\ \dfrac{\partial^{2}\varphi}{\partial x_{2}\partial x_{1}}&\dfrac{\partial^{2}\varphi}{\partial x_{2}^{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{2}\partial x_{3}}\vspace{5pt}\\ \dfrac{\partial^{2}\varphi}{\partial x_{3}\partial x_{1}}&\dfrac{\partial^{2}\varphi}{\partial x_{3}\partial x_{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{3}^{2}}\end{array}\right)

v^{T}Hv>0,

\left(\begin{array}{c}-3x_{1}^{2}+1\\ 1\end{array}\right)+\lambda\left(\begin{array}{c}-2x_{1}\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\end{array}\right),

x^{*}=\dfrac{1}{9}\left(\begin{array}{r}-3\\ 1\end{array}\right)\quad\mbox{and}\quad\bar{x}=\left(\begin{array}{r}1\\ 1\end{array}\right).

H=\left(\begin{array}{cc}2-6x_{1}&0\\ 0&0\end{array}\right)\quad\mbox{and}\quad v=\left(\begin{array}{c}1\\ 2x_{1}\end{array}\right)\perp\nabla g(x).

a_{11}>0\quad\mbox{and}\quad\det(A)>0,

x^{*}=\dfrac{1}{2}\left(\begin{array}{c}2\sqrt[3]{2}\\ 2\sqrt[3]{2}\\ \sqrt[3]{2}\end{array}\right)

H=-\left(\begin{array}{ccc}0&1&2\\ 1&0&2\\ 2&2&0\end{array}\right)\quad\mbox{and}\quad\nabla g(x^{*})=\dfrac{\sqrt[3]{4}}{2}\left(\begin{array}{c}1\\ 1\\ 2\end{array}\right).

A=-\left(\begin{array}{crr}1&-1&0\\ 2&0&-1\end{array}\right)\left(\begin{array}{ccc}0&1&2\\ 1&0&2\\ 2&2&0\end{array}\right)\left(\begin{array}{rr}1&2\\ -1&0\\ 0&-1\end{array}\right)=\left(\begin{array}{cc}2&2\\ 2&8\end{array}\right)

v^{T}Hv>0,

\left(\begin{array}{c}0\\ 0\\ 1\end{array}\right)+\lambda\left(\begin{array}{r}2x_{1}\\ 2x_{2}\\ -2x_{3}\end{array}\right)+\mu\left(\begin{array}{c}1\\ 0\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ 0\end{array}\right),

H=\dfrac{1}{4}\left(\begin{array}{ccr}2&0&0\\ 0&2&0\\ 0&0&-2\end{array}\right),\quad\nabla g_{1}(x^{*})=\left(\begin{array}{r}2\\ 0\\ -2\end{array}\right)\quad\mbox{and}\quad\nabla g_{2}(x^{*})=\left(\begin{array}{c}1\\ 0\\ 1\end{array}\right).

Full text

Calculus, constrained minimization and Lagrange multipliers: Is the optimal critical point a local minimizer?

This short note is for those who study or teach calculus.

Ademir Alves Ribeiro, José Renato Ramos Barbosa

Department of Mathematics, Federal University of Paraná, Brazil (email: [email protected], [email protected]). The first author is supported by CNPq, Brazil, Grant 309437/2016-4.

**Keywords.** Calculus, constrained minimization, Lagrange multipliers, critical point, local minimizer, global minimizer.

1 Introduction

Although the Lagrange Multiplier Method (LMM) is a strategy for finding the local maxima and/or minima of a function subject to constraints, it is typically used, particularly in undergraduate optimization problems, as a systematic procedure for identifying global extrema. In doing so, some undergraduate textbooks on the subject present the statements related to the LMM (and/or the worked problems based on it) without complete hypotheses, or in an imprecise and oversimplified manner (as an attempt to make the method more palatable). Specifically, the validity of the LMM, when one is looking for global extrema, depends on the existence of those extrema. This basic assumption has to be satisfied beforehand. Otherwise, even if one succeeds in obtaining local extrema from the critical points determined by the method, it might not be possible to get the global extrema from the local ones. However, it is not unusual to find books or academic homepages where, right after the local extrema are found, they are promptly evaluated and the ones with the smallest (respectively, the greatest) image under the function are declared the global minima (respectively, maxima). This is done even though the existence of the global extrema has not been established beforehand. The problem gets worse when there is just one critical point and one has to determine the maximum/minimum value from it without having a local criterion to be used in conjunction with the LMM. We present such a criterion here and propose its adoption in Calculus courses. At the very least, such a procedure lets the student determine with certainty which critical points yield local optimal values. The global aspect is a whole different story: without compactness, proving that global extrema exist is often beyond the scope of Calculus courses and depends on the specific problem at hand.
On the other hand, even the very few books that state the LMM correctly, emphasizing only the local aspect of the method, such as the excellent [3], simply assume existence when working on problems that lack compactness; for instance, they assume the existence of “a box of largest possible volume” among all rectangular boxes with a fixed surface area. Then, right after finding a unique critical point via the LMM, the solution ends with a conclusion like this one: “This (cubical) shape must therefore maximize the volume, assuming there is a box of maximum volume.” In this paper we work on a similar problem, showing the existence of the global extrema via reasoning that is nontrivial for Calculus courses. Furthermore, we emphasize that a criterion to determine whether a critical point (obtained by the LMM) is a local maximum/minimum would help enormously in problems like these.

2 Preliminaries

We consider here the equality constrained optimization problem of the form

$$\begin{array}{cl}{\rm minimize}&f(x)\\ {\rm subject\ to}&g(x)=0,\end{array}\tag{1}$$

where $f:D\to\mathbb{R}$ and $g:D\to\mathbb{R}^{m}$ are twice continuously differentiable functions defined on the open set $D\subset\mathbb{R}^{n}$ . The set

$$\Omega=\{x\in D\mid g(x)=0\}$$

is called the feasible set of the problem (1). Recall the well-known definition of a minimizer:

Definition 2.1

A point $x^{*}\in\Omega$ is said to be a local minimizer of the problem (1) if there exists $\delta>0$ such that $f(x^{*})\leq f(x)$ for all $x\in\Omega\cap B(x^{*},\delta)$ . If $f(x^{*})\leq f(x)$ for all $x\in\Omega$ , the point $x^{*}$ is called a global minimizer.

Remark 2.2

There is no loss of generality in considering only minimization problems since, if we want to maximize a function $f$, we can equivalently minimize $-f$. So, the definitions and results can be easily rewritten.

In the following example we can directly verify that a point is a minimizer. Nevertheless, this is not always the case and we normally need other tools to find minimizers. It should be pointed out that, in spite of focusing on constrained problems, some examples can be better understood and/or visualized if we disregard the constraint. In fact, we can transform an unconstrained problem into an equivalent constrained one by introducing an artificial variable:

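$$\begin{array}{cl}{\rm minimize}&f(x)\end{array}\quad\mbox{is equivalent to}\quad\begin{array}{cl}{\rm minimize}&\varphi(x,y):=f(x)\\ {\rm subject\ to}&g(x,y):=y=0.\end{array}$$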

Example 2.3

Consider the function $f:\mathbb{R}^{2}\to\mathbb{R}$ given by $f(x)=x_{1}^{2}+x_{2}^{2}(1-x_{1})^{3}$ . The point $x^{*}=0$ is a local minimizer since $f(x^{*})=0\leq f(x)$ for all $x\in B(x^{*},1)$ . Moreover, this point is not a global minimizer because $f(4,1)=-11$ . See Figure 1.
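The two claims of Example 2.3 can be checked numerically. The following is a minimal sketch, not part of the original note: the helper name `min_on_disk_grid` and the polar sampling grid are assumptions made for illustration, and sampling only supports (it does not prove) the inequality $f(x)\geq 0$ on $B(x^{*},1)$.

```python
import math


def f(x1: float, x2: float) -> float:
    """Objective of Example 2.3: f(x) = x1^2 + x2^2 * (1 - x1)^3."""
    return x1**2 + x2**2 * (1.0 - x1) ** 3


def min_on_disk_grid(radius: float = 1.0, steps: int = 200) -> float:
    """Smallest value of f over a polar grid strictly inside B(0, radius)."""
    best = f(0.0, 0.0)
    for i in range(1, steps):  # radii radius*i/steps, all smaller than radius
        r = radius * i / steps
        for j in range(steps):
            theta = 2.0 * math.pi * j / steps
            best = min(best, f(r * math.cos(theta), r * math.sin(theta)))
    return best


# No sampled point of the open unit ball falls below f(0, 0) = 0 ...
print(min_on_disk_grid())
# ... but far from the origin the cubic factor is negative and f drops to -11.
print(f(4.0, 1.0))
```

Inside the unit ball we have $x_{1}<1$, so the factor $(1-x_{1})^{3}$ is positive and both terms of $f$ are nonnegative; at $(4,1)$ that factor equals $-27$.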

A well known condition that ensures the existence of a global minimizer is the compactness of the set.

Theorem 2.4

Let $L\subset\Omega$ be a nonempty compact set. Then $f$ has a global minimizer in $L$, that is, there exists $x^{*}\in L$ such that $f(x^{*})\leq f(x)$ for all $x\in L$.

Unfortunately, there are many situations where the above result cannot be directly applied since the underlying set might not be compact. Even so, we may still guarantee the existence of a global minimizer, as discussed in the upcoming Example 3.3.

3 Necessary optimality conditions: Lagrange multipliers

In this section we present the necessary conditions that must be satisfied by every local minimizer. We also point out two misunderstandings that sometimes arise in calculus courses.

Definition 3.1

A point $x^{*}\in\mathbb{R}^{n}$ is said to be critical (or stationary) for the problem (1) if there exists a vector $\lambda^{*}\in\mathbb{R}^{m}$ such that

$$\nabla f(x^{*})+\sum_{i=1}^{m}\lambda_{i}^{*}\nabla g_{i}(x^{*})=0,\tag{3a}$$

$$g(x^{*})=0.\tag{3b}$$

The components of $\lambda^{*}$ are the Lagrange multipliers associated with the constraints.

The next result is a classical one and it is used to find possible candidates for the optimal solutions. (See, for instance, [1] for a version of such theorem.)

Theorem 3.2

Suppose that $x^{*}\in\mathbb{R}^{n}$ is a local minimizer for the problem (1) and the gradients $\nabla g_{i}(x^{*})$ are linearly independent, $i=1,\ldots,m$ . Then $x^{*}$ is a critical point for this problem.

As previously mentioned, a very common exercise in undergraduate Calculus is the problem of minimizing the area of a box, without lid, subject to a constant volume. The issue here lies in the fact that, normally, the solution is not accompanied by a mathematical argument explaining that the box with minimum area does exist and/or by a justification as to why the critical point obtained (via the equations (3a)–(3b)) is the global minimizer of the problem. Let us discuss these issues more precisely in the next example.

Example 3.3

Consider $D=\{x\in\mathbb{R}^{3}\mid x_{1}>0,x_{2}>0,x_{3}>0\}$ and the functions $f,g:D\to\mathbb{R}$ defined by $f(x)=x_{1}x_{2}+2x_{1}x_{3}+2x_{2}x_{3}$ and $g(x)=x_{1}x_{2}x_{3}-1$ . Show that the problem (1) has a (unique) global solution and find it using the Lagrange method.

Resolution. As defined before, let $\Omega=\{x\in D\mid g(x)=0\}$ be the feasible set of the problem. We claim that if $x\in\Omega$ and $f(x)\leq 5$ , then $\frac{1}{5}\leq x_{i}\leq 5$ , for $i=1,2,3$ . Indeed, if $x_{1}<\frac{1}{5}$ or $x_{2}<\frac{1}{5}$ , then

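$$f(x)=x_{1}x_{2}+2x_{1}x_{3}+\frac{2}{x_{1}}>\frac{1}{x_{1}}>5\quad\mbox{or}\quad f(x)=x_{1}x_{2}+\frac{2}{x_{2}}+2x_{2}x_{3}>\frac{1}{x_{2}}>5,$$

respectively. If $x_{3}<\frac{1}{5}$, then $f(x)=\dfrac{1}{x_{3}}+2x_{1}x_{3}+2x_{2}x_{3}>\dfrac{1}{x_{3}}>5$. Furthermore, if $x_{1}>5$ or $x_{2}>5$,

$$f(x)=x_{1}x_{2}+\frac{2(x_{1}+x_{2})}{x_{1}x_{2}}>x_{1}x_{2}+\frac{10}{x_{1}x_{2}}=\frac{1}{x_{1}x_{2}}\left(\left(x_{1}x_{2}-\frac{5}{2}\right)^{2}+\frac{15}{4}\right)+5>5.$$

If $x_{3}>5$, then

$$f(x)=\frac{x_{2}+2x_{3}}{x_{2}x_{3}}+2x_{2}x_{3}>\frac{10}{x_{2}x_{3}}+2x_{2}x_{3}=\frac{2}{x_{2}x_{3}}\left(\left(x_{2}x_{3}-\frac{5}{4}\right)^{2}+\frac{55}{16}\right)+5>5,$$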

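proving the claim. Now, consider the set $L=\{x\in\Omega\mid f(x)\leq 5\}$, which is nonempty because it contains $(1,1,1)^{T}$, where $f=5$. If $(x^{k})\subset L$ is such that $x^{k}\to\bar{x}$, then $\bar{x}\in D$ (by the claim, every coordinate of each $x^{k}$, and hence of $\bar{x}$, is at least $\frac{1}{5}$) and, by continuity, $g(\bar{x})=0$ and $f(\bar{x})\leq 5$, which means that $L$ is closed. It is also bounded in view of the claim. Thus, Theorem 2.4 ensures that there exists $x^{*}\in L$ such that $f(x^{*})\leq f(x)$ for all $x\in L$. This point is in fact a global minimizer in $\Omega$, because if $x\in\Omega\setminus L$, we have $f(x)>5\geq f(x^{*})$.

Now, applying Theorem 3.2, we conclude that $x^{*}$ must be a solution of the equations

$$\left(\begin{array}{c}x_{2}+2x_{3}\\ x_{1}+2x_{3}\\ 2x_{1}+2x_{2}\end{array}\right)+\lambda\left(\begin{array}{c}x_{2}x_{3}\\ x_{1}x_{3}\\ x_{1}x_{2}\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ 0\end{array}\right)\quad\mbox{and}\quad x_{1}x_{2}x_{3}=1.$$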

Since this system has a unique solution, namely, $\dfrac{1}{2}\left(\begin{array}[]{c}2\sqrt[3]{2}\\ 2\sqrt[3]{2}\\ \sqrt[3]{2}\end{array}\right)$ with $\lambda=-2\sqrt[3]{4}$ , it follows that $x^{*}$ is exactly this point.
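As a sanity check on the point and multiplier stated for Example 3.3, the Lagrange system can be evaluated in floating point. This is only a sketch: the function names below are chosen here for illustration, and the script confirms that the stated pair solves the system, not that the solution is unique.

```python
def f(x):
    """Area of the lidless box: f(x) = x1*x2 + 2*x1*x3 + 2*x2*x3."""
    x1, x2, x3 = x
    return x1 * x2 + 2 * x1 * x3 + 2 * x2 * x3


def grad_f(x):
    x1, x2, x3 = x
    return (x2 + 2 * x3, x1 + 2 * x3, 2 * x1 + 2 * x2)


def grad_g(x):
    """Gradient of the volume constraint g(x) = x1*x2*x3 - 1."""
    x1, x2, x3 = x
    return (x2 * x3, x1 * x3, x1 * x2)


c = 2.0 ** (1.0 / 3.0)   # cube root of 2
x_star = (c, c, c / 2)   # the critical point stated in the text
lam = -2.0 * c**2        # the multiplier -2 * cbrt(4)

# Residual of grad f + lambda * grad g = 0 and of the constraint x1*x2*x3 = 1.
residual = [df + lam * dg for df, dg in zip(grad_f(x_star), grad_g(x_star))]
constraint = x_star[0] * x_star[1] * x_star[2] - 1.0

print(max(abs(r) for r in residual), abs(constraint))  # both at rounding level
print(f(x_star), 3 * c**2)                             # f(x*) = 3 * cbrt(4)
```

The value $f(x^{*})=3\sqrt[3]{4}\approx 4.76$ is below $5$, consistent with $x^{*}$ belonging to the set $L$ used in the existence argument.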

Concerning the previous example, it is worth mentioning that, on the one hand, its reasoning is out of the scope of undergraduate Calculus textbooks. On the other hand, it is also a reminder that, sometimes, it is not trivial to establish the existence of a global minimum.

The next example is a reformulation of the unconstrained problem given in Example 2.3 as an equivalent constrained problem, obtained by introducing an artificial variable.

Example 3.4

Consider the functions $f,g:\mathbb{R}^{3}\to\mathbb{R}$ given by $f(x)=x_{1}^{2}+x_{2}^{2}(1-x_{1})^{3}$ and $g(x)=x_{3}$ . It is easy to verify that the point $x^{*}=0$ and the multiplier $\lambda^{*}=0$ satisfy the conditions (3a)–(3b). In fact, it can be proved that this point is the only critical point of this example. Moreover, $x^{*}$ is a local minimizer since $f(x^{*})=0\leq f(x)$ for all $x\in B(x^{*},1)$ . However, this point is not a global minimizer because $f(4,1,0)=-11$ .

Remark 3.5

Note that the previous example also gives a negative answer to the question “If a function has a single critical point which is a local minimizer, is this point a global minimizer?”, which sometimes comes up (in some Calculus courses and academic homepages) and is answered incorrectly with a “yes”. This probably occurs because, for functions of one variable, the result holds, as the next theorem states.

Theorem 3.6

Let $f:(a,b)\subset\mathbb{R}\to\mathbb{R}$ be a differentiable function with a single critical point $x^{*}\in(a,b)$ . If $x^{*}$ is a local minimizer, then it is a global minimizer.

Proof. Assume by contradiction that there exists $\bar{x}\in(a,b)$, say, $\bar{x}>x^{*}$, such that $f(\bar{x})<f(x^{*})$. Since $x^{*}$ is a local minimizer, there exists $\delta>0$ such that $f(x^{*})\leq f(x)$ for all $x\in(x^{*}-\delta,x^{*}+\delta)$. In fact, we have $f(x^{*})<f(x)$ for all $x\in(x^{*}-\delta,x^{*}+\delta)\setminus\{x^{*}\}$, because otherwise there would be another critical point of $f$, in view of Rolle’s theorem. Consider then $\tilde{x}\in(x^{*},\bar{x})$ with $f(x^{*})<f(\tilde{x})$. So, the intermediate value theorem guarantees the existence of $\hat{x}\in(\tilde{x},\bar{x})$ with $f(\hat{x})=f(x^{*})$. Therefore, by Rolle’s theorem, we conclude that there exists a critical point $x^{**}\in(x^{*},\hat{x})$, contradicting the hypothesis. Figure 2 illustrates this proof.

It is well known that the converse of Theorem 3.2 is not necessarily true. That is, the optimality conditions (3a)–(3b) are not sufficient to ensure that the point is a local minimizer. Indeed, these conditions are also satisfied at a maximizer. When dealing with unconstrained minimization in two variables, we have the famous sufficient condition, present in almost all textbooks on the subject, to ensure that a critical point $x^{*}$ is a local minimizer, namely, the test of second derivatives

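$$\frac{\partial^{2}f}{\partial x_{1}^{2}}(x^{*})>0\quad\mbox{and}\quad\frac{\partial^{2}f}{\partial x_{1}^{2}}(x^{*})\frac{\partial^{2}f}{\partial x_{2}^{2}}(x^{*})-\left(\frac{\partial^{2}f}{\partial x_{1}\partial x_{2}}(x^{*})\right)^{2}>0.$$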

However, it is not so common to discuss a test for constrained optimization. In the next remark we address another issue related to this subject.

Remark 3.7

In the context of problem (1), the following question is also typical: “Among the critical points, is the one with the smallest image (under the function) a local minimizer?”. Again, the answer is no and the example below shows why.

Example 3.8

Consider the problem (1) with the functions $f,g:\mathbb{R}^{2}\to\mathbb{R}$ defined by

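$$f(x)=\frac{1}{7}x_{1}^{7}-\frac{17}{12}x_{1}^{6}+\frac{51}{10}x_{1}^{5}-\frac{63}{8}x_{1}^{4}+\frac{9}{2}x_{1}^{3}$$

and $g(x)=x_{2}$. Find the critical points and their images, and say which one is a minimizer.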

Resolution. The condition (3a) in this case is

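$$\dfrac{1}{2}\left(\begin{array}{c}x_{1}^{2}(x_{1}-3)^{2}(x_{1}-1)(2x_{1}-3)\\ 0\end{array}\right)+\lambda\left(\begin{array}{c}0\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\end{array}\right),$$

yielding $\lambda^{*}=0$, $\bar{x}_{1}=0$, $\hat{x}_{1}=1$, $x_{1}^{*}=\frac{3}{2}$ and $\tilde{x}_{1}=3$. So, we have four critical points

$$\bar{x}=\left(\begin{array}{r}0\\ 0\end{array}\right),\;\;\hat{x}=\left(\begin{array}{r}1\\ 0\end{array}\right),\;\;x^{*}=\dfrac{3}{2}\left(\begin{array}{r}1\\ 0\end{array}\right)\quad\mbox{and}\quad\tilde{x}=\left(\begin{array}{r}3\\ 0\end{array}\right).$$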

By restricting the objective function to the feasible set, that is, to the points of the form $x=\left(\begin{array}[]{r}t\\ 0\end{array}\right)$ , with $t\in\mathbb{R}$ , we obtain $f(x)=\varphi(t)$ , where

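$$\varphi(t)=\frac{1}{7}t^{7}-\frac{17}{12}t^{6}+\frac{51}{10}t^{5}-\frac{63}{8}t^{4}+\frac{9}{2}t^{3}.$$

Since $\varphi^{\prime}(t)=\frac{1}{2}t^{2}(t-3)^{2}(t-1)(2t-3)$, the sign of $\varphi^{\prime}$ does not change at $t=0$, and we conclude that $\bar{x}$ is neither a maximizer nor a minimizer, but a saddle point. The same is true for $\tilde{x}$. On the other hand, $\hat{x}$ is a local maximizer and $x^{*}$ is a local minimizer. Finally, the critical values are

$$f(\bar{x})=0,\quad f(\hat{x})\approx 0.4511,\quad f(x^{*})\approx 0.3525\quad\mbox{and}\quad f(\tilde{x})\approx 2.6035.$$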

It should be noted that the smallest critical value does not correspond to a local minimizer and that the greatest critical value does not correspond to a local maximizer. Figure 3 illustrates this example.
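The classification in Example 3.8 can be reproduced with exact rational arithmetic. The sketch below uses the factored derivative given in the text; the helper `classify` and its step `h` are assumptions introduced here, and the sign test is reliable only because `h` is smaller than the gap between neighboring critical points.

```python
from fractions import Fraction as F


def phi(t):
    """Objective of Example 3.8 restricted to the feasible line x2 = 0."""
    t = F(t)
    return (F(1, 7) * t**7 - F(17, 12) * t**6 + F(51, 10) * t**5
            - F(63, 8) * t**4 + F(9, 2) * t**3)


def dphi(t):
    """Factored derivative: (1/2) t^2 (t - 3)^2 (t - 1) (2t - 3)."""
    t = F(t)
    return F(1, 2) * t**2 * (t - 3) ** 2 * (t - 1) * (2 * t - 3)


def classify(t, h=F(1, 100)):
    """Classify a critical point by the sign of dphi just left and right of it."""
    left, right = dphi(F(t) - h), dphi(F(t) + h)
    if left < 0 < right:
        return "local minimizer"
    if left > 0 > right:
        return "local maximizer"
    return "neither"


for t in (F(0), F(1), F(3, 2), F(3)):
    print(t, float(phi(t)), classify(t))
```

The smallest critical value, $\varphi(0)=0$, and the largest, $\varphi(3)$, both come out as "neither", while the local minimizer $t=\frac{3}{2}$ has an intermediate value.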

The next section is devoted to discussing sufficient conditions to ensure optimality for constrained optimization problems.

4 Sufficient optimality conditions

In this section we present a criterion, based on the second derivatives, for attesting that a critical point is a local minimizer. For completeness we present first a general result, well known in the optimization community. Then, we particularize the test to the specific cases studied in Calculus.

We stress that the criteria we will present are only local conditions and do not say anything about global minimization without additional assumptions or specific situations.

To simplify the presentation consider the Lagrangian function associated with the problem (1),

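$$(x,\lambda)\in\mathbb{R}^{n}\times\mathbb{R}^{m}\mapsto\ell(x,\lambda)=f(x)+\lambda^{T}g(x).$$

The Lagrangian Hessian, that is, the matrix of the second derivatives of $\ell$ with respect to $x$, is denoted by

$$\nabla_{xx}^{2}\ell(x,\lambda)=\nabla^{2}f(x)+\sum_{i=1}^{m}\lambda_{i}\nabla^{2}g_{i}(x).$$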

The result below can be found in many optimization books. See, for example, [2, 4].

Theorem 4.1

Let $x^{*}\in\mathbb{R}^{n}$ be a critical point for the problem (1) and $\lambda^{*}\in\mathbb{R}^{m}$ a corresponding multiplier vector, according to Definition 3.1. Suppose that

$$d^{T}\nabla_{xx}^{2}\ell(x^{*},\lambda^{*})d>0$$

for all nonzero vectors $d\in\mathbb{R}^{n}$ satisfying $\nabla g_{i}(x^{*})^{T}d=0$, $i=1,\ldots,m$. Then there exist $\delta>0$ and a neighborhood $V$ of $x^{*}$ such that

$$f(x)-f(x^{*})\geq\delta\|x-x^{*}\|^{2}$$

for all $x\in V$ with $g(x)=0$ . In particular, $x^{*}$ is a strict local minimizer of (1).

Despite the existence of this condition for general dimensions, we consider here the particular $2$ and $3$ -dimensional cases with one or two constraints, which are the most common cases in the Calculus courses. In these situations, the Hessian matrices of a function $\varphi$ are

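$$\nabla^{2}\varphi=\left(\begin{array}{cc}\dfrac{\partial^{2}\varphi}{\partial x_{1}^{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{1}\partial x_{2}}\vspace{5pt}\\ \dfrac{\partial^{2}\varphi}{\partial x_{2}\partial x_{1}}&\dfrac{\partial^{2}\varphi}{\partial x_{2}^{2}}\end{array}\right)\quad\mbox{or}\quad\nabla^{2}\varphi=\left(\begin{array}{ccc}\dfrac{\partial^{2}\varphi}{\partial x_{1}^{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{1}\partial x_{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{1}\partial x_{3}}\vspace{5pt}\\ \dfrac{\partial^{2}\varphi}{\partial x_{2}\partial x_{1}}&\dfrac{\partial^{2}\varphi}{\partial x_{2}^{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{2}\partial x_{3}}\vspace{5pt}\\ \dfrac{\partial^{2}\varphi}{\partial x_{3}\partial x_{1}}&\dfrac{\partial^{2}\varphi}{\partial x_{3}\partial x_{2}}&\dfrac{\partial^{2}\varphi}{\partial x_{3}^{2}}\end{array}\right)$$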

if $n=2$ or $n=3$ , respectively.

4.1 The two variables and one constraint case

Consider the problem (1) with $n=2$ and $m=1$ . That is, the problem of minimizing a function of two variables subject to a single equality constraint.

The next theorem follows immediately from the previous one.

Theorem 4.2

Let $x^{*}\in\mathbb{R}^{2}$ be a critical point for the problem (1) and let $\lambda^{*}\in\mathbb{R}$ be the corresponding Lagrange multiplier. Define $H=\nabla^{2}f(x^{*})+\lambda^{*}\nabla^{2}g(x^{*})$ , assume that $\nabla g(x^{*})\neq 0$ and take a nonzero vector $v\perp\nabla g(x^{*})$ . If

$$v^{T}Hv>0,\tag{5}$$

then $x^{*}$ is a local minimizer for the problem.

Now, let us see a straightforward application of the previous theorem.

Example 4.3

Discuss the problem (1) with $f,g:\mathbb{R}^{2}\to\mathbb{R}$ defined by $f(x)=x_{2}-x_{1}^{3}+x_{1}$ and $g(x)=x_{2}-x_{1}^{2}$ .

Resolution. The condition (3a) in this case is

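$$\left(\begin{array}{c}-3x_{1}^{2}+1\\ 1\end{array}\right)+\lambda\left(\begin{array}{c}-2x_{1}\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\end{array}\right),$$

giving $\lambda^{*}=-1$ and $x_{1}=-\dfrac{1}{3}$ or $x_{1}=1$. So, we have two critical points

$$x^{*}=\dfrac{1}{9}\left(\begin{array}{r}-3\\ 1\end{array}\right)\quad\mbox{and}\quad\bar{x}=\left(\begin{array}{r}1\\ 1\end{array}\right).$$

Moreover,

$$H=\left(\begin{array}{cc}2-6x_{1}&0\\ 0&0\end{array}\right)\quad\mbox{and}\quad v=\left(\begin{array}{c}1\\ 2x_{1}\end{array}\right)\perp\nabla g(x).$$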

Thus, $x^{*}$ is a local minimizer since $v^{T}Hv=2-6x_{1}^{*}>0$ . Note that this point is not a global minimizer because $f\left(\begin{array}[]{r}2\\ 4\end{array}\right)=-2<-\dfrac{5}{27}=f(x^{*})$ . Figure 4 illustrates this example.
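The test of Theorem 4.2 can be evaluated at both critical points of Example 4.3 with exact fractions. This is a small sketch under the formulas stated in the example; the function name `quad_form_on_tangent` is an illustrative choice, not notation from the note.

```python
from fractions import Fraction as F


def f(x1, x2):
    """Objective of Example 4.3: f(x) = x2 - x1^3 + x1."""
    return x2 - x1**3 + x1


def quad_form_on_tangent(x1):
    """v^T H v with H = diag(2 - 6*x1, 0) and v = (1, 2*x1), as in the example."""
    H = ((2 - 6 * x1, 0), (0, 0))
    v = (F(1), 2 * x1)
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))


for x1 in (F(-1, 3), F(1)):
    x2 = x1**2  # feasibility: g(x) = x2 - x1^2 = 0
    print(x1, x2, f(x1, x2), quad_form_on_tangent(x1))

# A feasible point with a smaller objective value than f(x*) = -5/27.
print(f(F(2), F(4)))
```

At $x^{*}$ the quadratic form equals $4>0$, while at $\bar{x}$ it equals $-4<0$, so by the maximization counterpart of the test $\bar{x}$ is a local maximizer.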

Remark 4.4

In the context of Theorem 4.2 we also have the maximization condition. If $v^{T}Hv<0$, then $x^{*}$ is a local maximizer of $f(x)$ subject to $g(x)=0$.

4.2 The three variables and one constraint case

Consider the problem of minimizing a function of three variables subject to a single equality constraint.

Here comes another simple application of the general theorem:

Theorem 4.5

Let $x^{*}\in\mathbb{R}^{3}$ be a critical point for the problem (1) with $n=3$ and $m=1$ . Consider $\lambda^{*}\in\mathbb{R}$ the corresponding Lagrange multiplier and define $H=\nabla^{2}f(x^{*})+\lambda^{*}\nabla^{2}g(x^{*})$ . Suppose that $\nabla g(x^{*})\neq 0$ and take vectors $v_{1},v_{2}\in\mathbb{R}^{3}$ such that ${\rm span}\{v_{1},v_{2}\}=\nabla g(x^{*})^{\perp}$ . Consider the matrices $V=(v_{1}\ v_{2})\in\mathbb{R}^{3\times 2}$ and $A=V^{T}HV\in\mathbb{R}^{2\times 2}$ . If

$$a_{11}>0\quad\mbox{and}\quad\det(A)>0,$$

then $x^{*}$ is a local minimizer for the problem.

Let us see now a straightforward application of the previous theorem.

Example 4.6

Let us revisit Example 3.3. Suppose we have not proved the existence of a global minimizer. Then, we are able to establish (only) the local condition.

Resolution. We have the critical point

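$$x^{*}=\dfrac{1}{2}\left(\begin{array}{c}2\sqrt[3]{2}\\ 2\sqrt[3]{2}\\ \sqrt[3]{2}\end{array}\right)$$

with multiplier $\lambda^{*}=-2\sqrt[3]{4}$. Thus,

$$H=-\left(\begin{array}{ccc}0&1&2\\ 1&0&2\\ 2&2&0\end{array}\right)\quad\mbox{and}\quad\nabla g(x^{*})=\dfrac{\sqrt[3]{4}}{2}\left(\begin{array}{c}1\\ 1\\ 2\end{array}\right).$$

We can consider, for example, $\nabla g(x^{*})^{\perp}={\rm span}\left\{\left(\begin{array}{r}1\\ -1\\ 0\end{array}\right),\left(\begin{array}{r}2\\ 0\\ -1\end{array}\right)\right\},$ yielding

$$A=-\left(\begin{array}{crr}1&-1&0\\ 2&0&-1\end{array}\right)\left(\begin{array}{ccc}0&1&2\\ 1&0&2\\ 2&2&0\end{array}\right)\left(\begin{array}{rr}1&2\\ -1&0\\ 0&-1\end{array}\right)=\left(\begin{array}{cc}2&2\\ 2&8\end{array}\right).$$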

Hence, $x^{*}$ is a local minimizer since $a_{11}=2>0$ and $\det(A)=12>0$ .
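The reduced matrix $A=V^{T}HV$ of Example 4.6 can be recomputed with integer arithmetic. The sketch below hard-codes $H$ and the two basis vectors from the example; the helpers `transpose` and `matmul` are written here only to keep the snippet free of external libraries.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# H = grad^2 f(x*) + lambda* grad^2 g(x*), as stated in Example 4.6.
H = [[0, -1, -2], [-1, 0, -2], [-2, -2, 0]]
# Columns v1 = (1, -1, 0) and v2 = (2, 0, -1) span the orthogonal complement of grad g(x*).
V = [[1, 2], [-1, 0], [0, -1]]
# grad g(x*) is a positive multiple of n = (1, 1, 2).
n = [1, 1, 2]

tangent_check = [sum(n[i] * V[i][j] for i in range(3)) for j in range(2)]
A = matmul(matmul(transpose(V), H), V)
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(tangent_check)  # [0, 0]
print(A, det_A)       # [[2, 2], [2, 8]] 12
```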

4.3 The three variables and two constraints case

Consider the problem of minimizing a function of three variables subject to two equality constraints.

Finally, let us present our last application of the general theorem.

Theorem 4.7

Let $x^{*}\in\mathbb{R}^{3}$ be a critical point for the problem (1) with $n=3$ and $m=2$ . Consider $\lambda^{*}\in\mathbb{R}^{2}$ the vector of Lagrange multipliers and define $H=\nabla^{2}f(x^{*})+\lambda_{1}^{*}\nabla^{2}g_{1}(x^{*})+\lambda_{2}^{*}\nabla^{2}g_{2}(x^{*})$ . Suppose that $\nabla g_{1}(x^{*})$ and $\nabla g_{2}(x^{*})$ are linearly independent and take a nonzero vector $v\in\mathbb{R}^{3}$ such that $v\perp\nabla g_{1}(x^{*})$ and $v\perp\nabla g_{2}(x^{*})$ . If

$$v^{T}Hv>0,\tag{6}$$

then $x^{*}$ is a local minimizer for the problem.

Here comes our last example:

Example 4.8

Consider $f,g_{1},g_{2}:\mathbb{R}^{3}\to\mathbb{R}$ defined by $f(x)=x_{3}$ , $g_{1}(x)=x_{1}^{2}+x_{2}^{2}-x_{3}^{2}$ and $g_{2}(x)=x_{1}+x_{3}-2$ . Solve the problem (1) for these functions.

Resolution. The condition (3a) in this case is

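$$\left(\begin{array}{c}0\\ 0\\ 1\end{array}\right)+\lambda\left(\begin{array}{r}2x_{1}\\ 2x_{2}\\ -2x_{3}\end{array}\right)+\mu\left(\begin{array}{c}1\\ 0\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ 0\end{array}\right),$$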

which immediately implies that $\lambda^{*}\neq 0$ and hence, $x_{2}^{*}=0$ . Using the constraints, we conclude that $x_{1}^{*}=x_{3}^{*}=1$ . This in turn implies that $\lambda^{*}=\dfrac{1}{4}$ and $\mu^{*}=-\dfrac{1}{2}$ . So,

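$$H=\dfrac{1}{4}\left(\begin{array}{ccr}2&0&0\\ 0&2&0\\ 0&0&-2\end{array}\right),\quad\nabla g_{1}(x^{*})=\left(\begin{array}{r}2\\ 0\\ -2\end{array}\right)\quad\mbox{and}\quad\nabla g_{2}(x^{*})=\left(\begin{array}{c}1\\ 0\\ 1\end{array}\right).$$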

We can consider, for example,

$$v=\left(\begin{array}{c}0\\ 1\\ 0\end{array}\right)$$

and see that $v^{T}Hv>0$ , proving then that $x^{*}$ is a local minimizer for the problem. In fact, we can prove that this point is a global minimizer. To see this, note that

$$x_{1}^{2}+x_{2}^{2}=x_{3}^{2}\quad\mbox{and}\quad x_{1}+x_{3}=2,$$

which gives $4(x_{3}-1)=x_{2}^{2}\geq 0$ . So, any feasible point $x$ satisfies $f(x)=x_{3}\geq 1=f(x^{*})$ . Figure 5 illustrates the feasible set of this example.
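Example 4.8 can be verified end to end with exact fractions. In the sketch below, the direction $v=(0,1,0)^{T}$ is an assumption made here (any nonzero multiple would serve equally well, since the orthogonal complement of the two gradients is a line), and the parametrization `feasible_point`, obtained by solving the two constraints for $x_{1}$ and $x_{3}$ in terms of $x_{2}=s$, is likewise introduced for illustration.

```python
from fractions import Fraction as F

x_star = (F(1), F(0), F(1))
lam, mu = F(1, 4), F(-1, 2)

grad_f = (F(0), F(0), F(1))
grad_g1 = (2 * x_star[0], 2 * x_star[1], -2 * x_star[2])
grad_g2 = (F(1), F(0), F(1))

# Residual of condition (3a): grad f + lam * grad g1 + mu * grad g2.
residual = [a + lam * b + mu * c for a, b, c in zip(grad_f, grad_g1, grad_g2)]

# H = lam * grad^2 g1, since f and g2 are affine.
H = [[2 * lam, 0, 0], [0, 2 * lam, 0], [0, 0, -2 * lam]]
v = (F(0), F(1), F(0))  # assumed direction, orthogonal to both gradients
dots = (sum(a * b for a, b in zip(v, grad_g1)),
        sum(a * b for a, b in zip(v, grad_g2)))
q = sum(v[i] * H[i][j] * v[j] for i in range(3) for j in range(3))


def g1(x):
    return x[0] ** 2 + x[1] ** 2 - x[2] ** 2


def g2(x):
    return x[0] + x[2] - 2


def feasible_point(s):
    """Feasible point with x2 = s, from x1 = 2 - x3 and 4 * (x3 - 1) = x2^2."""
    s = F(s)
    return (1 - s**2 / 4, s, 1 + s**2 / 4)


print(residual, dots, q)
print([feasible_point(s)[2] for s in range(-3, 4)])  # every x3 is at least 1
```

The quadratic form equals $\frac{1}{2}$, and along the feasible curve $x_{3}=1+\frac{s^{2}}{4}$ is smallest at $s=0$, which is the point $x^{*}$.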

Remark 4.9

It is easy to see that the condition (6) does not depend on the particular choice of $v$: if $v^{T}Hv>0$ for a vector $v\in\left\{\nabla g_{1}(x^{*}),\nabla g_{2}(x^{*})\right\}^{\perp}$, then $\bar{v}^{T}H\bar{v}>0$ for any other nonzero vector $\bar{v}\in\left\{\nabla g_{1}(x^{*}),\nabla g_{2}(x^{*})\right\}^{\perp}$. Indeed, in this case, $\bar{v}=\alpha v$, for some $\alpha\in\mathbb{R}\setminus\{0\}$. The same reasoning is true for condition (5). It can also be proved that the conditions in Theorem 4.5 do not depend on the choice of vectors $v_{1},v_{2}$ such that ${\rm span}\{v_{1},v_{2}\}=\nabla g(x^{*})^{\perp}$.
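The independence statements of this remark can be written out in two lines. This is a sketch of the standard argument; the symbols $\bar{V}$, $P$, $\bar{A}$ and $w$ are introduced here and do not appear in the original text. For Theorem 4.5 it relies on the fact that, for a symmetric $2\times 2$ matrix $A$, the conditions $a_{11}>0$ and $\det(A)>0$ are equivalent to $A$ being positive definite.

```latex
% One-dimensional tangent space (conditions (5) and (6)):
\bar{v}=\alpha v,\ \alpha\neq 0
\quad\Longrightarrow\quad
\bar{v}^{T}H\bar{v}=\alpha^{2}\,v^{T}Hv>0 .

% Two-dimensional tangent space (Theorem 4.5): any other basis of
% \nabla g(x^{*})^{\perp} has the form \bar{V}=VP with P invertible, so
\bar{A}=\bar{V}^{T}H\bar{V}=P^{T}AP ,
\qquad
w^{T}\bar{A}\,w=(Pw)^{T}A\,(Pw)>0\quad\mbox{for all } w\neq 0 .
```

Since $Pw\neq 0$ whenever $w\neq 0$, positive definiteness of $A$ carries over to $\bar{A}$, and the test gives the same verdict for every choice of basis.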

5 Conclusion

In this paper, we have pointed out that some examples (in some undergraduate Calculus textbooks) related to obtaining global minimizers via the Lagrange Multiplier Method (LMM) are typically stated with some imprecision, particularly in worked problems. One way to mitigate that would be the use of a criterion that guarantees when a critical point (obtained by the LMM) is a local minimizer. So, we have proposed such a criterion, which, by the way, is usually absent from Calculus textbooks. On the other hand, for those professors who jump into the ‘global’ aspects of the LMM even though it is a strictly local result, we also propose, based on what we discussed here, the following way to state the LMM:

For continuously differentiable functions, $f$ and $g$ , in order to determine the minimum value of $f$ subject to the constraint $g=k$ with $k$ constant, assuming that this global minimum value is attained on the interior of the domain shared by $f$ and $g$ , but not on the boundary of it, and that $\nabla g\neq\vec{0}$ holds for that domain, do the following:

1. Determine each point and, if necessary, also $\lambda$, satisfying the following system:

   (a) $\nabla f=\lambda\nabla g$;

   (b) $g=k$.

2. Evaluate $f$ at the points obtained in the previous item: the smallest value of $f$ is its global minimum.
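The two steps above can be illustrated on a toy problem. The example below is hypothetical and does not come from the note: it minimizes $f(x)=x_{1}+x_{2}$ subject to $x_{1}^{2}+x_{2}^{2}=2$, where the feasible set is a circle (hence compact, so a global minimum exists) and $\nabla g\neq 0$ on it. The candidate points were obtained by hand from $\nabla f=\lambda\nabla g$ and are only verified, not derived, by the code.

```python
import math


def f(x):
    """Hypothetical objective: f(x) = x1 + x2."""
    return x[0] + x[1]


def g(x):
    """Hypothetical constraint function, with the constraint g(x) = K."""
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    return (1.0, 1.0)


def grad_g(x):
    return (2 * x[0], 2 * x[1])


K = 2.0

# Step 1: points and multipliers solving grad f = lam * grad g and g = K.
# From 1 = 2*lam*x1 = 2*lam*x2 we get x1 = x2, and then 2*x1^2 = 2.
candidates = [((1.0, 1.0), 0.5), ((-1.0, -1.0), -0.5)]
for x, lam in candidates:
    assert g(x) == K
    assert all(math.isclose(df, lam * dg) for df, dg in zip(grad_f(x), grad_g(x)))

# Step 2: evaluate f at the candidates; the smallest value is the global minimum.
best_x, _ = min(candidates, key=lambda c: f(c[0]))
print(best_x, f(best_x))
```

The existence assumption is what makes step 2 legitimate here; in Example 4.3, where the feasible set is an unbounded parabola, the same two steps would single out a point that is not a global minimizer.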

Bibliography

[1] P. M. Fitzpatrick. Advanced Calculus. Thomson Brooks/Cole, Belmont, USA, 2nd edition, 2006.
[2] D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, New York, 3rd edition, 2008.
[3] J. E. Marsden and A. Tromba. Vector Calculus. W. H. Freeman and Company, New York, 6th edition, 2012.
[4] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 2nd edition, 2006.