Theory and Application of the KKT Conditions in Linear Programming

I present some of the theory behind the KKT conditions and a linear programming application solved using them.

Sometimes understanding something theoretical requires conjuring an example and asking what tools one would use to solve the problem. Recently, I was thinking about a linear programming problem and thinking about how one would use the KKT conditions to solve the problem. I knew before that the KKT conditions were a set of conditions that magically help one find the optimal values of an optimization problem. However, I did not understand the deeper reason for why satisfying the KKT conditions results in an optimal solution. In this article, I present an application of using the KKT conditions to find an optimal solution for a linear programming problem, and I dive into the theory behind the KKT conditions.

Linear Programming Model of Optimal Levels of Activity

The problem that I investigate relates to choosing the optimal levels of activity over many periods for a company. Let the total number of periods be denoted $N$ and suppose the company has $n$ departments. Let $a_{i}(t) \geq 0$ be the activity level of department $i$ at period $t=1, \ldots, N$. The revenue that department $i$ generates per unit of activity is denoted $r_{i}$. The cost that department $i$ incurs per unit of activity is denoted $c_{i}$. The total amount of revenue in period $t$ is $r^{T} a(t) \in \mathbb{R}$ and the total cost in period $t$ is $c^{T} a(t) \in \mathbb{R}$. The company has $b$ amount of money at period 1.

There are several constraints in our problem which I address now. First, the cost that the company incurs in any period cannot exceed the revenue that the company generated in the previous period. Further, assume that the company cannot use current revenue or previous profits to pay current costs. Hence, for all periods $t=1, \ldots, N-1$, I enforce the constraint $c^{T} a(t+1)-r^{T} a(t) \leq 0$, and, for the first period, I enforce the constraint $c^{T} a(1)-b \leq 0$. Note that there is no constraint on the last period.

The objective is to choose $a(t)$ for all $t=1, \ldots, N$ in order to maximize discounted total profits, where $\delta \in(0,1)$ is the discount factor. The objective function is

\begin{gathered}
b-c^{T} a(1)+\sum_{i=1}^{N-1} \delta^{i}\left(r^{T} a(i)-c^{T} a(i+1)\right)+\delta^{N} r^{T} a(N).
\end{gathered}

To summarize, the linear programming problem is as follows.

\begin{aligned}
& \text { maximize } b-c^{T} a(1)+\sum_{i=1}^{N-1} \delta^{i}\left(r^{T} a(i)-c^{T} a(i+1)\right)+\delta^{N} r^{T} a(N) \text { s.t. } \\
& a(t) \geq 0, \quad t=1, \ldots, N, \\
& r^{T} a(t)-c^{T} a(t+1) \geq 0, \quad t=1, \ldots, N-1, \\
& b-c^{T} a(1) \geq 0 .
\end{aligned}
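To see the model in action, the LP can be assembled and solved numerically. The sketch below uses `scipy.optimize.linprog`; all of the data ($n$, $N$, $r$, $c$, $b$, $\delta$) are made-up illustrative values, not values from this article:

```python
# Sketch: build and solve the activity-level LP (illustrative data only).
import numpy as np
from scipy.optimize import linprog

n, N = 2, 3                      # departments, periods (made-up instance)
r = np.array([3.0, 2.0])         # revenue per unit of activity
c = np.array([1.0, 1.0])         # cost per unit of activity
b, delta = 10.0, 0.9             # initial budget, discount factor

# Decision vector: a = [a(1); a(2); ...; a(N)], each block of length n.
# The objective coefficient of a(t) is -delta^{t-1} c + delta^t r (maximize).
obj = np.concatenate([-delta ** (t - 1) * c + delta ** t * r for t in range(1, N + 1)])

# Budget constraint c^T a(1) <= b and chain constraints c^T a(t+1) <= r^T a(t).
A_ub = np.zeros((N, n * N))
b_ub = np.zeros(N)
A_ub[0, :n], b_ub[0] = c, b
for t in range(1, N):
    A_ub[t, (t - 1) * n : t * n] = -r   # -r^T a(t)
    A_ub[t, t * n : (t + 1) * n] = c    # +c^T a(t+1)

res = linprog(-obj, A_ub=A_ub, b_ub=b_ub)  # linprog minimizes; bounds default to a >= 0
total = b + (-res.fun)                     # add back the constant b
print(res.x, total)
```

For these numbers every dollar goes to department 1 (higher revenue per unit at equal cost), giving activity levels $(10,0)$, $(30,0)$, $(90,0)$ and a discounted total profit of 196.83.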

The Theory of KKT Conditions in Linear Programming

There are many ways to solve the linear programming problem. One such way is to use the KKT conditions. Another way is to use the simplex method. However, since the KKT conditions are generalizable to nonlinear optimization problems, I focus on studying the KKT conditions.

Before I show you an example of what the results of this problem might look like, I would like to make a few remarks on why the KKT conditions are important. Since my problem is just a complicated linear programming problem, I investigate a simpler linear programming model. Consider the classical linear programming problem with the primal defined as maximize $c^{T} x$ subject to $A x \leq b$, $x \geq 0$, and the dual defined as minimize $b^{T} y$ subject to $A^{T} y \geq c$, $y \geq 0$. For any optimal primal solution $x^{*}$ and optimal dual solution $y^{*}$, if $c^{T} x^{*}=b^{T} y^{*}$, then the primal and the dual problem achieve strong duality.

In general, we would like to have strong duality since that would mean that the primal has attained the highest possible value, and thus has reached an optimal solution. In order to have strong duality, the solution to the primal and the dual must satisfy the KKT conditions. Then using the KKT theorem, it can be concluded that the solution is optimal. This is why I consider solutions to the KKT conditions.

\textbf{Proposition. KKT Theorem for Linear Programming Problems}

(Sufficiency) Suppose $\left(x^{*}, y^{*}\right)$ are feasible primal and dual solutions. If $\left(x^{*}, y^{*}\right)$ satisfies the KKT conditions, then the primal and dual problems achieve strong duality, and hence $x^{*}$ and $y^{*}$ are optimal.

(Necessity) Suppose $\left(x^{*}, y^{*}\right)$ are the optimal primal and dual solutions. If the primal and dual problems achieve strong duality, then $\left(x^{*}, y^{*}\right)$ satisfies the KKT conditions.

\textit{Proof.} I give a proof of the necessity direction. Suppose $\left(x^{*}, y^{*}\right)$ are the optimal primal and dual solutions. This implies that $x^{*}$ satisfies all inequality constraints $A x^{*} \leq b$ and $x^{*} \geq 0$. This is the primal feasibility condition. Further, $A^{T} y^{*} \geq c$ and $y^{*} \geq 0$ since $y^{*}$ is a solution to the dual problem. This is the dual feasibility condition. Suppose further that strong duality is satisfied. Combined with the weak duality chain $c^{T} x^{*} \leq (y^{*})^{T} A x^{*} \leq (y^{*})^{T} b$, this means

c^{T} x^{*}=(y^{*})^{T} A x^{*}=(y^{*})^{T} b .

Hence, the above implies that $(y^{*})^{T} b-(y^{*})^{T} A x^{*}=(y^{*})^{T}\left(b-A x^{*}\right)=0$. Since $A x^{*} \leq b$ and $y^{*} \geq 0$, every term in this sum is nonnegative, so for each vector component $i$, either $\left(b-A x^{*}\right)_{i}=0$ or $y_{i}^{*}=0$. Likewise, from $c^{T} x^{*}=(y^{*})^{T} A x^{*}$, for each component $i$, either $\left(c-A^{T} y^{*}\right)_{i}=0$ or $x_{i}^{*}=0$. This is the complementary slackness condition. Further, the result $\left(c-A^{T} y^{*}\right)_{i}=0$ is just the stationarity condition in disguise since $\left(\frac{\partial c^{T} x}{\partial x}-\left.\frac{\partial y^{T} A x}{\partial x}\right|_{y=y^{*}}\right)_{i}=\left(c-A^{T} y^{*}\right)_{i}=0$. Hence, the stationarity condition in linear programming problems is implied by the complementary slackness and dual feasibility conditions.
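To make strong duality and complementary slackness concrete, the sketch below solves a small primal/dual pair numerically. The LP data are made up for illustration, and I solve the dual as its own LP rather than reading multipliers off the solver:

```python
# Sketch: check strong duality and complementary slackness on a toy LP
# (the numbers are made up for illustration).
import numpy as np
from scipy.optimize import linprog

# Primal: maximize 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 <= 2,  x >= 0.
A = np.array([[1.0, 1.0], [1.0, 0.0]])
b = np.array([4.0, 2.0])
c = np.array([3.0, 2.0])

primal = linprog(-c, A_ub=A, b_ub=b)   # linprog minimizes, so negate c
# Dual: minimize b^T y  s.t.  A^T y >= c,  y >= 0 (rewritten as -A^T y <= -c).
dual = linprog(b, A_ub=-A.T, b_ub=-c)

x_star, y_star = primal.x, dual.x
print(-primal.fun, dual.fun)          # equal at the optimum: strong duality
print(y_star * (b - A @ x_star))      # complementary slackness: each entry ~ 0
print(x_star * (A.T @ y_star - c))    # complementary slackness: each entry ~ 0
```

Here the primal optimum is $(2,2)$ and the dual optimum is $(2,1)$, both with value 10, and every complementary slackness product vanishes.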

Sometimes, I see the KKT theorem applied incorrectly. For example, see this StackExchange post. The claim there is that if $\left(x^{*}, y^{*}\right)$ are the optimal primal and dual solutions, then they must satisfy the KKT conditions. For linear programs this claim is actually true; the error I usually see is in how the stationarity condition is written, as the proposition below illustrates.

\textbf{Proposition. Optimality} \nRightarrow c-A^{T} y^{*}=0

Suppose $\left(x^{*}, y^{*}\right)$ are the optimal primal and dual solutions. The full system $c-A^{T} y^{*}=0$ need not hold; stationarity only requires $\left(c-A^{T} y^{*}\right)_{i}=0$ for the components with $x_{i}^{*}>0$.

\textit{Proof.} Consider the problem with primal

\begin{aligned}
& \operatorname{maximize} 2 x_{1}+3 x_{2} \text { s.t. } \\
& x_{1}+x_{2} \leq 4 \\
& 2 x_{1}-x_{2} \leq 2 \\
& x \geq 0
\end{aligned}

and the dual

\begin{aligned}
& \operatorname{minimize} 4 y_{1}+2 y_{2} \text { s.t. } \\
& y_{1}+2 y_{2} \geq 2 \\
& y_{1}-y_{2} \geq 3 \\
& y \geq 0 .
\end{aligned}

The optimal solution for the primal is $\left(x_{1}, x_{2}\right)=(0,4)$ with value 12, and the optimal solution for the dual is $\left(y_{1}, y_{2}\right)=(3,0)$, also with value 12, so strong duality holds. However, the full system $c-A^{T} y=0$, namely

-2+y_{1}+2 y_{2}=0 \text { and }-3+y_{1}-y_{2}=0,

has the unique solution $\left(y_{1}, y_{2}\right)=\left(\frac{8}{3},-\frac{1}{3}\right)$, which violates $y \geq 0$. The resolution is that complementary slackness only forces $-2+y_{1}+2 y_{2}=0$ when $x_{1}^{*}>0$; here $x_{1}^{*}=0$, so only $-3+y_{1}-y_{2}=0$ is required, and $(3,0)$ satisfies it. One can check that $\left(x^{*}, y^{*}\right)=((0,4),(3,0))$ satisfies all of the KKT conditions, exactly as the necessity direction of the KKT theorem guarantees. It is important to note that the converse also holds: if a point $(x, y)$ satisfies the KKT conditions, then that point is an optimal solution.

In nonlinear problems, however, the KKT conditions can genuinely fail at an optimum when no constraint qualification holds. For example, consider max $x$ s.t. $x^{2} \leq 0$. The only feasible point is $x^{*}=0$, so it is optimal, but stationarity requires $1-2 \mu x^{*}=0$, i.e., $1=0$, which no multiplier $\mu \geq 0$ can satisfy. Hence, there is no KKT point, and strong duality does not hold in this case.
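A minimal numeric sketch of this failure mode, using the problem $\max x$ s.t. $x^{2} \leq 0$ (whose feasible set is the single point $\{0\}$):

```python
# Sketch: for  maximize x  s.t.  x^2 <= 0, the KKT conditions have no solution.
# The only feasible point is x = 0, where stationarity 1 - 2*mu*x = 0 reads 1 = 0.

def stationarity_residual(x, mu):
    # |d/dx (x - mu * x^2)| evaluated at x
    return abs(1.0 - 2.0 * mu * x)

# Grid over [-1, 1]; the constraint x^2 <= 0 admits only x = 0.
feasible_x = [x for x in (i / 100 - 1 for i in range(201)) if x * x <= 0]
print(feasible_x)  # only 0.0 survives

# No multiplier mu >= 0 makes the stationarity residual vanish at a feasible point.
residuals = [stationarity_residual(x, mu)
             for x in feasible_x for mu in (0.0, 1.0, 10.0, 100.0)]
print(min(residuals))  # stays at 1, never 0
```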

Finally, there is another interesting theorem on this topic. I define a linear programming problem to be feasible if there is an element in $\mathbb{R}^{n}$ that satisfies the constraints of the problem. A problem is infeasible if there is no such element.

\textbf{Theorem. Duality Theorem}

If the dual problem is infeasible, then the primal problem is either unbounded or infeasible.

This theorem is useful for checking whether a primal problem is bounded, since we only need to check whether the primal and dual problems are feasible to determine whether they both have solutions. For example, if the dual problem is infeasible and the primal problem is feasible, then the primal problem must be unbounded. If both the primal and the dual are feasible, then both must have optimal solutions. For an example of the latter, look at the optimization problem in the proposition above. For an example of the former, consider the following. The primal is

\begin{aligned}
& \text { maximize } x_{1}+x_{2} \text { s.t. } \\
& x_{1}-x_{2} \leq 1 \\
& x \geq 0
\end{aligned}

and dual is

\begin{aligned}
& \operatorname{minimize} y_{1} \text { s.t. } \\
& y_{1} \geq 1 \\
& -y_{1} \geq 1 \\
& y \geq 0 .
\end{aligned}
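A solver confirms this behavior for the pair above. In scipy's convention, status 3 means the problem appears unbounded and status 2 means it is infeasible; the sketch below checks both (assuming scipy is available):

```python
# Sketch: the primal above is unbounded and its dual is infeasible.
from scipy.optimize import linprog

# Primal: maximize x1 + x2  s.t.  x1 - x2 <= 1,  x >= 0  (minimize the negation).
primal = linprog([-1.0, -1.0], A_ub=[[1.0, -1.0]], b_ub=[1.0])
print(primal.status)  # 3: problem appears to be unbounded

# Dual: minimize y1  s.t.  y1 >= 1  and  -y1 >= 1,  y >= 0,
# rewritten in <= form as  -y1 <= -1  and  y1 <= -1.
dual = linprog([1.0], A_ub=[[-1.0], [1.0]], b_ub=[-1.0, -1.0])
print(dual.status)    # 2: problem is infeasible
```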

KKT Conditions to Solve for Optimal Levels of Activity

I now return to the application of the KKT conditions. For the analysis below, I denote $a_{i j}$ to be the $i$th component of $a(j)$, i.e., the activity of department $i$ in period $j$. The stationarity condition is as follows. For any department $k=1, \ldots, n$,

\begin{gathered}
-c_{k}+\delta r_{k}-\left(-\lambda_{k 1}+\mu_{1}\left(-r_{k}\right)+v_{1} c_{k}\right)=0, \\
-\delta^{j-1} c_{k}+\delta^{j} r_{k}-\left(-\lambda_{k j}+\mu_{j-1} c_{k}+\mu_{j}\left(-r_{k}\right)\right)=0 \text { for all } j=2, \ldots, N-1, \text { and } \\
-\delta^{N-1} c_{k}+\delta^{N} r_{k}-\left(-\lambda_{k N}+\mu_{N-1} c_{k}\right)=0 .
\end{gathered}

The primal feasibility condition is

\begin{gathered}
-a_{i j} \leq 0 \text { for all } i=1, \ldots, n \text { and } j=1, \ldots, N, \\
-\left(r_{1} a_{1 t}+\cdots+r_{n} a_{n t}\right)+\left(c_{1} a_{1, t+1}+\cdots+c_{n} a_{n, t+1}\right) \leq 0 \text { for all } t=1, \ldots, N-1, \text { and } \\
-b+\left(c_{1} a_{11}+\cdots+c_{n} a_{n 1}\right) \leq 0 .
\end{gathered}

The dual feasibility condition is

\begin{gathered}
\lambda_{k j} \geq 0 \text { for all } k=1, \ldots, n \text { and } j=1, \ldots, N, \\
\mu_{t} \geq 0 \text { for all } t=1, \ldots, N-1, \text { and } \\
v_{1} \geq 0 .
\end{gathered}

The complementary slackness condition is

\begin{gathered}
\lambda_{k j} a_{k j}=0 \text { for all } k=1, \ldots, n \text { and } j=1, \ldots, N, \\
\mu_{t}\left(-\left(r_{1} a_{1 t}+\cdots+r_{n} a_{n t}\right)+\left(c_{1} a_{1, t+1}+\cdots+c_{n} a_{n, t+1}\right)\right)=0 \text { for all } t=1, \ldots, N-1, \text { and } \\
v_{1}\left(-b+\left(c_{1} a_{11}+\cdots+c_{n} a_{n 1}\right)\right)=0 .
\end{gathered}
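To make the four conditions concrete, here is a pure-Python sketch that checks each of them for a hand-constructed candidate on a small instance with two departments and two periods. The data $r$, $c$, $b$, $\delta$, the candidate solution, and the multiplier values are made-up numbers for illustration:

```python
# Sketch: verify the four KKT conditions for an illustrative instance with
# n = 2 departments and N = 2 periods (all numbers are made up).
r, c, b, delta = (3.0, 2.0), (1.0, 1.0), 10.0, 0.9

# Candidate primal solution a[(k, t)]: invest everything in department 1.
a = {(1, 1): 10.0, (2, 1): 0.0, (1, 2): 30.0, (2, 2): 0.0}
# Candidate multipliers, derived by hand from the stationarity equations.
lam = {(1, 1): 0.0, (1, 2): 0.0, (2, 1): 2.43, (2, 2): 0.81}
mu1, v1 = 1.53, 6.29

eps = 1e-9
for k in (1, 2):
    i = k - 1
    # Stationarity, period 1: -c_k + delta*r_k - (-lam_k1 - mu1*r_k + v1*c_k) = 0.
    s1 = -c[i] + delta * r[i] - (-lam[(k, 1)] - mu1 * r[i] + v1 * c[i])
    # Stationarity, period 2 (= N): -delta*c_k + delta^2*r_k - (-lam_k2 + mu1*c_k) = 0.
    s2 = -delta * c[i] + delta ** 2 * r[i] - (-lam[(k, 2)] + mu1 * c[i])
    assert abs(s1) < eps and abs(s2) < eps

spend1 = sum(c[k - 1] * a[(k, 1)] for k in (1, 2))   # c^T a(1)
spend2 = sum(c[k - 1] * a[(k, 2)] for k in (1, 2))   # c^T a(2)
rev1 = sum(r[k - 1] * a[(k, 1)] for k in (1, 2))     # r^T a(1)

assert spend1 <= b + eps and spend2 <= rev1 + eps            # primal feasibility
assert mu1 >= 0 and v1 >= 0 and min(lam.values()) >= 0       # dual feasibility
assert all(abs(lam[kt] * a[kt]) < eps for kt in a)           # comp. slackness (a >= 0)
assert abs(mu1 * (spend2 - rev1)) < eps and abs(v1 * (spend1 - b)) < eps
print("all four KKT conditions hold")
```

By the sufficiency direction of the KKT theorem, this certifies that the candidate activity levels are optimal for this instance.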

I wrote Mathematica code to analyze the optimal activity levels for two departments and two periods which you can download here.

Appendix

In general, the objective function needs to be concave for a point satisfying the KKT conditions to be a global maximum. Hence, the lemma below is somewhat important.

\textbf{Lemma.} The functions $f_{i}(a(i), a(i+1)):=r^{T} a(i)-c^{T} a(i+1)$ for all $i=1, \ldots, N-1$ and $b-c^{T} a(1)$ are convex, and the objective function is concave.

\textit{Proof.} The functions $f_{j}$ are all convex (indeed affine) since

f_{j}\left(\lambda\left(a_{j}, a_{j+1}\right)+(1-\lambda)\left(b_{j}, b_{j+1}\right)\right)=\lambda r^{T} a_{j}+(1-\lambda) r^{T} b_{j}-\lambda c^{T} a_{j+1}-(1-\lambda) c^{T} b_{j+1}

and

\lambda f_{j}\left(a_{j}, a_{j+1}\right)+(1-\lambda) f_{j}\left(b_{j}, b_{j+1}\right)=\lambda r^{T} a_{j}-\lambda c^{T} a_{j+1}+(1-\lambda) r^{T} b_{j}-(1-\lambda) c^{T} b_{j+1} .

Hence, $f_{j}\left(\lambda\left(a_{j}, a_{j+1}\right)+(1-\lambda)\left(b_{j}, b_{j+1}\right)\right)=\lambda f_{j}\left(a_{j}, a_{j+1}\right)+(1-\lambda) f_{j}\left(b_{j}, b_{j+1}\right)$ for all $a, b \geq 0$, $\lambda \in[0,1]$, and $j=1, \ldots, N-1$. Similarly, $b-c^{T} a(1)$ is also convex. The same equality also proves that the functions $f_{i}$ are concave and that $b-c^{T} a(1)$ is concave. Since a sum of concave functions is concave, the objective function is concave.
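The defining equality can also be spot-checked numerically. The sketch below draws random points and mixing weights and confirms that $f(a, a^{\prime})=r^{T} a-c^{T} a^{\prime}$ preserves convex combinations up to floating-point roundoff ($r$ and $c$ are made-up values):

```python
# Sketch: numeric spot-check that f(a, a') = r^T a - c^T a' is affine,
# i.e., it maps convex combinations to convex combinations.
import random

r, c = [3.0, 2.0], [1.0, 1.0]  # made-up revenue and cost vectors

def f(a, a_next):
    return (sum(ri * ai for ri, ai in zip(r, a))
            - sum(ci * ai for ci, ai in zip(c, a_next)))

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    u = ([random.random() for _ in r], [random.random() for _ in c])
    v = ([random.random() for _ in r], [random.random() for _ in c])
    lam = random.random()
    mix = ([lam * x + (1 - lam) * y for x, y in zip(u[0], v[0])],
           [lam * x + (1 - lam) * y for x, y in zip(u[1], v[1])])
    gap = abs(f(*mix) - (lam * f(*u) + (1 - lam) * f(*v)))
    max_gap = max(max_gap, gap)

print(max_gap)  # ~ 0 up to floating-point roundoff
```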