I present some theory behind the KKT conditions and work through a linear programming application solved using them.
Sometimes understanding a piece of theory requires conjuring an example and asking what tools one would use to solve it. Recently, I was thinking about a linear programming problem and about how one would use the KKT conditions to solve it. I knew that the KKT conditions were a set of conditions that magically help one find the optimal values of an optimization problem, but I did not understand the deeper reason why satisfying the KKT conditions results in an optimal solution. In this article, I present an application of the KKT conditions to a linear programming problem, and I dive into the theory behind them.
Linear Programming Model of Optimal Levels of Activity
The problem that I investigate relates to choosing the optimal levels of activity over many periods for a company. Let the total number of periods be denoted $N$ and suppose the company has $n$ departments. Let $a_i(t) \ge 0$ be the activity level of department $i$ at period $t = 1, \dots, N$. The revenue that department $i$ generates per unit of activity is denoted $r_i$, and the cost that department $i$ incurs per unit of activity is denoted $c_i$. The total revenue in period $t$ is $r^T a(t) \in \mathbb{R}$ and the total cost in period $t$ is $c^T a(t) \in \mathbb{R}$. The company has an amount of money $b$ at period 1.
There are several constraints in our problem, which I address now. First, the cost that the company incurs in any period cannot exceed the revenue that the company generated in the previous period. Further, assume that the company cannot use current revenue or previously accumulated profits to pay current costs. Hence, for all periods $t = 1, \dots, N-1$, I enforce the constraint $c^T a(t+1) - r^T a(t) \le 0$, and, for the first period, I enforce the constraint $c^T a(1) - b \le 0$. Note that there is no such constraint tied to the last period's revenue.
The objective is to choose $a(t)$ for all $t = 1, \dots, N$ in order to maximize discounted total profits, where $\delta \in (0,1)$ is the discount factor. The objective function is
$$b - c^T a(1) + \sum_{i=1}^{N-1} \delta^i \left( r^T a(i) - c^T a(i+1) \right) + \delta^N r^T a(N).$$
To summarize, the linear programming problem is as follows.
$$\begin{aligned}
\text{maximize} \quad & b - c^T a(1) + \sum_{i=1}^{N-1} \delta^i \left( r^T a(i) - c^T a(i+1) \right) + \delta^N r^T a(N) \\
\text{s.t.} \quad & a(t) \ge 0, \quad t = 1, \dots, N \\
& r^T a(t) - c^T a(t+1) \ge 0, \quad t = 1, \dots, N-1 \\
& b - c^T a(1) \ge 0
\end{aligned}$$
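As a quick sanity check of the formulation, here is a minimal sketch that solves a small instance of this LP with `scipy.optimize.linprog`; the values of $n$, $N$, $r$, $c$, $\delta$, and $b$ are placeholders I made up, not values from the analysis below.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data (placeholders): n = 2 departments, N = 3 periods.
n, N = 2, 3
r = np.array([3.0, 2.0])        # revenue per unit of activity
c = np.array([1.0, 1.0])        # cost per unit of activity
delta, b = 0.95, 10.0           # discount factor and initial budget

# Stack the decision variables as x = [a(1); a(2); ...; a(N)] in R^{nN}.
# The coefficient of a(j) in the maximized objective is delta^j r - delta^{j-1} c.
obj = np.concatenate([delta**j * r - delta**(j - 1) * c for j in range(1, N + 1)])

# Inequality constraints A_ub x <= b_ub:
#   c^T a(1) <= b   and   c^T a(t+1) - r^T a(t) <= 0  for t = 1, ..., N-1.
A_ub = np.zeros((N, n * N))
b_ub = np.zeros(N)
A_ub[0, :n] = c
b_ub[0] = b
for t in range(1, N):
    A_ub[t, (t - 1) * n: t * n] = -r   # -r^T a(t)
    A_ub[t, t * n: (t + 1) * n] = c    #  c^T a(t+1)

# linprog minimizes, so negate the objective; a(t) >= 0 is handled by the bounds.
res = linprog(-obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n * N))
print(res.status, b + obj @ res.x)     # status 0 means success; add back the constant b
print(res.x.reshape(N, n))             # row j holds the optimal a(j+1)
```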
The Theory of KKT Conditions in Linear Programming
There are many ways to solve the linear programming problem. One such way is to use the KKT conditions. Another way is to use the simplex method. However, since the KKT conditions are generalizable to nonlinear optimization problems, I focus on studying the KKT conditions.
Before I show you an example of what the results of this problem might look like, I would like to make a few remarks on why the KKT conditions are important. Since my problem is just a complicated linear programming problem, I investigate a simpler linear programming model. Consider the classical linear programming problem with the primal defined as maximize $c^T x$ subject to $Ax \le b$, $x \ge 0$ and the dual defined as minimize $b^T y$ subject to $A^T y \ge c$, $y \ge 0$. For any optimal primal solution $x^*$ and optimal dual solution $y^*$, if $c^T x^* = b^T y^*$, then the primal and the dual problem achieve strong duality.
In general, we would like to have strong duality since that would mean that the primal has attained the highest possible value, and thus has reached an optimal solution. In order to have strong duality, the solution to the primal and the dual must satisfy the KKT conditions. Then using the KKT theorem, it can be concluded that the solution is optimal. This is why I consider solutions to the KKT conditions.
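To make the primal–dual pair concrete, here is a small sketch that solves an arbitrary instance of the primal and its dual with `scipy.optimize.linprog` and checks that the optimal values coincide; the matrix $A$ and the vectors $b$ and $c$ are made-up numbers.

```python
import numpy as np
from scipy.optimize import linprog

# Arbitrary small instance of the primal  max c^T x  s.t.  Ax <= b, x >= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([6.0, 9.0])
c = np.array([2.0, 3.0])

# Primal: linprog minimizes, so pass -c.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual:  min b^T y  s.t.  A^T y >= c, y >= 0, rewritten as  -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

# Strong duality: the two optimal values agree (up to solver tolerance).
print(-primal.fun, dual.fun)
```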
Proposition. KKT Theorem for Linear Programming Problems
(Sufficiency) Suppose $(x^*, y^*)$ are primal and dual feasible. If $(x^*, y^*)$ satisfies the KKT conditions, then $x^*$ and $y^*$ are optimal and the primal and dual problems achieve strong duality.
(Necessity) Suppose $(x^*, y^*)$ are the optimal primal and dual solutions. If the primal and dual problems achieve strong duality, then $(x^*, y^*)$ satisfies the KKT conditions.
Proof.
I give a proof of the necessity direction. Suppose $(x^*, y^*)$ are the optimal primal and dual solutions. This implies that $x^*$ satisfies all the constraints $Ax^* \le b$ and $x^* \ge 0$. This is the primal feasibility condition. Further, $A^T y^* \ge c$ and $y^* \ge 0$ since $y^*$ is a solution to the dual problem. This is the dual feasibility condition. Suppose further that strong duality is satisfied. By feasibility, $c^T x^* \le (y^*)^T A x^* \le (y^*)^T b$, so strong duality forces equality throughout. This means
$$c^T x^* = (y^*)^T A x^* = (y^*)^T b.$$
Hence, the above implies that $(y^*)^T b - (y^*)^T A x^* = (y^*)^T (b - A x^*) = 0$. This implies that for each vector component $i$, either $(b - A x^*)_i = 0$ or $y_i^* = 0$, since $Ax^* \le b$ and $y^* \ge 0$. Likewise, $c^T x^* = (y^*)^T A x^*$ gives $(c - A^T y^*)^T x^* = 0$, so for each component $i$, either $(c - A^T y^*)_i = 0$ or $x_i^* = 0$. This is the complementary slackness condition. Further, the result $(c - A^T y^*)_i = 0$ is just the stationarity condition in disguise since
$$\left( \frac{\partial}{\partial x} c^T x - \frac{\partial}{\partial x} y^T A x \,\bigg|_{y = y^*} \right)_i = (c - A^T y^*)_i = 0.$$
Hence, the stationarity condition in linear programming problems is implied by the complementary slackness condition.
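The three conditions in this proof are easy to test numerically. Below is a sketch of a checker (the function name `check_kkt` and the tolerance are my own) applied to the optimal pair of the small instance above.

```python
import numpy as np

def check_kkt(A, b, c, x, y, tol=1e-8):
    """Check the LP KKT conditions for  max c^T x s.t. Ax <= b, x >= 0
    with dual  min b^T y s.t. A^T y >= c, y >= 0."""
    primal_feasible = np.all(A @ x <= b + tol) and np.all(x >= -tol)
    dual_feasible = np.all(A.T @ y >= c - tol) and np.all(y >= -tol)
    # Complementary slackness: y_i (b - Ax)_i = 0 and x_j (A^T y - c)_j = 0.
    comp_slack = (np.all(np.abs(y * (b - A @ x)) <= tol)
                  and np.all(np.abs(x * (A.T @ y - c)) <= tol))
    return primal_feasible, dual_feasible, comp_slack

# The optimal pair for the made-up instance solved above.
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([6.0, 9.0])
c = np.array([2.0, 3.0])
x_star = np.array([2.4, 1.8])
y_star = np.array([1.4, 0.2])
print(check_kkt(A, b, c, x_star, y_star))   # (True, True, True)
```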
Sometimes, I see the KKT theorem applied incorrectly. For example, see this StackExchange post. The claim is that if (x∗,y∗) are the optimal primal and dual solutions, then they must satisfy the KKT conditions. This claim is false for the reason stated below.
Proposition. Optimality ⇏ KKT Conditions
Suppose (x∗,y∗) are the optimal primal and dual solutions. They do not need to satisfy the KKT conditions.
Proof.
Consider the problem with primal
$$\begin{aligned}
\text{maximize} \quad & 2x_1 + 3x_2 \\
\text{s.t.} \quad & x_1 + x_2 \le 4 \\
& 2x_1 - x_2 \le 2 \\
& x \ge 0
\end{aligned}$$
and the dual
$$\begin{aligned}
\text{minimize} \quad & 4y_1 + 2y_2 \\
\text{s.t.} \quad & y_1 + 2y_2 \ge 2 \\
& y_1 - y_2 \ge 3 \\
& y \ge 0.
\end{aligned}$$
The optimal solution for the primal is $(x_1, x_2) = (2, 2)$ and the optimal solution for the dual is $(y_1, y_2) = (3, 0)$, but the KKT conditions do not hold since the stationarity condition is violated. This is because the stationarity condition is
$$-2 + y_1 + 2y_2 = 0 \quad \text{and} \quad -3 + y_1 - y_2 = 0,$$
which is only true if $(y_1, y_2) = \left(\tfrac{8}{3}, -\tfrac{1}{3}\right)$. Hence, in this case, the maximum value attainable by the primal is 10 and the lowest value attainable by the dual is 12. The maximum value can never reach the lower bound. It is important to note, though, that if a point $(x, y)$ satisfies the KKT conditions, then that point is an optimal solution.
Likewise, in nonlinear problems, the KKT conditions sometimes cannot help in finding the optimal solution. For example, consider the problem $\max\, x$ s.t. $x^2 \le 0$. The only feasible point is $x = 0$, but the stationarity and complementary slackness conditions cannot both be satisfied there, because the gradient of the constraint vanishes at $x = 0$ and the usual constraint qualifications fail. Hence, the KKT conditions cannot be used to find the optimum in this case.
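Writing the KKT system out makes the failure explicit:
$$1 - 2\mu x = 0 \ \ \text{(stationarity)}, \qquad \mu x^2 = 0 \ \ \text{(complementary slackness)}, \qquad \mu \ge 0, \quad x^2 \le 0.$$
If $\mu = 0$, stationarity gives $1 = 0$; if instead $x = 0$, stationarity again gives $1 = 0$. So no pair $(x, \mu)$ satisfies the system, even though $x = 0$ is the optimum.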
Finally, there is another interesting theorem on this topic. I define a linear programming problem to be feasible if there is an element in $\mathbb{R}^n$ that satisfies the constraints of the problem. A problem is infeasible if there is no such element.
Theorem. Duality Theorem
If the dual problem is infeasible, then the primal problem is either unbounded or infeasible.
This theorem is useful for checking whether a primal problem is bounded since we only need to check whether the dual and primal problems are feasible to determine if they both have solutions. For example, if the dual problem is infeasible and the primal problem is feasible, then it must mean that the primal problem is unbounded. If both the primal and dual are feasible, then they both must have solutions. For an example of the latter, look at the optimization problem in the proposition above. For an example of the former, consider the following. The primal is
$$\begin{aligned}
\text{maximize} \quad & x_1 + x_2 \\
\text{s.t.} \quad & x_1 - x_2 \le 1 \\
& x \ge 0
\end{aligned}$$
and the dual is
$$\begin{aligned}
\text{minimize} \quad & y_1 \\
\text{s.t.} \quad & y_1 \ge 1 \\
& -y_1 \ge 1 \\
& y \ge 0.
\end{aligned}$$
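A quick way to see this pair in action is to hand both problems to `scipy.optimize.linprog` and read the status codes it reports (in scipy's convention, 0 means optimal, 2 means infeasible, and 3 means unbounded); this is just an illustrative sketch.

```python
from scipy.optimize import linprog

# Primal:  max x1 + x2  s.t.  x1 - x2 <= 1, x >= 0  (pass -c since linprog minimizes).
primal = linprog(c=[-1.0, -1.0], A_ub=[[1.0, -1.0]], b_ub=[1.0],
                 bounds=[(0, None), (0, None)])

# Dual:  min y1  s.t.  y1 >= 1 and -y1 >= 1, y >= 0, rewritten as -y1 <= -1 and y1 <= -1.
dual = linprog(c=[1.0], A_ub=[[-1.0], [1.0]], b_ub=[-1.0, -1.0],
               bounds=[(0, None)])

# Expect status 3 (unbounded) for the primal and status 2 (infeasible) for the dual.
print(primal.status, dual.status)
```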
KKT Conditions to Solve for Optimal Levels of Activity
I now return to the application of the KKT conditions. For the analysis below, I denote by $a_{ij}$ the $i$-th component of $a(j)$. The stationarity condition is as follows. For any department $k = 1, \dots, n$,
$$\begin{aligned}
& -c_k + \delta r_k - \left( -\lambda_{k1} + \mu_1 (-r_k) + v_1 (c_k) \right) = 0, \\
& -\delta^{j-1} c_k + \delta^j r_k - \left( -\lambda_{kj} + \mu_{j-1} (c_k) + \mu_j (-r_k) \right) = 0 \quad \text{for all } j = 2, \dots, N-1, \text{ and} \\
& -\delta^{N-1} c_k + \delta^N r_k - \left( -\lambda_{kN} + \mu_{N-1} (c_k) \right) = 0.
\end{aligned}$$
The primal feasibility condition is
$$\begin{aligned}
& -a_{ij} \le 0 \quad \text{for all } i = 1, \dots, n \text{ and } j = 1, \dots, N, \\
& -(r_1 a_{1i} + \cdots + r_n a_{ni}) + (c_1 a_{1,i+1} + \cdots + c_n a_{n,i+1}) \le 0 \quad \text{for all } i = 1, \dots, N-1, \text{ and} \\
& -b + (c_1 a_{11} + \cdots + c_n a_{n1}) \le 0.
\end{aligned}$$
The dual feasibility condition is
$$\lambda_{kj} \ge 0 \quad \text{for all } k = 1, \dots, n \text{ and } j = 1, \dots, N, \qquad \mu_i \ge 0 \quad \text{for all } i = 1, \dots, N-1, \quad \text{and} \quad v_1 \ge 0.$$
The complementary slackness condition is
$$\begin{aligned}
& -\lambda_{ij} a_{ij} = 0 \quad \text{for all } i = 1, \dots, n \text{ and } j = 1, \dots, N, \\
& \mu_i \left( -(r_1 a_{1i} + \cdots + r_n a_{ni}) + (c_1 a_{1,i+1} + \cdots + c_n a_{n,i+1}) \right) = 0 \quad \text{for all } i = 1, \dots, N-1, \text{ and} \\
& v_1 \left( -b + (c_1 a_{11} + \cdots + c_n a_{n1}) \right) = 0.
\end{aligned}$$
I wrote Mathematica code to analyze the optimal activity levels for two departments and two periods which you can download here.
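I do not reproduce the Mathematica notebook here, but a rough Python analogue of the two-department, two-period case might look like the sketch below; the data $r$, $c$, $\delta$, $b$ are placeholders, and the multipliers $\lambda$ are recovered from the dual solution as reduced costs.

```python
import numpy as np
from scipy.optimize import linprog

# Placeholder data for n = 2 departments and N = 2 periods (not the notebook's values).
r = np.array([3.0, 2.0])
c = np.array([1.0, 1.0])
delta, b = 0.9, 10.0

# x = [a(1); a(2)].  Maximized objective coefficients block by block: delta^j r - delta^{j-1} c.
q = np.concatenate([delta * r - c, delta**2 * r - delta * c])

# Constraints G x <= h:  c^T a(1) <= b  and  c^T a(2) - r^T a(1) <= 0.
G = np.array([[ c[0],  c[1],  0.0,  0.0],
              [-r[0], -r[1], c[0], c[1]]])
h = np.array([b, 0.0])

primal = linprog(-q, A_ub=G, b_ub=h, bounds=[(0, None)] * 4)
dual = linprog(h, A_ub=-G.T, b_ub=-q, bounds=[(0, None)] * 2)

a_star = primal.x.reshape(2, 2)   # row j holds the optimal a(j+1)
v1, mu1 = dual.x                  # multipliers on the budget and cash-flow constraints
lam = G.T @ dual.x - q            # multipliers on a >= 0 (the dual slacks / reduced costs)

print("optimal activity levels:\n", a_star)
print("multipliers v1, mu1, lambda:", v1, mu1, lam)
# Period-1 stationarity from above:  -c_k + delta r_k - (-lambda_k1 - mu_1 r_k + v_1 c_k) = 0.
print(-c + delta * r - (-lam[:2] - mu1 * r + v1 * c))   # ~ 0 componentwise
```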
Appendix
In general, the objective function needs to be concave in order for the KKT conditions to certify a maximum. Hence, the lemma below is somewhat important.
Lemma.
The functions $f_i(a(i), a(i+1)) := r^T a(i) - c^T a(i+1)$ for all $i = 1, \dots, N-1$ and $b - c^T a(1)$ are convex, and the objective function is concave.
Proof. The functions $f_j$ are all convex since
$$f_j\!\left( \lambda (a_j, a_{j+1}) + (1 - \lambda)(\tilde a_j, \tilde a_{j+1}) \right) = \lambda r^T a_j + (1 - \lambda) r^T \tilde a_j - \lambda c^T a_{j+1} - (1 - \lambda) c^T \tilde a_{j+1}$$
and
$$\lambda f_j(a_j, a_{j+1}) + (1 - \lambda) f_j(\tilde a_j, \tilde a_{j+1}) = \lambda r^T a_j - \lambda c^T a_{j+1} + (1 - \lambda) r^T \tilde a_j - (1 - \lambda) c^T \tilde a_{j+1}.$$
Hence, $f_j(\lambda (a_j, a_{j+1}) + (1 - \lambda)(\tilde a_j, \tilde a_{j+1})) = \lambda f_j(a_j, a_{j+1}) + (1 - \lambda) f_j(\tilde a_j, \tilde a_{j+1})$ for all $a, \tilde a \ge 0$, $\lambda \in [0,1]$, and $j = 1, \dots, N-1$. The same argument shows that $b - c^T a(1)$ is convex. The above also proves that the functions $f_j$ are concave and that $b - c^T a(1)$ is concave. Since a nonnegative combination of concave functions is concave, the objective function is concave.
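As a quick numerical illustration of the identity in the proof, the sketch below evaluates both sides at arbitrary random points (the data are made up):

```python
import numpy as np

# Numeric sanity check of the identity in the lemma:
# f_j is affine, so f_j(lam*(u) + (1-lam)*(v)) == lam*f_j(u) + (1-lam)*f_j(v).
rng = np.random.default_rng(0)
n = 3
r, c = rng.random(n), rng.random(n)
f = lambda a_j, a_j1: r @ a_j - c @ a_j1          # f_j(a(j), a(j+1)) = r^T a(j) - c^T a(j+1)

u, u1, v, v1 = (rng.random(n) for _ in range(4))  # two random points (a(j), a(j+1))
lam = 0.3
lhs = f(lam * u + (1 - lam) * v, lam * u1 + (1 - lam) * v1)
rhs = lam * f(u, u1) + (1 - lam) * f(v, v1)
print(np.isclose(lhs, rhs))                        # True
```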