Extension of ADMM Algorithm in Solving Optimal Control Model Governed by Partial Differential Equation

This paper presents an Algorithm for the numerical solution of the Optimal Control model constrained by Partial Differential Equation using the Alternating Direction Method of Multipliers (ADMM) accelerated with a parameter factor in the sense of Nesterov. The ADMM tool was applied to a partial differential equation-governed optimization problem of the one-dimensional heat equation type. The constraint and objective functions of the optimal control model were discretized using the Crank-Nicolson and Composite Simpson’s Methods respectively into a derived discrete convex optimization form amenable to the ADMM. The primal-dual residuals were derived to ascertain the rate of convergence of the model for increasing iterates. An existing example was used to test the efﬁciency and degree of accuracy of the algorithm and the results were favorable when compared the existing method.


Introduction
Optimal control problems (OCP) involving partial differential equations (PDE) are derivations arising from solid mechanics, fluid dynamics, engineering, Economics and sciences amongst others.Most of which are formulations into second order performance measures governed by PDE constraints of the ellipticc type.The classical analytic techniques of the Hamilton -Jacobi -Bellman (HJB) or Euler Lagrange(E-L) may proof difficult because of the complexity of the formulated continuous nonlinear models.Betts and Campbell in [1] worked on the "first discretized then optimize" technique, a novel approach in obtaining a discretized optimization problem with finite number of (large) vectors.Ghobadi et al. in [2] extended this approach to the one-dimensional heat equation and enumerated a lot of advantages when compared with the historical "first optimize then discretized" approach.These include (i) the circumvention of the need to establish certain properties of the problem such as the active constraint interval and the number of touch points a priori and (ii) its amenability to well-structured discretization schemes and optimization solvers for inequality constrained problems.The work of [2] was shrouded with conditional stability problem due to the explicit discretization method used.However, this draw-back of conditional stability will be circumvented in this research with the use of the implicit method of higher order in the discretization and the ADMM extended in obtaining the optimal solutions with better accuracy of results and faster rate of convergence.
Moreover, many authors have such as Olotu and Dawodu in ( [3,4,5]) have worked extensively on proportional and multidelay systems using the "first discretized then optimize" approach.However, He and Yuan in [6] and He et.al in [7] worked on block-convex programming problem synonymous to discretized continuous distributed systems, though the implementation on PDE-based optimal control problems were not explored.

Statement of problem
Considering a one-dimensional bar connected to a heating (cooling) device at its both ends with known initial and lower-bound conditions but with unknown boundary conditions, then the temperature functions on the boundaries are considered to be the control variables (u i (t ), i = 1, • • • , q) as in [1].The heat equation is then given as; where f (x, t ) is the temperature at position x and at time t .It is then imperative to have the temperature profile f (x, t ) at each point of the bar that does not go below a given specific temperature function given by g (x, t )(also known as a lower bound function), during the time interval.Setting the temperature of the bar very high makes sure the lower-bound condition (inequality constraint) below is satisfied.
However, the objective of this modeling is to choose an appropriate temperature profile such that the energy consumed (objective function) is at minimum by reducing overheating, that is, to minimize the cost of energy consumed while keeping the temperature of the bar above the lower-bound profile.The appropriate choice of the objective function can be compared to the temperature profile of the bar with the lowerbound profile as closely as possible expressed mathematically as f ((x, t ), y) − g (x, t ) .Assuming the temperature profile is the approximation of the temperature of the bar provided the heat equation and initial condition are satisfied.And suppose the squared values of all the temperature in time t and space x is expressed as f 2 (x, t ), then the objective function to be minimized is the integral of f 2 (x, t ) in time and space.Then the optimization model of the heat-transfer problem can be mathematically expressed as where x ∈ R, u ∈ R and f 0 (x) is a known initial function of temperature at the initial time (t = 0).While u 1 (t ) and u 2 (t ) are the temperatures at the two boundary points, l 1 and l 2 respectively, of the bar whose optimal values will be obtained.

Materials and methods
The continuous non-linear model (problem) formulated above can now be solved using the approach of "first-discretizedthen-optimize" to obtain a quadratic optimization problem with finite number of (large) variables.The implicit discretization methods will be used for the discretization of the continuoustime OCP because its (i) stability is unconditional,(ii) time step-length can be chosen independent of the spatial steplength and (iii) coefficient matrix remains well-conditioned.

Discretization of the Constraints
In the transformation of the continuous time PDE constraints, the second-order Crank-Nicolson method derived from the averaging of the forward Difference method at the j th step in time t and the Backward-Difference at the ( j + 1)th step in t with truncation error h 2 ∂ 2 f ∂t 2 (x i , µ j ) + O(h 2 ) will be used.And given that the grid points are equally-distributed for both the spatial interval (length of bar)[l 1 , l 2 ] and the time interval [0, T ] with step-lengths δ and h respectively, computed as δ = (l 2 − l 1 )/n and h = T /N .Then the number of intervals n from the discretization in space and N from the discretization in time will generate (n +1) and (N +1) number of grid points respectively.
In generating the numerical sequence, let x i = l 1 + i δ be the i th point in space (for i = 0, 1, 2, . . ., n) and t j = j h be the j th point in time (for j = 0, 1, 2, . . ., N ), provided the usual differentiability conditions are satisfied, then the Crank-Nicolson (averaged difference) discretizations scheme in time and space with local truncation error of order O(δ 2 + h 2 ) expressed below as; where f i , j f (x i , t j ) and u k, j u k (t j ) for i = 0, 1, 2, . . ., n, j = 0, 1, 2, . . ., N −1 and k = 1, 2. The discretization of the PDE constraint function using eqns.( 4) and ( 5), given that f (0, t j ) = f 0, j = u 1, j , f (x n , t j ) = f n, j = u 2, j and f (x j , 0) = 0 yields; and collecting like-terms gives the recurrence formula below; Setting i = 1 and f 0, j = u 1, j yields; For j = 0 given f 1,0 = f 2,0 = u 1,0 = 0 yields; For j = 1 yields; When eqn. ( 11) is expanded for i = 1 and 1 ≤ j ≤ N − 1 gives the matrix equation below: compactly written as where T and 0 are column vectors of dimension N ×1 while L and Ĝ are bi-diagonal square-matrices of N × N dimensions with respective entries described below.
Set i = 2; this then gives the recurrence equation below; For j = 0; and equating For j = 2; The matrix equation for eqns.( 17) to ( 20) is then given as,

) 107
For i = 2, 3, • • • , n − 2 and 1 ≤ j ≤ N − 1 will then be expressed below as: Where , L and Ĝ are described in eqns.( 14) and (15) respectively.Combining eqns.( 13), ( 22) and ( 23) gives the matrix formulation below: compactly written as, where C is a block-matrix of dimension (n − 1)N × 2N , D is a square block-matrix of dimension (n − 1)N × (n − 1)N , U is a column vector of dimension 2N × 1 while F and 0 are column vectors of dimension (n − 1)N × 1. u 1 (t j ) and u 2 (t j ) are the control variables for which gives access to the heating (cooling) resources, while the temperature variables f (x i , t j ) of the other parts of the bar are computed based on the heattransfer equations.

Discretization of the Boundaries
Given Therefore, For j = 2; Combining eqns.( 27) to (28) generates the matrix-structure below: Equation ( 30) is compactly written as; where is the known coefficient vector.

Discretization of Objective Function
In the discretization of the continuous-time objective function(performance measure) both in space and time, the composite Simpson's rule and the right-Riemann rule will be used respectively as stated below: Then the discretized objective function of the optimal control problem in eqn.(3) using eqns.(32) and ( 32) is expressed below as; where, Discretizing eqn.(34) using (32) and collecting like-terms gives; where , j , setting the values of j = 1, 2, ..., N in eqn.(38) yields the matrix operator.For j = 1; The matrix formation of the above equations is then written as follows: where, However, the matrices Q ∈ R (n−1)N ×(n−1)N and H ∈ R 2N ×2N in the compactified convex quadratic eqn.(39) are real, symmetric and positivedefinite with the diagonal coefficients described below as follows: and Combining eqns.(25), (31) and (39) gives, in a matrix form, the compact convex quadratic optimization problem with linear constraints as expressed below:- The above re-formulated discrete model is a well-structured, quadratic optimization problem whose objective function is convex and separable.While the constraint function is linear and coupled which made the model amenable to the ADMM.However, the mixed-inequality constraint model is re-formulated into an equality constraint form by introducing non-negative slack variables Z ( 0) into the model.

Stability and Feasibility of the Model
With the assumption that the solution of the PDE satisfies the usual differentiability conditions, the second order Crank Nicholson implicit scheme, with local truncation error of order O(h + δ 2 ), was deplored for the discretization of the PDE constraint to avoid the conditional stability and feasibility associated with explicit discretization methods.This stability of the implicit discretization method reflects in the consistency and statbility of the coefficient matrix operators.However, the composite simpson's scheme, used for the discretization of the objective function, has truncation error of order O(h 4 ) with degree of accuracy (precision) of three.This helps to improve the accuracy and rate of convergence of the model, hence an improvement over the trapezoidal used in [1].

Sparsity of the Model
The very suitable approach for ascertaining the sparsity of the model is to compactify it as studied by Kameswaran and Biegler in [8].The set of optimization eqns.(45) was then re-structured in a compact form as stated below.
where, W = (U F ) The matrices H and Q are the two coefficient sub-matrices of the objective functions while C and D are the two submatrices of the constraint; both correspondent to those of the variables U and F .The sizes of the augmented matrices, showing the number of zero and non-zero entries are expressed in the table 1 below.The sparsity ratio is defined here as the proportion of the zero entries in comparison with the size of the matrix.

Definition 2. (Corollary to Sylvester criterion):
If the principal diagonal entries a i i are the only nonzero entries of a square (n×n) matrix such that a i j = 0 for i j ; then the leading principal minors (M i > 0 The constructed matrix-operator P for the derived quadratic function ( 45) is an augmentation of the real symmetric matrices Q and H in ( 40) and ( 41) respectively whose principal diagonal entries are all strictly positive.Therefore, P is symmetric and positive definite by definitions 2 and 3 since all the leading principal minors are all strictly positive.This property is pertinent to guarantee convexity and minimal solution of the quadratic function.

ADMM Algorithm Formulation
In the formulation of the ADMM, the derived update formulas will be accelerated with an accelerator variant referred to as the relaxation factor.The results will then be extendable to the discretized PDE governed OCP (44).The associated augmented Lagrangian functional is expressed as with the M-ADMM iterations stated thus: where µ ∈ R (n−1)N and λ ∈ R N are Lagrange multipliers, ρ > 0 is the penalty parameter, . 2 denotes the euclidean (spectral) norm of a vector (matrix) argument, ξ 1 = µ/ρ ∈ R (n−1)N and ξ 2 = λ/ρ ∈ R N are the scaled dual variables, Z ∈ R N is the introduced slack vector and l + (Z ) is the indicator function for the non-negative orthants defined as l + (Z ) = 0 for z ≥ 0 and l + (Z ) = +∞ otherwise.As the algorithm runs through the (k +1)-th iteration, the primal variables U k and F k as well as the dual variables µ k and λ k are updated, hence the need to derive the update formulas.Applying the Karush-Kuhn-Tucker (KKT) optimality conditions on the Augmented Lagrangian Functional (46) for the sequential minimization of the variables requires updating all the critical variables as indicated in the ADMM iterations.Moreover the update formulas of the variables U k and F k will be accelerated using the Gauss-seidel relaxation factor α ∈ [0, 2] as clearly illustrated in the works of Nesterov [9] and Ghadimi et.al [10].

Derivations of the update formulas
Applying the Karush-Kuhn-Tucker (KKT) optimality conditions on the Augmented Lagrangian Function in eqn.( 46) for the sequential minimization of the variables requires updating all the critical variables as indicated in the ADMM iterations.U-Update: Setting ∂ U L(U , .k ) = 0 yields; where, F-Update: Setting ∂ F L(U k+1 , F, .k ) = 0 and replacing CU k+1 , to relax the algorithm, with to yield where, where Updating the Scaled-Dual variables -ξ 1 , ξ 2 yields

Convergence Analysis
Considering an optimization problem analogous to the form in (45) above; and f 3 : R n → R ∪ {∞} are convex functions.The augmented lagrangian formulation is then stated thus: where µ ∈ R n and λ ∈ R m are Lagrange multipliers, ρ > 0 is the penalty parameter, . 2 denotes the euclidean (spectral) norm of a vector (matrix) argument, ξ 1 = µ/ρ ∈ R n and ξ 2 = λ/ρ ∈ R m are the scaled dual variables, z is the introduced slack vector and l + (z) is the indicator function for the nonnegative orthants.
From duality theory, it is well-known that if u * , y * , µ * and λ * are the primal and dual variables that minimize the Lagrangian function L ρ , then every pair (u k , y k ) in the convex set, . Let f k = f 1 (x k )+ f 2 (y k ) be the objective function value at the k-th iterate (x k , y k ) and f * = f 1 (x * ) + f 2 (y * ) be the optimal function value.Let r k+1 1 = h 1 (u k )+h 2 (y k ) and r k+1 2 = g 1 (u k )− z k be the residuals of the convex equality constraints in the kth iteration.The objective of this research is now to prove the following theorems.

Definition 3. (Dual Updates):
At optimality, the update equations for the dual variables from the augmented Lagrangian in (60) are given as Definition 4. (Lyapunov function): Let V k be a non-negative quantity for the algorithm such that V k − V k+1 decreases in each iteration by an amount that depends on the sum of the norms of the residuals and on the change in slack variable z over one iteration.Then the quantities are defined as;

Numerical Experiments
Considering a PDE constrained optimal control described heat transfer model written as follows: This section illiterates, via a number of computational experiments, the effects of different penalty parameter selection on the number of iterations required to reach the stopping criteria on the above problem as posed by Ghobadi et.al in [2] and Betts and Campbell in [1].The experiment involves selecting the termination (stopping) criteria r el = 10 −k and abs = 10 −k , k = 3, 4, 5, as relative and absolute tolerances respectively for the convergence of the algorithm.The algorithm was implemented with a MATLAB subroutine running on a Pentium V Computer with 4.0 GB RAM and 1.7 GHz CPU.The discretization step in time is largely smaller that of the spacial step to keep it feasible; though the implicit Crank-Nicholson numerical scheme used for the discretization of the PDE keeps it unconditionally stable.Below are the results of the experiment; Table 2 above shows the numbers of iter-          4 below shows the sparsity of the compact form of the model for varying and fixed step-sizes in time and space respectively; with the optimal results obtained at few iterations of the ADMM.It was observed that the level of sparsity increases for reducing step sized which consequently reduces the computational burden in the iterative process.shows the results of the problem for the varying values of the parameters and step size with the optimal results obtained at few iterations of the ADMM.The results obtained on MAT-LAB subroutine were compared with those of Ghobadi et al. in [2] for various time grid points (N ) ranging from N = 1000 to 8000 ran on MOSEK software.Table 6 below shows the optimal objective functional values generated at varying numbers N of time grid points (TGP) for fixed number of gridpoints in space kept at n = 10.The results in [2] when implemented on MOSEK is same as that obtained by the ADMM at a shorter Central Processing Unit (CPU) time though with higher iteration cycles due to high rate of convergence; using the tolerance of t ol .= 10 −4 .This clearly shows that the rate of convergence of the proposed algorithm is higher than that of MOSEK.

Conclusions
The extension of the ADMM to solving heat transfer problem as an example of PDE-based optimal control problems (OCP) with equality-Inequality constraints was well studied.The PDE-based OCP was transformed into a discretized convex quadratic optimization problem amenable to the ADMM.The implicit Crank Nicolson method and the 3rd order Simpsons rule were used for the discretization of the constraints and cost functional respectively with high feasibility, unconditionally stability and well-conditioned matrices.The ADMM algorithm converges faster with higher level of accuracy, low processing time and low memory requirements, when compared with that of Ghobadi et.al [2].Moreover, this approach can as well be applied to other PDE-based optimization problems like that of wave equations or problems involving integral (entropy) functions.

Figure 1 :Figure 2 :
Figure 1: The effects of tolerance on varying relaxation factors

Figure 3 :
Figure 3: The effects of relaxation factor on varying penalty parameters

Figure 4 :
Figure 4: The response of dual residual sum to iteration

Figure 5 :
Figure 5: The response of first dual variable (µ) to iteration

Figure 6 :
Figure 6: The response of second dual variable (λ) to iteration

Figure 7 :
Figure 7: The response of dual residuals (s k ) to iteration

Figure 8 :Figure 9 :
Figure 8: The response of primal variables to iteration Figure 9: The effects of iterations on the distance norm ||Z k − Z * || 2

Table 1 :
Sparsity Table for the Compactified Model 3.6.

Positive definiteness and Convexity of the Matrix oper- ator Definition
1. (Sylvester criterion): A square (n × n) matrix is positive definite if it is real, symmetric with all the eigenvalues (λ i > 0

Table 4 :
Sparsity from the Compactification of the problem Model with δ =

Table 6 :
Comparison of Computational results at optimum