Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
The NLP Procedure

Criteria for Optimality

PROC NLP solves

& \min f(x) , & x \in {\cal R}^n , \ {s.t.} & c_i(x) = 0 , & i = 1, ... ,m_e , \ & c_i(x) \ge 0 , & i = m_e+1, ... ,m .

where f is the objective function and the m ci's are the constraint functions.

A point x is feasible if it satisfies all the constraints. The feasible region G is the set of all the feasible points. x* is a global solution of the above problem if no point in G has a lower function value than f(x*). x* is a local solution of the above problem if there exists some open neighborhood surrounding x* in which no point has a lower function value than f(x*). By definition, every global minimum is also a local minimum. Nonlinear Programming cannot consistently find global minima. Therefore, all the algorithms in PROC NLP find a local minimum for this problem. If you need to find the global solution of your problem, you may have to run PROC NLP with different starting points obtained either at random or by selecting a point on a grid which contains G.

The local minimizer x* of this problem satisfies the following local optimality conditions:

  1. There exists a small (feasible) neighborhood of x* that does not contain any point x with
  2. The vector of first derivatives (gradient) g(x^*) = \nabla f(x^*) of the objective function f (projected toward the feasible region) at the point x* is zero.
  3. The matrix of second derivatives G(x^*) = \nabla^2 f(x^*) (Hessian matrix) of the objective function f (projected toward the feasible region G) at the point x*

Most of the optimization algorithms in PROC NLP use iterative techniques which result in a sequence of points x0,...,xn,..., which should converge to a local optimum x*. One way to find out whether the objective function has more than one local optimum is to run various optimizations with different starting points x0.

An iterative optimization algorithm terminates at the point xt, which should be in a close neighborhood (in terms of a user-specified termination criterion) of a local optimizer x*. If the point xt is located on one or more active boundaries or general linear constraints, the local optimization conditions are valid only for the feasible region; that means

If there are n active constraints at the point xt, the nullspace Z has zero columns and the projected Hessian has zero rows and columns. A matrix with zero rows and columns is considered positive as well as negative definite.

Karush-Kuhn-Tucker Conditions

This can be made explicit mathematically by considering the linear combination of objective and constraint functions

L(x,\lambda) = f(x) - \sum_{i=1}^m \lambda_i c_i(x) .
This is called the Lagrange function and the coefficients \lambda_i are called Lagrange multipliers.

Assuming the functions f and ci are twice continuously differentiable, the point x* is an isolated local minimizer of the nonlinear programming problem, if there exists a vector \lambda^*=(\lambda_1^*, ... ,\lambda_m^*) that meets the following conditions:

1. Karush-Kuhn-Tucker conditions:

c_i(x^*) = 0 , & &
 & i = 1,  ...  ,m_e , \ c_i(x^*) \ge 0 , & \lambda_i^* \ge 0...
 ... c_i(x^*) = 0 ,
 & i = m_e+1,  ...  ,m , \ \nabla_x L(x^*,\lambda^*) = 0 & & &

2. Second-order condition:

Each nonzero vector y \in {\cal R}^n with

y^T \nabla_x c_i(x^*) = 0 
 \{ i = 1, ... ,m_e , \ \forall i\in \{ m_e+1, ... ,m: \lambda_i^* \gt 0 \},
 .
satisifies
y^T \nabla_x^2 L(x^*,\lambda^*) y \gt 0 .

The presence of a negative (positive) Lagrange multiplier in minimization (maximization) indicates that a possible reduction (increase) of the objective function can be obtained by releasing the corresponding active linear constraint.

Derivatives

The 1st and 2nd order conditions on the Lagrange function require derivates on the object function f and on the constrints ci.

The gradient vector contains the first derivatives of the objective function f with respect to the parameters x1, ... ,xn, as follows:

g(x) = \nabla f(x) = (\frac{\partial f}{\partial x_j}) .

The n ×n Hessian matrix contains the second derivatives of the objective function f with respect to the parameters x1, ... ,xn, as follows:

G(x) = \nabla^2 f(x) = (\frac{\partial^2 f}{\partial x_j \partial x_k}) .

For Least-Squares problems the m ×n Jacobian matrix contains the first-order derivatives of m objective functions fi(x) with respect to the parameters x1, ... ,xn, as follows:

J(x) = (\nabla f_1, ... ,\nabla f_m) =
 (\frac{\partial f_i}{\partial x_j}) .
In case of least-squares problems the cross-product Jacobian JTJ,
J^TJ = ({\sum_{i=1}^m \frac{\partial f_i}{\partial x_j}
 \frac{\partial f_i}{\partial x_k}})
is used as an approximate Hessian matrix. Using the m vector f = f(x) of function values f(x) = (f1(x), ... ,fm(x))T, the gradient g=g(x) is computed by
g(x) = JT(x) f(x) .

The mc ×n Jacobian matrix contains the first-order derivatives of mc nonlinear constraint functions ci(x), i = 1, ... ,mc, with respect to the parameters x1, ... ,xn, as follows:

CJ(x) = (\nabla c_1, ... ,\nabla c_{mc}) =
 (\frac{\partial c_i}{\partial x_j}) .

PROC NLP provides three ways to compute derivatives:

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.