Non-gradient methods:

  • Powell's Direction Set Method

    • Pick a start point and N directions
    • Minimise in each direction in turn
      • Example - horizontal valley
      • Example - 45 degree valley
    • Conjugate directions: (p 414)

      • Imagine we have all gradient information we might want.
      • Do first line minimization - dir u
      • Gradient is now T to line
      • Add constraint - gradient must stay T to u
      • Can be approximated using the Hessian matrix - the 2nd derivative matrix
        • Will be perfect if we're in a parabola
      • Gives a set of lines to minimise along - choose the one with highest gradient
    • Basic algorithm

      • Save starting position as P0
      • For i = 1...N
      • Move P(i-1) to along direction Ui, and make new min Pi For i = 1...(N-1)
        • Ui <- Ui+1
        • Un <- Pn - P0
        • Minimise along Pn
        • Repeat (result of prev min = new P0)
      • k iterations make last k members mutually conjugate (for a parabolic function)
      • Problem
        • Can lose track of a dimension
        • Become linearly dependant
        • Reset the direction set ever N iterations
        • Reset to original directions
        • Just make sure we're not linearly dependant
        • Modify which direction we replace to make a heuristically good function
        • Remove the direction of largest decrease.
  • Use of Gradient:

    • Simple gradient descent
    • Gradient descent with momentum
    • Use gradient for line search
    • Conguate gradient methods
      • A set of vectors are conjugate if minimizing along any one of them keeps gradient T to others
      • Quasi-Newton or Variable Metric methods (BFGS) Build up an estimate of the Hessian Use that to find dir to min of quadratic function Search in that direction Note: because we don't have the hessian directly It takes N minimizations to build it up from the gradient This is not very different to the conjugate gradient methods, except it requires storage for the whole hessian matrix, not just the last conjugate direction

Estimating the Gradient Take a step in each direction - method of differences

    def approx_fprime(xk,f,*args):
        f0 = apply(f,(xk,)+args)
        grad = Num.zeros((len(xk),),'d')
        ei = Num.zeros((len(xk),),'d')
        for k in range(len(xk)):
            ei[k] = 1.0
            grad[k] = (apply(f,(xk+epsilon*ei,)+args) - f0)/epsilon
            ei[k] = 0.0
        return grad

Can do the same thing with randomized vectors Don't get stuck becuase some partials are not good