Review of the course. Student driven! There is a [http://www.cse.unsw.edu.au/~cs3431/3431-sample-exam.pdf sample exam] available.
Or this:
Function approximation in MDPs
- Function approximation is usually covered in machine learning courses
- Take a machine learning course!
- Definition of a function
- A function has a domain and a range
- It maps each element in the domain to an element of the range
- e.g.
- Value function maps states onto reals
- Policy maps states onto actions
- Transition function maps (state, action) pairs onto probability distributions over next states
- Can also think of this as a map from (state, action, next-state) triples onto reals
- Implementations
- Table of values, one value for each element of the domain
- Really only works when the domain is discrete
- Can be HUGE
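A minimal Python sketch of the table representation, assuming the states are already numbered 0..n-1 (the sizes below are made up). The helper `table_size` is a hypothetical name; it shows where HUGE comes from: with several discrete state variables, the table needs one entry per combination, i.e. the product of their sizes.

```python
import numpy as np

# Value function stored as a table: one entry per state (5 states, made-up size).
n_states = 5
V = np.zeros(n_states)
V[3] = 1.7  # the value of state 3

def table_size(variable_sizes):
    """Entries needed for a table over a state made of several discrete variables."""
    size = 1
    for k in variable_sizes:
        size *= k
    return size

# Ten state variables with 100 possible values each (hypothetical sizes).
print(table_size([100] * 10))  # 10**20 entries
```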
- Divide the domain into a set of non-overlapping regions
- Store a single value for each region
- e.g. Ignore one domain variable -- just use a table over the remaining variables
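A sketch of dividing the domain into regions, assuming a box-shaped continuous state space cut into a uniform grid of equal cells (real systems often use non-uniform or adaptive regions). `GridValue` is a hypothetical name. Ignoring a state variable corresponds to giving it a single cell along that axis.

```python
import numpy as np

class GridValue:
    """Piecewise-constant function: the box [low, high] is split into equal,
    non-overlapping cells and a single value is stored per cell."""

    def __init__(self, low, high, bins):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = np.asarray(bins, dtype=int)
        self.values = np.zeros(tuple(self.bins))

    def cell(self, x):
        # Map a continuous point to the integer index of the cell containing it;
        # points on the upper boundary are clamped into the last cell.
        frac = (np.asarray(x, dtype=float) - self.low) / (self.high - self.low)
        idx = np.floor(frac * self.bins).astype(int)
        return tuple(np.clip(idx, 0, self.bins - 1))

    def __call__(self, x):
        return self.values[self.cell(x)]

# State = (position, velocity); one bin for velocity means velocity is ignored.
V = GridValue(low=[0.0, -1.0], high=[1.0, 1.0], bins=[10, 1])
V.values[V.cell([0.25, 0.0])] = 3.0
print(V([0.22, 0.9]))  # 3.0: same position cell, velocity makes no difference
```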
- Algebraic closed form / parametric form
- e.g. f(x) = m*x + b -- the entire function can be stored, assuming a fixed form, by recording m and b
- e.g. Neural Net with fixed topology -- the weights specify which of the functions with that topology the net represents
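With a fixed parametric form, learning the function just means fitting its parameters. A sketch assuming a least-squares fit to a handful of made-up samples: `np.polyfit` with `deg=1` returns the slope and intercept, and those two numbers are all that needs to be stored.

```python
import numpy as np

# Made-up samples of an "unknown" function (generated from y = 2x + 1 here).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# With the form f(x) = m*x + b fixed, the whole function is just (m, b).
m, b = np.polyfit(x, y, deg=1)

def f(x_new):
    return m * x_new + b

print(f"m = {m:.3f}, b = {b:.3f}")  # m = 2.000, b = 1.000
print(f(10.0))                      # approximately 21.0
```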
- Variable sized representations / "non-parametric form"
- Sum of N Gaussians
- Store N, and then the mean and covariance matrix for each Gaussian
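A sketch of evaluating a sum of N Gaussians. So that it can approximate functions of arbitrary height, this assumes each Gaussian also stores a weight alongside its mean and covariance matrix; the bumps are left unnormalized. Growing the representation just means appending another (weight, mean, covariance) triple, which is what makes its size variable.

```python
import numpy as np

def gaussian_sum(x, weights, means, covs):
    """f(x) = sum_i w_i * exp(-0.5 * (x - mu_i)^T Sigma_i^{-1} (x - mu_i)).
    N is len(weights); the per-Gaussian weights are an assumption."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for w, mu, cov in zip(weights, means, covs):
        d = x - np.asarray(mu, dtype=float)
        # solve(cov, d) computes Sigma^{-1} d without forming the inverse.
        total += w * np.exp(-0.5 * (d @ np.linalg.solve(cov, d)))
    return total

# N = 2 Gaussians over a 2-D state space (made-up parameters).
weights = [1.0, -0.5]
means = [[0.0, 0.0], [2.0, 0.0]]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(gaussian_sum([0.0, 0.0], weights, means, covs))
```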
- Tree
- Each leaf of the tree is constant -- piecewise constant
- Each leaf of the tree is linear -- piecewise linear
- How do you store the linear functions?
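One possible answer to "how do you store the linear functions?", as a sketch: each leaf holds a coefficient vector and an offset and computes coef · x + offset, and a zero coefficient vector gives a constant leaf. The `Node` layout and the axis-parallel splits are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Node:
    # Internal node: go left if x[dim] < threshold. Leaf: left and right are None.
    dim: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf contents: f(x) = coef . x + offset (coef all zeros -> constant leaf).
    coef: Optional[np.ndarray] = None
    offset: float = 0.0

def evaluate(node: Node, x: np.ndarray) -> float:
    while node.left is not None:  # descend until we reach a leaf
        node = node.left if x[node.dim] < node.threshold else node.right
    return float(node.coef @ x + node.offset)

# Split on x[0] at 0.5: constant leaf on the left, linear leaf on the right.
tree = Node(dim=0, threshold=0.5,
            left=Node(coef=np.zeros(2), offset=1.0),
            right=Node(coef=np.array([2.0, -1.0]), offset=0.0))
print(evaluate(tree, np.array([0.2, 3.0])))  # 1.0
print(evaluate(tree, np.array([1.0, 0.5])))  # 1.5 = 2*1.0 - 0.5
```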
- Triangulation
- Kuhn triangulation
Triangles all have a horizontal and a vertical edge, with the diagonal always running in the same general direction (though not necessarily at 45 degrees).
New triangle corners are added at the midpoints of existing horizontal or vertical edges.
Fast to calculate (it is just a tree structure), but can be sub-optimal: long, thin, far-from-equilateral triangles can form, with discontinuities along the edges where other triangles join up.
- Delaunay triangulation
The awesome one, where triangles can go anywhere but the triangulation tries to keep them as close to equilateral as possible (or, to be precise about it, it ensures that no vertex lies within the interior of the circumcircle of any triangle in the network; for a pretty picture: http://www.ems-i.com/gmshelp/Modules/TIN_Module/Creating_TINs/Triangulation.htm).
Implementations usually keep a second tree for finding the triangle that contains a given point, since the triangulation itself has no nice structure for searching.
- A hierarchical decomposition can be viewed in this form
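A sketch of interpolating on a regular grid where every square cell is cut into two triangles along the same diagonal, which is the simplest form of the Kuhn split described above (it does not show the adaptive refinement that adds corners at edge midpoints). Values are assumed to be stored at the grid vertices; `kuhn_interpolate` is a hypothetical name.

```python
import numpy as np

def kuhn_interpolate(values, x, y):
    """Piecewise-linear interpolation of vertex values on a unit-spaced grid.
    Each square cell is split into two triangles by the diagonal from its
    lower-left corner (i, j) to its upper-right corner (i + 1, j + 1).
    values[i, j] is the value stored at grid vertex (i, j)."""
    nx, ny = values.shape
    # Cell containing (x, y), clamped so points on the far edges still work.
    i = min(max(int(np.floor(x)), 0), nx - 2)
    j = min(max(int(np.floor(y)), 0), ny - 2)
    fx, fy = x - i, y - j  # local coordinates inside the cell
    v00, v10 = values[i, j], values[i + 1, j]
    v01, v11 = values[i, j + 1], values[i + 1, j + 1]
    if fx >= fy:
        # Triangle (0,0), (1,0), (1,1): barycentric weights 1-fx, fx-fy, fy.
        return (1 - fx) * v00 + (fx - fy) * v10 + fy * v11
    # Triangle (0,0), (0,1), (1,1): barycentric weights 1-fy, fy-fx, fx.
    return (1 - fy) * v00 + (fy - fx) * v01 + fx * v11

# Vertex values sampled from the linear function g(i, j) = i + 2*j on a 3x3 grid.
values = np.add.outer(np.arange(3), 2 * np.arange(3)).astype(float)
print(kuhn_interpolate(values, 0.3, 1.6))  # approximately 3.5 = 0.3 + 2*1.6
```

The three barycentric weights are nonnegative and sum to one, so the output is always a weighted average of stored values, and a linear function is reproduced exactly. That averaging property is why linear interpolation appears among the contraction approximators further down.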
- Choosing an implementation
- Differentiable?
- Most fixed sized representations are
- Most size-changing operations are not
- Bias
- Some representations can represent any valid function
- e.g. A table of values can represent any function with a finite discrete domain
- Most representations cannot represent some functions
- A piecewise constant function (e.g. tree with constants in the leaves) cannot represent a smoothly varying function
- Some representations make it harder to represent some functions than others
- e.g. A tree with axis-parallel splits finds it easier to represent functions whose regions are separated by axis-parallel boundaries
- Bias is important
- It tells you what to do when you don't have enough information
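To see the bias of a piecewise constant representation, here is a sketch that fits k equal-width regions to the smooth function f(x) = x on [0, 1), storing the mean of f in each region (an assumption; the mean minimizes squared error within a region), and reports the worst-case error. The error shrinks as k grows but never reaches zero.

```python
import numpy as np

def piecewise_constant_fit(f, k, n_samples=10_000):
    """Approximate f on [0, 1) with k equal-width regions, one stored value
    (the mean of f) per region; return the largest absolute error seen."""
    xs = (np.arange(n_samples) + 0.5) / n_samples  # sample points in [0, 1)
    ys = f(xs)
    region = np.minimum((xs * k).astype(int), k - 1)
    means = np.array([ys[region == r].mean() for r in range(k)])
    return np.max(np.abs(ys - means[region]))

for k in (1, 10, 100):
    print(k, piecewise_constant_fit(lambda x: x, k))
```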
- Approximation in reinforcement learning
- Use a parameterized representation of the policy
- Use policy gradient methods (gradient ascent on the expected return)
- e.g. learning to walk
- Guaranteed to converge to a local optimum (with suitably decreasing step sizes)
- Can get stuck in poor local optima
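A sketch of policy gradient on the simplest possible case, a one-state MDP (a hypothetical three-armed bandit with made-up reward means). It swaps in REINFORCE with a running-average baseline as the concrete gradient estimator and a softmax over per-action preferences as the parameterized policy; a walking controller could use the same kind of update with a richer policy and multi-step returns.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical expected reward per action

theta = np.zeros(3)  # policy parameters: one preference per action
alpha = 0.1          # policy step size
baseline = 0.0       # running average of reward, used to reduce variance

def policy(theta):
    # Softmax (Boltzmann) distribution over the action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

for _ in range(2000):
    pi = policy(theta)
    a = rng.choice(3, p=pi)
    r = rng.normal(true_means[a], 0.1)
    # For a softmax policy, d log pi(a) / d theta = onehot(a) - pi.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    # Step *up* the gradient of expected reward (REINFORCE with a baseline).
    theta += alpha * (r - baseline) * grad_log_pi
    baseline += 0.05 * (r - baseline)

print(policy(theta))  # most of the probability should now be on the last action
```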
- Use a function approximator for the Q function, and use Q-learning
- Often does not converge!
- Extrapolation: an update at one state can change the values of other states by even more, so the changes grow in size
- This offsets the damping effect of the discount factor
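A sketch of the divergence on a standard two-state counterexample (the setup is chosen for illustration): a single weight w gives V(s1) = w and V(s2) = 2w, all rewards are 0 (so the true values are 0), and the only update made is semi-gradient TD(0) on the transition s1 → s2. Raising w to fix V(s1) raises V(s2) twice as much, so each update multiplies w by 1 + alpha(2 gamma - 1): the extrapolation beats the discount whenever gamma > 0.5.

```python
def td_two_state(gamma, alpha=0.1, steps=100, w0=1.0):
    """Semi-gradient TD(0) with V(s1) = w, V(s2) = 2w, updating only on the
    transition s1 -> s2 with reward 0."""
    w = w0
    for _ in range(steps):
        td_error = 0.0 + gamma * (2 * w) - w  # r + gamma*V(s2) - V(s1)
        w += alpha * td_error * 1.0           # dV(s1)/dw = 1; the target is held fixed
    return w

print(td_two_state(gamma=0.9))  # grows without bound as steps increase
print(td_two_state(gamma=0.4))  # shrinks towards the true value 0
```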
- Solutions
- Only use contraction function approximators
- linear interpolation
- state abstraction
- Gradient descent on the Bellman residual
- Solutions are guaranteed to converge, but it is not always clear what they converge to
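A sketch of gradient descent on the Bellman residual for a two-state example (chosen for illustration): V(s1) = w, V(s2) = 2w, a single transition s1 → s2 with reward 0, on which plain semi-gradient TD diverges for gamma > 0.5. Differentiating through the target as well gives the update w ← w − alpha · residual · (2 gamma − 1), which shrinks the squared residual every step.

```python
def residual_gradient_two_state(gamma, alpha=0.1, steps=100, w0=1.0):
    """Gradient descent on 0.5 * residual**2 with V(s1) = w, V(s2) = 2w,
    a single transition s1 -> s2, reward 0."""
    w = w0
    for _ in range(steps):
        residual = 0.0 + gamma * (2 * w) - w  # Bellman residual at s1
        d_residual_dw = 2 * gamma - 1         # the target's dependence on w is included
        w -= alpha * residual * d_residual_dw
    return w

print(residual_gradient_two_state(gamma=0.9))           # close to the true value 0
print(residual_gradient_two_state(gamma=0.5, w0=3.0))   # 3.0: stays where it started
```

With gamma = 0.5 every w has zero residual, so w never moves from its starting point: a small instance of "guaranteed to converge, but not always clear what to".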
- Combination policy gradient/value function techniques
- Value function + Boltzmann exploration can be viewed as a parameterized policy
- Can use both direct gradient descent, AND value approaches to optimize
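A sketch of reading a Boltzmann policy as a parameterized policy, treating one state's Q-values as the parameters (the `temperature` argument and both function names are assumptions). Q-learning would move `q` towards its targets, while a policy gradient step would follow `grad_log_boltzmann` scaled by the return; both act on the same numbers. With a function approximator Q(s, a; theta), the chain rule carries this gradient on to theta.

```python
import numpy as np

def boltzmann(q, temperature=1.0):
    """Boltzmann (softmax) policy over the Q-values of one state."""
    z = (q - np.max(q)) / temperature  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def grad_log_boltzmann(q, a, temperature=1.0):
    """d log pi(a) / d q[b] = (1[a == b] - pi(b)) / temperature."""
    g = -boltzmann(q, temperature)
    g[a] += 1.0
    return g / temperature

q = np.array([1.0, 2.0, 0.5])  # hypothetical Q-values for three actions in one state
print(boltzmann(q, temperature=0.5))
print(grad_log_boltzmann(q, a=1, temperature=0.5))
```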
- Variable resolution techniques
- Use one of the above techniques on a fixed size approximation
- Detect when the approximation is going bad and increase the resolution in that area
- When to increase resolution? - see Munos and Moore 2001
- Really want a good policy - value function is just there to choose the policy
- Increase resolution in those places where we want more resolution in the policy
- Split a leaf of the tree when the policy is different on different sides of it
- We also want to make sure values are accurate in regions where it affects policy
- Build an 'influence' function that shows which states' values affect the policy at a given state
- Build a 'variance' function estimating the error in a given state (see a machine learning course)
- Split when the product of influence and variance is high
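A sketch of the split decision, assuming each leaf already carries estimates of its influence, its variance, and whether its greedy action changes across it (estimating those is the hard part and is not shown). Combining the two criteria with an "or" and a fixed threshold is an assumption for illustration, not necessarily the rule used by Munos and Moore; `Leaf` and `leaves_to_split` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Leaf:
    # Hypothetical per-leaf statistics, assumed to be estimated elsewhere.
    name: str
    influence: float      # how much this leaf's value affects the policy
    variance: float       # estimated error in this leaf's value
    policy_differs: bool  # greedy action differs between sides of the leaf

def leaves_to_split(leaves: List[Leaf], threshold: float) -> List[str]:
    """Split where the policy changes inside a leaf, or where influence * variance is high."""
    return [leaf.name for leaf in leaves
            if leaf.policy_differs or leaf.influence * leaf.variance > threshold]

leaves = [
    Leaf("A", influence=0.9, variance=0.5, policy_differs=False),  # product 0.45
    Leaf("B", influence=0.1, variance=0.8, policy_differs=False),  # product 0.08
    Leaf("C", influence=0.0, variance=0.0, policy_differs=True),
]
print(leaves_to_split(leaves, threshold=0.2))  # ['A', 'C']
```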
- Action spaces
- Consider the Q function: state x action -> value
- Need to find the highest-valued action for a given state
- Can be expensive with the wrong representation
- The bias of the representation will limit where the max can fall
- Variable resolution techniques in action space may help here
- Ongoing research - most people just discretize the action space
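A sketch of the common discretization workaround for a one-dimensional continuous action, assuming actions lie in [-1, 1] and a hypothetical `q_fn(state, action)`. The maximum can only land on a grid point, which is the representation bias mentioned above; and if every action dimension is gridded this way, the number of Q evaluations grows exponentially with the number of action dimensions.

```python
import numpy as np

def greedy_action(q_fn, state, low=-1.0, high=1.0, n_actions=21):
    """Approximate argmax_a Q(state, a) over a continuous 1-D action range
    by evaluating Q on an evenly spaced grid of candidate actions."""
    actions = np.linspace(low, high, n_actions)
    values = np.array([q_fn(state, a) for a in actions])
    best = int(np.argmax(values))
    return actions[best], values[best]

# Hypothetical Q function whose true best action is a = 0.33 in every state.
def q_fn(state, a):
    return -(a - 0.33) ** 2

a_star, q_star = greedy_action(q_fn, state=None)
print(a_star)  # approximately 0.3: the nearest grid point, not the true maximizer
```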