- Conventional control theory requires a mathematical model to predict the
behaviour of a process so that appropriate control decisions can be made.
- Many processes are too complicated to model accurately.
- Often, not enough information is available about the process's environment.
- When the system is too complicated or the environment is not well
understood, an adaptive controller may work.
- An adaptive controller learns how to use the control actions available to
meet the system's objective.
- The process is treated as a 'black box' and the program interacts with it
by conditioned response.
Parameters of the pole and cart system:
- m_c: mass of cart
- m_p: mass of pole
- l: distance of centre of mass of pole from the pivot
- g: acceleration due to gravity
- F: force applied to cart
- t: time interval of simulation
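
To show how these parameters enter a simulation, here is a minimal sketch of one simulation step. The equations of motion do not appear in the text, so the sketch assumes the commonly used frictionless pole and cart equations with simple Euler integration. The numeric values of `M_C`, `M_P`, `L`, `G` and `DT`, and the names `CartPole` and `step`, are illustrative assumptions.

```python
import math
from dataclasses import dataclass

# Assumed parameter values; the text names the symbols but gives no numbers.
M_C = 1.0   # m_c: mass of cart (kg)
M_P = 0.1   # m_p: mass of pole (kg)
L = 0.5     # l: distance of the pole's centre of mass from the pivot (m)
G = 9.8     # g: acceleration due to gravity (m/s^2)
DT = 0.02   # t: time interval of simulation (s)


@dataclass
class CartPole:
    x: float = 0.0          # cart position (m)
    x_dot: float = 0.0      # cart velocity (m/s)
    theta: float = 0.0      # pole angle from vertical (rad)
    theta_dot: float = 0.0  # pole angular velocity (rad/s)


def step(s: CartPole, force: float) -> CartPole:
    """Advance the state by one Euler step of length DT under force F.

    Assumes the standard frictionless pole and cart equations of motion.
    """
    total_mass = M_C + M_P
    sin_t, cos_t = math.sin(s.theta), math.cos(s.theta)
    temp = (force + M_P * L * s.theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (G * sin_t - cos_t * temp) / (
        L * (4.0 / 3.0 - M_P * cos_t ** 2 / total_mass)
    )
    x_acc = temp - M_P * L * theta_acc * cos_t / total_mass
    return CartPole(
        x=s.x + DT * s.x_dot,
        x_dot=s.x_dot + DT * x_acc,
        theta=s.theta + DT * s.theta_dot,
        theta_dot=s.theta_dot + DT * theta_acc,
    )
```

Pushing the cart to the right from rest (`step(CartPole(), 10.0)`) gives the cart a positive velocity and tips the pole backwards, which is the behaviour the learner has to counteract.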
The BOXES learning algorithm partitions the state space into regions according
to how each dimension of the space is discretised. Each box represents a region
of the problem space. In the pole and cart problem, there are four dimensions,
one for each state variable (position and velocity of the cart, angle and
angular velocity of the pole).
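
The partitioning can be sketched as follows. The text does not give the cut points for each dimension, so the thresholds below are hypothetical, as are the names `ALL_CUTS`, `NUM_BOXES` and `box_index`. Each state variable is mapped to an interval number, and the four interval numbers are combined into a single box index.

```python
from bisect import bisect_right

# Hypothetical cut points for each dimension (not taken from the text).
X_CUTS = [-0.8, 0.8]                        # cart position: 3 intervals
X_DOT_CUTS = [-0.5, 0.5]                    # cart velocity: 3 intervals
THETA_CUTS = [-0.1, -0.02, 0.0, 0.02, 0.1]  # pole angle: 6 intervals
THETA_DOT_CUTS = [-0.87, 0.87]              # angular velocity: 3 intervals

ALL_CUTS = [X_CUTS, X_DOT_CUTS, THETA_CUTS, THETA_DOT_CUTS]

# Total number of boxes is the product of the interval counts.
NUM_BOXES = 1
for cuts in ALL_CUTS:
    NUM_BOXES *= len(cuts) + 1


def box_index(x, x_dot, theta, theta_dot):
    """Map a four-dimensional state to the index of the box containing it."""
    index = 0
    for value, cuts in zip((x, x_dot, theta, theta_dot), ALL_CUTS):
        interval = bisect_right(cuts, value)  # which interval the value falls in
        index = index * (len(cuts) + 1) + interval
    return index
```

A finer discretisation gives more boxes and therefore a more precise controller, but each box is visited less often, so learning takes longer.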
A box contains:
- an action setting (left or right)
- the weighted sum of lifetimes after a left decision (left life)
- the weighted number of times a left decision has been made (left usage)
- the weighted sum of lifetimes after a right decision (right life)
- the weighted number of times a right decision has been made (right
usage)
These numbers decay after each trial. That is, before a new value is
added to the old value, the old value is multiplied by a factor between 0 and 1
(usually around 0.99). Thus, old experiences have less effect on box settings.
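
The effect of the decay can be illustrated with a small sketch. The function names `decayed_add` and `weight_after` are illustrative and not part of the original program.

```python
DECAY = 0.99  # factor between 0 and 1, as suggested in the text


def decayed_add(old_total: float, new_value: float, decay: float = DECAY) -> float:
    """Shrink the old total by the decay factor, then add the new value."""
    return old_total * decay + new_value


def weight_after(trials: int, decay: float = DECAY) -> float:
    """Weight still carried by an experience recorded `trials` trials ago."""
    return decay ** trials
```

With a decay of 0.99, an experience from 100 trials ago still counts for roughly a third of its original weight, so the box settings change gradually and do not forget abruptly.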
boxes
    loop
        randomly set starting position
        put trial into t
        if t > 10,000 then exit
        for each box, b
            if number of entries into b != 0
                check_box(b)

trial
    put 0 into t
    loop
        find the current box
        if pole has fallen then return t
        add one to t
        if t > 10,000 then return t
        add one to number of entries into current box
        add t to time sum of current box
        make move according to setting of box

check_box(b)
    multiply left life by decay
    multiply left usage by decay
    multiply right life by decay
    multiply right usage by decay
    if action setting is LEFT
        add (no. of entries * t) - time sum to left life
        add no. of entries to left usage
    if action setting is RIGHT
        add (no. of entries * t) - time sum to right life
        add no. of entries to right usage
    put 0 into the no. of entries
    put 0 into time sum
    if LeftValue > RightValue
        set action to LEFT
    else if LeftValue < RightValue
        set action to RIGHT
    else
        make random choice
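
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original program. The text does not define LeftValue and RightValue, so `value` assumes they are the decayed life divided by the decayed usage. The lifetime added in `check_box` is taken to be the number of entries times the trial length minus the time sum. The callables `random_start`, `find_box`, `has_fallen` and `make_move`, and the `max_trials` guard, are hypothetical and would be supplied by a simulator.

```python
import random
from dataclasses import dataclass

DECAY = 0.99        # decay factor suggested in the text
MAX_STEPS = 10_000  # a trial longer than this counts as success


@dataclass
class Box:
    action: str = "LEFT"
    left_life: float = 0.0
    left_usage: float = 0.0
    right_life: float = 0.0
    right_usage: float = 0.0
    entries: int = 0   # number of entries during the current trial
    time_sum: int = 0  # sum of the time steps at which the box was entered


def value(life, usage):
    # Assumption: mean decayed lifetime; an action never used scores 0.
    return life / usage if usage > 0 else 0.0


def check_box(box, t, rng):
    """Update one box after a trial that lasted t steps."""
    box.left_life *= DECAY
    box.left_usage *= DECAY
    box.right_life *= DECAY
    box.right_usage *= DECAY
    lifetimes = box.entries * t - box.time_sum  # sum of (t - entry time)
    if box.action == "LEFT":
        box.left_life += lifetimes
        box.left_usage += box.entries
    else:
        box.right_life += lifetimes
        box.right_usage += box.entries
    box.entries = 0
    box.time_sum = 0
    left = value(box.left_life, box.left_usage)
    right = value(box.right_life, box.right_usage)
    if left > right:
        box.action = "LEFT"
    elif left < right:
        box.action = "RIGHT"
    else:
        box.action = rng.choice(["LEFT", "RIGHT"])


def trial(boxes, state, find_box, has_fallen, make_move):
    """Run one trial from `state`; return the number of steps survived."""
    t = 0
    while True:
        if has_fallen(state):
            return t
        t += 1
        if t > MAX_STEPS:
            return t
        box = boxes[find_box(state)]
        box.entries += 1
        box.time_sum += t
        state = make_move(state, box.action)


def learn(boxes, random_start, find_box, has_fallen, make_move, rng, max_trials=1000):
    """Repeat trials until one succeeds; max_trials is an added safety guard."""
    for _ in range(max_trials):
        t = trial(boxes, random_start(rng), find_box, has_fallen, make_move)
        if t > MAX_STEPS:
            return True
        for box in boxes:
            if box.entries != 0:
                check_box(box, t, rng)
    return False
```

Passing the simulator in as callables keeps the learner independent of the process being controlled, in keeping with the 'black box' treatment described earlier.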