Learning to Control Dynamic Physical Systems

• Conventional control theory requires a mathematical model to predict the behaviour of a process so that appropriate control decisions can be made.
• Many processes are too complicated to model accurately.
• Often, not enough information is available about the process' environment.
• When the system is too complicated or the environment is not well understood, an adaptive controller may work.
• An adaptive controller learns how to use the control actions available to meet the system's objective.
• The process is treated as a 'black box' and the program interacts with it by conditioned response.

The Pole and Cart

• m_c: mass of cart
• m_p: mass of pole
• l: distance of centre of mass of pole from the pivot
• g: acceleration due to gravity
• F: force applied to cart
• t: time interval of simulation
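The symbols above are enough to sketch one simulation step. The version below assumes the frictionless equations of motion commonly used for this benchmark (with the pole treated as a uniform rod) and simple Euler integration. The numeric values and the names `cart_pole_step` and `DT` (standing in for the time interval t) are illustrative assumptions, not taken from these notes.

```python
import math

# Illustrative constants (assumed values, not from the notes).
M_C = 1.0   # m_c: mass of cart (kg)
M_P = 0.1   # m_p: mass of pole (kg)
L = 0.5     # l: distance of pole's centre of mass from the pivot (m)
G = 9.8     # g: acceleration due to gravity (m/s^2)
DT = 0.02   # t: time interval of simulation (s)


def cart_pole_step(state, force, dt=DT):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step.

    theta is the pole angle from the vertical, in radians.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = M_C + M_P
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Shared term: applied force plus the pole's centrifugal contribution.
    temp = (force + M_P * L * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (G * sin_t - cos_t * temp) / (
        L * (4.0 / 3.0 - M_P * cos_t ** 2 / total_mass)
    )
    x_acc = temp - M_P * L * theta_acc * cos_t / total_mass

    # Euler update: positions use the old velocities.
    return (
        x + dt * x_dot,
        x_dot + dt * x_acc,
        theta + dt * theta_dot,
        theta_dot + dt * theta_acc,
    )
```

Calling `cart_pole_step((0.0, 0.0, 0.0, 0.0), 10.0)` pushes the cart to the right, so the cart velocity becomes positive and the pole starts to tip the other way.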

BOXES

The BOXES learning algorithm partitions the state space into regions according to how each dimension of the space is discretised. Each box represents a region of the problem space. In the pole and cart problem, there are four dimensions, one for each state variable (position and velocity of the cart, angle and angular velocity of the pole).
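A minimal sketch of the partitioning, assuming each dimension is cut at a few fixed thresholds. The cut points in `THRESHOLDS` and the name `box_index` are hypothetical; the notes do not specify how each dimension is discretised.

```python
from bisect import bisect_right

# Hypothetical cut points for each state variable (placeholders only).
THRESHOLDS = (
    (-0.8, 0.8),                     # cart position
    (-0.5, 0.5),                     # cart velocity
    (-0.1, -0.02, 0.0, 0.02, 0.1),   # pole angle
    (-0.9, 0.9),                     # pole angular velocity
)


def box_index(state):
    """Map a 4-D state to a single integer identifying its box."""
    index = 0
    for value, cuts in zip(state, THRESHOLDS):
        interval = bisect_right(cuts, value)         # interval on this axis
        index = index * (len(cuts) + 1) + interval   # mixed-radix encoding
    return index


# Total number of boxes: product of the interval counts per dimension.
NUM_BOXES = 1
for cuts in THRESHOLDS:
    NUM_BOXES *= len(cuts) + 1   # 3 * 3 * 6 * 3 with these placeholder cuts
```

Every state that falls between the same pair of thresholds on all four axes shares one index, so the learner only has to store one set of statistics per box.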

A box contains:

• an action setting (left or right)
• the weighted sum of lifetimes after a left decision (left life)
• the weighted number of times a left decision has been made (left usage)
• the weighted sum of lifetimes after a right decision (right life)
• the weighted number of times a right decision has been made (right usage)
These numbers decay after each trial. That is, before a new value is added to the old value, the old value is multiplied by a factor between 0 and 1 (usually around 0.99). Thus, old experiences have less effect on box settings.
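
The contents of a box and the decay step can be sketched as follows. The class name `Box`, the helper `decayed_add`, and the example lifetimes are assumptions made for illustration; only the five fields and the decay factor of about 0.99 come from the text.

```python
from dataclasses import dataclass

DECAY = 0.99  # factor between 0 and 1, as suggested in the text


@dataclass
class Box:
    action: str = "LEFT"       # current action setting (LEFT or RIGHT)
    left_life: float = 0.0     # weighted sum of lifetimes after LEFT
    left_usage: float = 0.0    # weighted number of LEFT decisions
    right_life: float = 0.0    # weighted sum of lifetimes after RIGHT
    right_usage: float = 0.0   # weighted number of RIGHT decisions


def decayed_add(old_value, new_value, decay=DECAY):
    """Shrink the old total by `decay`, then add the new experience."""
    return old_value * decay + new_value


box = Box()
for lifetime in (100.0, 100.0, 100.0):   # three made-up trials
    box.left_life = decayed_add(box.left_life, lifetime)
    box.left_usage = decayed_add(box.left_usage, 1.0)
```

Because both totals are decayed by the same factor, the most recent trial always counts with weight 1 while each earlier one is scaled down by a further 0.99 per trial.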

The BOXES Algorithm

```
boxes
    loop
        randomly set starting position
        put trial into t
        if t > 10,000 then exit
        for each box, b
            if number of entries into b != 0
                check_box(b)

trial
    put 0 into t
    loop
        find the current box
        if pole has fallen then return t
        if t > 10,000 then return t
        add one to number of entries into current box
        add t to time sum of current box
        make move according to setting of box
        add one to t

check_box(b)
    multiply left life by decay
    multiply left usage by decay
    multiply right life by decay
    multiply right usage by decay

    if action setting is LEFT
        add (no. of entries * t - time sum) to left life
        add no. of entries to left usage
    if action setting is RIGHT
        add (no. of entries * t - time sum) to right life
        add no. of entries to right usage

    put 0 into the no. of entries
    put 0 into time sum
```
```

if LeftValue > RightValue
    set action to LEFT
else if LeftValue < RightValue
    set action to RIGHT
else
    make random choice
```
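
The decision rule above can be sketched in Python. The notes do not define LeftValue and RightValue here, so this sketch assumes each is simply the mean weighted lifetime (life divided by usage, or 0 when the action has never been used). The function names `estimate` and `choose_action` are hypothetical.

```python
import random


def estimate(life, usage):
    """Assumed value of an action: weighted lifetime per weighted use."""
    return life / usage if usage > 0 else 0.0


def choose_action(left_life, left_usage, right_life, right_usage, rng=random):
    """Return "LEFT" or "RIGHT" following the comparison rule above."""
    left_value = estimate(left_life, left_usage)
    right_value = estimate(right_life, right_usage)
    if left_value > right_value:
        return "LEFT"
    if left_value < right_value:
        return "RIGHT"
    return rng.choice(("LEFT", "RIGHT"))   # tie: make a random choice
```

A box whose left decisions have been followed by longer lifetimes on average is set to LEFT; a box with no experience at all is a tie and gets a random setting.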
