Learning to Control Dynamic Physical Systems

The Pole and Cart

m_c	mass of cart
m_p	mass of pole
l	distance of centre of mass of pole from the pivot
g	acceleration due to gravity
F	force applied to cart
t	time interval of simulation
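
As a rough sketch of how these parameters enter a simulation, the Python function below advances the state by one Euler step using the frictionless pole and cart equations commonly found in the literature. Whether these are exactly the equations intended here is an assumption. The function name `step`, the default parameter values and the name `dt` (standing in for the time interval t) are illustrative only.

```python
import math

def step(state, force, m_c=1.0, m_p=0.1, l=0.5, g=9.8, dt=0.02):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step of length dt.

    Assumes the common frictionless pole-and-cart model; all defaults
    are illustrative values, not taken from the notes.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = m_c + m_p
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Shared term: applied force plus the centrifugal effect of the pole.
    temp = (force + m_p * l * theta_dot ** 2 * sin_t) / total_mass

    # Angular acceleration of the pole.
    theta_acc = (g * sin_t - cos_t * temp) / (
        l * (4.0 / 3.0 - m_p * cos_t ** 2 / total_mass))

    # Linear acceleration of the cart.
    x_acc = temp - m_p * l * theta_acc * cos_t / total_mass

    # Euler integration: positions use the old velocities.
    return (x + dt * x_dot,
            x_dot + dt * x_acc,
            theta + dt * theta_dot,
            theta_dot + dt * theta_acc)
```

With the pole upright and at rest, a push to the right accelerates the cart to the right and tips the pole to the left, which is the behaviour a controller has to counteract.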

BOXES

The BOXES learning algorithm partitions the state space into regions according to how each dimension of the space is discretised. Each box represents a region of the problem space. In the pole and cart problem, there are four dimensions, one for each state variable (position and velocity of the cart, angle and angular velocity of the pole).
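
The mapping from a state to its box can be sketched in Python as follows. The bin boundaries below are hypothetical, chosen only to illustrate the idea; the notes do not say where each dimension is cut. The names `EDGES`, `NUM_BOXES` and `box_index` are likewise illustrative.

```python
from bisect import bisect_right
from math import prod

# Hypothetical bin boundaries for each state variable (assumed values).
EDGES = (
    (-0.8, 0.8),                          # cart position
    (-0.5, 0.5),                          # cart velocity
    (-0.105, -0.017, 0.0, 0.017, 0.105),  # pole angle (radians)
    (-0.87, 0.87),                        # pole angular velocity
)

# n boundaries split a dimension into n + 1 intervals.
NUM_BOXES = prod(len(edges) + 1 for edges in EDGES)

def box_index(state):
    """Return the index of the box containing state = (x, x_dot, theta, theta_dot)."""
    index = 0
    for value, edges in zip(state, EDGES):
        # bisect_right gives the interval number of value in this dimension.
        index = index * (len(edges) + 1) + bisect_right(edges, value)
    return index
```

With these boundaries the four dimensions are split into 3, 3, 6 and 3 intervals, so there are 162 boxes, and every state maps to exactly one of them.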

		


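Each box holds the current action setting (LEFT or RIGHT), the number of entries into the box and the sum of the entry times for the current trial, and a life value and a usage value for each of the two actions.

The life and usage values decay after each trial. That is, before a new value is added to the old value, the old value is multiplied by a factor between 0 and 1 (usually around 0.99). Thus, older experiences have progressively less effect on box settings.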
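
A small Python sketch of the decayed update may make the effect concrete. The function name `decayed_add` is illustrative, and 0.99 is simply the typical factor mentioned above.

```python
def decayed_add(old, new, decay=0.99):
    """Shrink the accumulated value by decay, then add the new contribution."""
    return old * decay + new

def weight_after(trials, decay=0.99):
    """Fraction of a contribution that remains after the given number of trials."""
    return decay ** trials
```

With a decay of 0.99, a contribution made 100 trials ago still carries roughly 37% of its original weight, so forgetting is gradual rather than abrupt.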

The BOXES Algorithm

boxes
	loop
		randomly set starting position
		put trial into t
		if t > 10,000 then exit
		for each box, b
			if number of entries into b != 0
				check_box(b)

trial
	put 0 into t
	loop
		find the current box
		if pole has fallen then return t
		add one to t
		if t > 10,000 then return t
		add one to number of entries into current box
		add t to time sum of current box
		make move according to setting of box
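
One possible Python rendering of the trial procedure is sketched below. It reads the steps as repeating until the pole falls or the time limit is reached. The simulator and the state-to-box mapping are passed in as the hypothetical callables `step`, `find_box` and `has_fallen`, and the encoding of actions as -1 and +1 and the force magnitude are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Box:
    action: int = 1      # assumed encoding: +1 pushes right, -1 pushes left
    entries: int = 0     # number of entries into this box during the trial
    time_sum: int = 0    # sum of the times at which the box was entered

def trial(boxes, state, step, find_box, has_fallen,
          max_steps=10_000, force=10.0):
    """Run one trial and return the number of time steps survived."""
    t = 0
    while not has_fallen(state):
        t += 1
        if t > max_steps:
            return t
        box = boxes[find_box(state)]
        box.entries += 1
        box.time_sum += t
        # Push in the direction given by the box's current setting.
        state = step(state, box.action * force)
    return t
```

Passing the simulator in as arguments keeps the bookkeeping separate from the physics, so the same loop can be exercised with a toy system before the real dynamics are attached.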

check_box(b)
	multiply left life by decay
	multiply left usage by decay
	multiply right life by decay
	multiply right usage by decay

	if action setting is LEFT
		add (no. of entries * t) - time sum to left life
		add no. of entries to left usage
	if action setting is RIGHT
		add (no. of entries * t) - time sum to right life
		add no. of entries to right usage

	put 0 into no. of entries
	put 0 into time sum
	
	

	if LeftValue > RightValue
		set action to LEFT
	else if LeftValue < RightValue
		set action to RIGHT
	else
		make random choice
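
The update of a single box can be sketched in Python under some stated assumptions. The definitions of LeftValue and RightValue are not given here, so the sketch assumes each is simply life divided by usage, with an untried action valued at 0; the original algorithm may define them differently. It also reads the life increment as the time remaining in the trial summed over every entry into the box, that is, entries * t - time sum, where t is the length of the trial. The names `Box`, `value` and `check_box` and the -1/+1 action encoding are illustrative.

```python
import random
from dataclasses import dataclass

LEFT, RIGHT = -1, 1   # assumed action encoding

@dataclass
class Box:
    action: int = RIGHT
    entries: int = 0
    time_sum: int = 0
    left_life: float = 0.0
    left_usage: float = 0.0
    right_life: float = 0.0
    right_usage: float = 0.0

def value(life, usage):
    """Assumed value estimate: average life per use (0 if never used)."""
    return life / usage if usage > 0 else 0.0

def check_box(box, t, decay=0.99, rng=random):
    """Update one box after a trial that lasted t time steps."""
    # Decay all four accumulated statistics.
    box.left_life *= decay
    box.left_usage *= decay
    box.right_life *= decay
    box.right_usage *= decay

    # Time remaining after each entry, summed over all entries (assumed reading).
    lifetime = box.entries * t - box.time_sum
    if box.action == LEFT:
        box.left_life += lifetime
        box.left_usage += box.entries
    else:
        box.right_life += lifetime
        box.right_usage += box.entries

    # Reset the per-trial counters.
    box.entries = 0
    box.time_sum = 0

    # Choose the action with the higher estimated value; break ties randomly.
    left_value = value(box.left_life, box.left_usage)
    right_value = value(box.right_life, box.right_usage)
    if left_value > right_value:
        box.action = LEFT
    elif left_value < right_value:
        box.action = RIGHT
    else:
        box.action = rng.choice((LEFT, RIGHT))
```

For example, a box set to LEFT that was entered at times 1 and 2 of a trial lasting 10 steps gains 2 * 10 - 3 = 17 in left life and 2 in left usage.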
