Follow Up
In our project we placed emphasis on the modularization and reuse of our code. By maintaining this philosophy we created an RL package that can be used easily by anyone - even with very little knowledge of RL. The following diagram shows how the different parts interact.
The upper part corresponds to the training of our agent, showing its interactions with the world and the policy. The lower part corresponds to the user - a program or an applet - that uses the "knowledge" of the trained policy to perform its task. Both the RLearner and the user interact with their own world, which holds the current state and defines the rules: it provides feedback about the next state, the validity of a given action and the reward for taking it. The policy ties everything together. It is updated by the RLearner based on the experience gathered in the world and later consulted by the user to choose an optimal action. Depending on the application, training and consulting the policy can also happen simultaneously; usually, however, the policy is trained to some extent before it is used.
Building the World
To use our classes we first build our own world, which holds all the information about its state and the rules that apply to it, i.e. which state follows after applying a given action. This is straightforward and easy to do, as the two examples taken from [1] show. The code for these two worlds is linked below.
CliffWorld.java - the cliff world used on the Algorithms page.
GridWorld.java - the grid world, similar to the cliff world.
The world might hold its entire state internally but expose only part of that state information to the RLearner, in order to simulate the limitations of the agent's sensors.
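To make this concrete, here is a minimal sketch of what such a small world class could look like. The interface shown (SimpleWorld) and all of its method names are assumptions made for illustration only; the actual RLWorld interface shipped with our package may declare different methods, so check the source code before copying this.

// Hypothetical world interface - for illustration only; the actual
// RLWorld interface in our package may look different.
interface SimpleWorld {
    int[] getState();                // state information exposed to the learner
    int[] getNextState(int action);  // apply an action and return the resulting state
    double getReward(int action);    // reward observed after taking an action
    boolean validAction(int action); // is the action allowed in the current state?
    boolean endState();              // has a terminal state been reached?
    void resetState();               // start a new episode
}

// A tiny one-dimensional corridor world: the agent starts in cell 0 and
// must reach cell 4. Actions: 0 = left, 1 = right. Each step costs -1,
// reaching the goal yields 0.
public class CorridorWorld implements SimpleWorld {
    private int pos = 0;
    private static final int GOAL = 4;

    public int[] getState()           { return new int[] { pos }; }
    public boolean validAction(int a) { return (a == 0 && pos > 0) || (a == 1 && pos < GOAL); }
    public boolean endState()         { return pos == GOAL; }
    public void resetState()          { pos = 0; }

    public int[] getNextState(int action) {
        if (validAction(action)) {
            pos += (action == 1) ? 1 : -1;
        }
        return getState();
    }

    public double getReward(int action) {
        return (pos == GOAL) ? 0.0 : -1.0;   // -1 per step, 0 once the goal is reached
    }
}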
Learning a Policy
After designing a world, we are ready to use the RLearner:
GridWorld trainWorld = new GridWorld();   // the world the agent trains in
RLearner rl = new RLearner( trainWorld ); // the learner operates on that world
rl.setEpisodes( numEpisodes );            // numEpisodes: desired number of training episodes
rl.runTrial();
rl.runEpoch();
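To give an idea of what the learner does with the world during training, the loop below sketches one episode of textbook Q-learning [1]. It reuses the hypothetical CorridorWorld from the sketch above and is not the package's actual implementation of runTrial() or runEpoch().

// Illustrative Q-learning episode over the hypothetical CorridorWorld above.
// This is NOT the RLearner's actual code, only the standard update it builds on.
double alpha = 0.1, gamma = 0.9, epsilon = 0.1;  // learning rate, discount, exploration rate
double[][] q = new double[5][2];                 // Q(state, action): 5 cells, 2 actions
java.util.Random rng = new java.util.Random();

CorridorWorld world = new CorridorWorld();
world.resetState();
while (!world.endState()) {
    int s = world.getState()[0];
    // epsilon-greedy action selection
    int a = (rng.nextDouble() < epsilon || q[s][0] == q[s][1])
            ? rng.nextInt(2)
            : (q[s][1] > q[s][0] ? 1 : 0);
    if (!world.validAction(a)) continue;         // ignore invalid moves and pick again
    int sNext   = world.getNextState(a)[0];
    double r    = world.getReward(a);
    double best = Math.max(q[sNext][0], q[sNext][1]);
    q[s][a] += alpha * (r + gamma * best - q[s][a]);  // Q-learning update
}

Repeating this over many episodes, which is what setEpisodes() suggests, gradually improves the learned values and with them the policy.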
Using the Policy
The user program will need to follow these steps to use the policy:
GridWorld playWorld = new GridWorld();    // the user's own instance of the world
RLPolicy myPolicy = rl.getPolicy();       // retrieve the policy trained by the RLearner
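A typical play loop then looks roughly like the sketch below. The call getBestAction() on the policy is an assumed name used purely for illustration, and the world is assumed to offer methods like those in the hypothetical interface sketched earlier; the actual calls can be found on the source code page.

// Sketch of consulting a trained policy (method names are assumptions).
playWorld.resetState();
while (!playWorld.endState()) {
    int[] state  = playWorld.getState();           // observe the current state
    int   action = myPolicy.getBestAction(state);  // ask the policy for the greedy action
    playWorld.getNextState(action);                // apply it to the user's own world
}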
The diagram below illustrates the interactions between a user program (in this case an applet), a trained policy and an example world:
We encourage our visitors to build their own worlds and use the RLearner to find an optimal policy for them. Small games with a limited state space should be nearly as easy to build as the grid worlds from Sutton and Barto. A world only has to implement the RLWorld interface to be usable with the RLearner in a plug-and-play way. If you try it out, let us know and submit your code to be published here.
Next...
Download or take a look at the source code for the applet.