Reinforcement Learning


  Follow Up

In our project we placed emphasis on modularizing and reusing our code. By maintaining this philosophy we created an RL package that anyone can use easily, even with very little knowledge of RL. The following diagram shows how the different parts interact.

The upper part corresponds to the training of our agent, showing its interactions with the world and the policy. The lower part corresponds to the user, a program or an applet that uses the "knowledge" of the trained policy to perform its task. Both the RLearner and the user interact with their own world, which holds the current state and defines the rules: it provides the next state, the validity of a given action, and the reward for that action. The policy links everything together. It is updated by the RLearner based on the experience it gathers in the world, and later consulted by the user to choose an optimal action. Depending on the application, training and consulting the policy can also happen simultaneously, but usually the policy is trained to some extent before it is used.

  Building the World

To use our classes, we first build our own world, which holds all the information about its state and the rules that apply to it, i.e., which state follows after applying an action to the world. This is straightforward and easy to do, as some examples taken from [1] show. Click to see our code for these two worlds: - the cliff world used on the algorithms page. - the grid world, similar to the cliff world.

The world might hold its entire state internally but expose only part of that state to the RLearner, in order to simulate the limitations of the agent's sensors.
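The responsibilities described above (next state, action validity, reward, dimensions) suggest what a world class must provide. The sketch below is illustrative only: the method names are our guesses, not necessarily those of the actual RLWorld interface, and the five-cell corridor world is a made-up minimal example.

```java
// Hypothetical sketch of the contract a world must fulfill; the real
// RLWorld interface in the package may use different method names.
interface World {
    int getDimension();                              // size of the state space
    boolean isValidAction(int state, int action);    // rules of the world
    int getNextState(int state, int action);         // state transition
    double getReward(int state, int action);         // feedback for the learner
    boolean isFinalState(int state);
}

// A minimal one-dimensional corridor: action 0 moves left, action 1 moves
// right; reaching the rightmost cell yields reward 1.0, everything else 0.
class CorridorWorld implements World {
    private final int length;
    CorridorWorld(int length) { this.length = length; }
    public int getDimension() { return length; }
    public boolean isValidAction(int state, int action) {
        return (action == 0 && state > 0) || (action == 1 && state < length - 1);
    }
    public int getNextState(int state, int action) {
        return action == 0 ? state - 1 : state + 1;
    }
    public double getReward(int state, int action) {
        return getNextState(state, action) == length - 1 ? 1.0 : 0.0;
    }
    public boolean isFinalState(int state) { return state == length - 1; }
}

public class WorldDemo {
    public static void main(String[] args) {
        World w = new CorridorWorld(5);
        System.out.println(w.getNextState(2, 1));   // moving right from cell 2 -> 3
        System.out.println(w.getReward(3, 1));      // stepping into the goal -> 1.0
        System.out.println(w.isValidAction(0, 0));  // cannot move left off the edge
    }
}
```

Keeping the rules inside the world like this is what makes the learner reusable: the RLearner never needs to know anything about corridors or grids, only the interface.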

  Learning a Policy

After designing a world, we are ready to use the RLearner:

  1. Create a new instance of the world.
         GridWorld trainWorld = new GridWorld()
  2. Create a new instance of the RLearner, passing it the world in the constructor.
         RLearner rl = new RLearner( trainWorld )
     The RLearner then creates a new policy based on the dimensions of the world.
  3. Set the parameters for the RLearner.
         rl.setEpisodes( numEpisodes )
  4. Start learning.
         rl.runTrial()
  5. Alternatively, you can run each epoch individually.
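The training steps above can be sketched end to end. Since the RLearner's internals are not shown on this page, the class below is a stand-in that uses a plain tabular Q-learning update on a small corridor world; its method names (`setEpisodes`, `runTrial`) mirror the steps, but the class itself is hypothetical.

```java
import java.util.Random;

// Stand-in for the RLearner: tabular Q-learning on a one-dimensional
// corridor of `states` cells, action 0 = left, action 1 = right,
// reward 1.0 only for reaching the rightmost cell.
class Learner {
    private final double[][] q;          // Q-table: states x 2 actions
    private final int goal;
    private int episodes = 100;
    private final Random rng = new Random(0);

    Learner(int states) { q = new double[states][2]; goal = states - 1; }

    void setEpisodes(int n) { episodes = n; }    // step 3: set parameters

    // One training episode from the start state to the goal.
    void runEpoch() {
        int s = 0;
        while (s != goal) {
            int a = rng.nextInt(2);      // explore with random actions
            int next = Math.max(0, Math.min(goal, s + (a == 0 ? -1 : 1)));
            double r = (next == goal) ? 1.0 : 0.0;
            double best = Math.max(q[next][0], q[next][1]);
            q[s][a] += 0.5 * (r + 0.9 * best - q[s][a]);  // Q-learning update
            s = next;
        }
    }

    void runTrial() {                    // step 4: run all episodes
        for (int i = 0; i < episodes; i++) runEpoch();
    }

    int bestAction(int s) { return q[s][1] >= q[s][0] ? 1 : 0; }
}

public class TrainDemo {
    public static void main(String[] args) {
        Learner rl = new Learner(5);     // steps 1-2: learner sized to the world
        rl.setEpisodes(200);             // step 3
        rl.runTrial();                   // step 4
        System.out.println(rl.bestAction(0));  // greedy action at the start state
    }
}
```

After training, the greedy action in every non-goal cell should be "right" (action 1), since that is the only way to collect the reward.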

  Using the Policy

The user program will need to follow these steps to use the policy:

  1. Create a new instance of the same world for itself.
         GridWorld playWorld = new GridWorld()
  2. Get the policy from the RLearner.
         RLPolicy myPolicy = rl.getPolicy()
  3. Use the policy to determine the next "best" action to apply to the world.
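Step 3 above is essentially a loop: consult the policy for the current state, apply the chosen action to the world, repeat until a final state is reached. The sketch below shows that loop on a made-up five-cell corridor world, with a precomputed best-action table standing in for the object returned by `rl.getPolicy()` (whose actual lookup method is not shown on this page).

```java
// Hypothetical consult loop: a trained policy maps each state to its best
// action. Here the "world" is a five-cell corridor where action 1 moves
// right toward the goal in cell 4, and the policy array is a stand-in for
// the object returned by rl.getPolicy().
public class PlayDemo {
    static int playToGoal(int[] policy, int start, int goal) {
        int state = start, steps = 0;
        while (state != goal) {
            int action = policy[state];   // consult the trained policy
            // apply the action to the world and observe the next state
            state = Math.max(0, Math.min(goal, state + (action == 0 ? -1 : 1)));
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] bestAction = {1, 1, 1, 1, 1};   // "always move right"
        System.out.println(playToGoal(bestAction, 0, 4));  // shortest path: 4 steps
    }
}
```

Note that the user program never touches the Q-values directly; the policy object is the only thing it shares with the training phase, which is what allows training and playing to use separate world instances.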

The diagram below illustrates the interactions between a user program (in this case an applet), a trained policy and an example world:

We encourage our visitors to build their own worlds and use the RLearner to find a good policy for them. Small games with a limited state space should be nearly as easy to build as the grid worlds from Sutton and Barto. A world only has to implement the RLWorld interface to be usable with the RLearner in a plug-and-play fashion. If you try it out, let us know and submit your code to be published here.


Download or take a look at the source code for the applet.
