Utility Theory I
Think animal training
- Can't explain the goal to the animal
- Can give 'good/bad' feedback.
Goal of the agent is to maximise rewards over time
- Is this enough? Could you 'teach' Asimov's three laws with this setup?
- What does it mean to 'maximise rewards over time'?
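- The plain sum $\sum_{t=1}^{\infty} r_t$ (with $r_t$ the reward at step $t$) can diverge
- Limit the horizon to N steps: $\sum_{t=1}^{N} r_t$
- Discount the future: $\sum_{t=1}^{\infty} \gamma^{t-1} r_t$ with $0 \le \gamma < 1$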
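- Average rewards: $\lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} r_t$, or $\liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} r_t$ when the limit does not exist
- Gain vs. bias optimality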
Given a history in (AOR)*, i.e. a sequence of action, observation, reward triples, any of these definitions yields a single 'quality' number.
- This is an uncommon 'stateless' definition. We'll revisit this with state later (don't worry now if that doesn't mean anything).
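As a minimal sketch of how a history can be turned into a single 'quality' number, the following Python computes the three kinds of return for a finite history. The function names, the representation of a history as a list of (action, observation, reward) triples, and the convention that the first reward is undiscounted are all assumptions made for illustration. On a finite history the average is simply the mean, not a limit.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one (action, observation, reward) triple per step.
Step = Tuple[str, str, float]


def rewards_of(history: Sequence[Step]) -> List[float]:
    """Extract the reward component of each step in the history."""
    return [reward for _action, _observation, reward in history]


def finite_horizon_return(history: Sequence[Step], n: int) -> float:
    """Sum of the rewards over the first n steps."""
    return sum(rewards_of(history)[:n])


def discounted_return(history: Sequence[Step], gamma: float) -> float:
    """Discounted sum; the first reward gets weight 1 (an assumed convention)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards_of(history)))


def average_reward(history: Sequence[Step]) -> float:
    """Mean reward per step over a non-empty finite history."""
    rewards = rewards_of(history)
    if not rewards:
        raise ValueError("average reward is undefined for an empty history")
    return sum(rewards) / len(rewards)


history = [
    ("sit", "treat", 1.0),
    ("bark", "nothing", 0.0),
    ("sit", "treat", 1.0),
    ("roll", "treat", 1.0),
]
print(finite_horizon_return(history, 2))
print(discounted_return(history, 0.5))
print(average_reward(history))
```

Note that the three numbers can rank the same pair of histories differently, which is why the choice of definition matters.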
Bayesian Agents I
Combining Bayesian Prediction and Utility Theory.
- Start with a set of models of the environment
- Models are functions from histories to probability distributions over observations and rewards
At each time step:
- Do a forward search
  - For observations, consider each possible observation
    - On the way down each branch, do the Bayesian update as if you'd just seen that observation
    - On the way out, take a weighted average of the results by observation probability
  - Rewards are treated like observations
  - For actions, consider each possible action. On the way out, take the maximum.
We'll revisit this without the search later...
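
The forward search described above can be sketched as a small expectimax over a Bayesian mixture of models. This is an illustrative sketch, not a definitive implementation: the names `value`, `action_value`, `best_action` and `make_bandit` are hypothetical, observations and rewards are bundled into a single percept for brevity, utility is a plain finite-horizon sum, and the two toy models (a two-armed bandit where one arm pays with probability 0.9 and the other with 0.1) are invented for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Percept = Tuple[str, float]                      # (observation, reward)
History = Tuple[Tuple[str, Percept], ...]        # past (action, percept) pairs
# A model maps (history, action) to a distribution over the next percept.
Model = Callable[[History, str], Dict[Percept, float]]


def action_value(models: Sequence[Model], weights: Sequence[float],
                 history: History, action: str,
                 actions: Sequence[str], depth: int) -> float:
    """Expected return of taking `action` now, then acting optimally."""
    predictions = [m(history, action) for m in models]
    percepts = set().union(*predictions)         # every percept some model allows
    total = 0.0
    for percept in percepts:
        likelihoods = [p.get(percept, 0.0) for p in predictions]
        prob = sum(w * l for w, l in zip(weights, likelihoods))
        if prob == 0.0:
            continue
        # On the way down: Bayesian update as if this percept had been seen.
        posterior = [w * l / prob for w, l in zip(weights, likelihoods)]
        _observation, reward = percept
        future = value(models, posterior, history + ((action, percept),),
                       actions, depth - 1)
        # On the way out: average weighted by percept probability.
        total += prob * (reward + future)
    return total


def value(models: Sequence[Model], weights: Sequence[float],
          history: History, actions: Sequence[str], depth: int) -> float:
    """Best achievable expected return over the next `depth` steps."""
    if depth == 0:
        return 0.0
    # For actions, take the maximum on the way out.
    return max(action_value(models, weights, history, a, actions, depth)
               for a in actions)


def best_action(models: Sequence[Model], weights: Sequence[float],
                history: History, actions: Sequence[str], depth: int) -> str:
    """Action with the highest expected return under the current weights."""
    return max(actions, key=lambda a: action_value(
        models, weights, history, a, actions, depth))


def make_bandit(good_arm: str) -> Model:
    """Toy model (assumed): `good_arm` wins with prob 0.9, other arms 0.1."""
    def model(history: History, action: str) -> Dict[Percept, float]:
        p = 0.9 if action == good_arm else 0.1
        return {("win", 1.0): p, ("lose", 0.0): 1.0 - p}
    return model


actions: List[str] = ["left", "right"]
models: List[Model] = [make_bandit("left"), make_bandit("right")]
print(best_action(models, [0.8, 0.2], (), actions, depth=1))
print(value(models, [0.5, 0.5], (), actions, depth=2))
```

The cost of this search grows exponentially with `depth`, since every level branches on each action and each percept.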