Bayesian Prediction
Background
Make sure you understand the Introduction to Bayesian statistics first. Now we'll derive "Bayes' Rule".
- Venn diagram
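$P(A \cap B) = P(A \mid B)\,P(B)$, and
$P(A \cap B) = P(B \mid A)\,P(A)$, so
$P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$ and therefore $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$.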
- Note that the above mathematics is true when A is a set of worlds or models and hence P(A) is a probability. It also holds when A is defined over a continuous space and so P(A) is a density rather than a probability.
- If the set A is parameterised, then we can do this for a whole function at the same time. e.g. Assume A stood for "Location is x", and B stood for "the last observation was O" (see diagram p 199 of Probabilistic Robotics)
- Then for each value of x, P(Location is x | last observation was O) = P(Location is x (regardless of the last observation)) P(last observation was O | Location was x) / P(last observation would have been O)
- Note that P(last observation would have been O) is a constant - ignore it for the moment.
- Note that P(Location is x | last observation was O), P(Location is x (regardless of the last observation)) and P(last observation was O | Location was x) are all functions of x - we could do the calculation for each value of x in parallel.
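- If $f(x) = P(\text{Location is } x)$
and $g(x) = P(\text{last observation was } O \mid \text{Location is } x)$,
then $P(\text{Location is } x \mid \text{last observation was } O) \propto f(x)\,g(x)$.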
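The per-location calculation described above can be sketched in a few lines of Python. This is only an illustration, not the treatment in Probabilistic Robotics: the five-cell corridor, the door positions and the sensor probabilities (0.8 and 0.1) are made-up assumptions, and `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior, likelihood):
    """Return P(x | O) for every x, given P(x) and P(O | x) as equal-length lists."""
    # Numerator of Bayes' rule, computed independently for each value of x.
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    # P(O) is the same constant for every x: it is the sum of the numerators.
    evidence = sum(unnormalised)
    if evidence == 0:
        raise ValueError("the observation has zero probability under the prior")
    return [u / evidence for u in unnormalised]


# Assumed toy world: a corridor of five cells with doors at cells 0 and 3.
prior = [0.2] * 5                        # P(Location is x), uniform
likelihood = [0.8, 0.1, 0.1, 0.8, 0.1]   # P(sensor reports "door" | Location is x)
posterior = bayes_update(prior, likelihood)
```

Dividing by the sum of the numerators is what allows P(last observation would have been O) to be ignored until the end: it only rescales the result so that it sums to one.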
Prediction
- The goal of Bayesian prediction is to predict the next observation given the sequence of observations seen so far.
- We assume a class of models, M, which includes the true model, t.
- Each model will define the probability of each observation at each timestep.
- Define a probability distribution over those models; our subjective probability that each model is the true model.
- Initially this could be a uniform distribution, or some sort of complexity prior (see Occam's razor) that makes more complex models less likely.
- At any point there are two obvious ways to make a prediction about the next observation:
- Choose the most likely model and predict what it predicts (this is fast), or
- Use a weighted sum of the predictions of all the models, with each model weighted by its current probability (this is more accurate).
- Note that models might make different predictions at different times.
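- When we see an observation, we then need to update the probability distribution over the models given that observation. Bayes' rule is used. For each model $m$ in $M$ and the new observation $o$:
$P(m \mid o) = P(o \mid m)\,P(m) / P(o)$, but the observation probability $P(o)$ is a constant, so
$P(m \mid o) \propto P(o \mid m)\,P(m)$, and you can renormalise to return that to a probability distribution.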
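The two prediction strategies and the update step can be sketched as follows. This is a minimal illustration under stated assumptions: observations are binary, each model is represented as a function from the timestep to P(observation = 1), and the three models and all function names are hypothetical.

```python
def predict_map(models, weights, t):
    """Predict with the single most probable model (the fast option)."""
    best = max(range(len(models)), key=lambda i: weights[i])
    return models[best](t)


def predict_mixture(models, weights, t):
    """Predict with the probability-weighted sum over all models."""
    return sum(w * m(t) for m, w in zip(models, weights))


def update(models, weights, t, observation):
    """Bayes' rule: new weight is proportional to P(o | m) * P(m), then renormalise."""
    unnormalised = []
    for m, w in zip(models, weights):
        p_one = m(t)  # this model's P(observation = 1) at timestep t
        likelihood = p_one if observation == 1 else 1.0 - p_one
        unnormalised.append(w * likelihood)
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("no model gives this observation non-zero probability")
    return [u / total for u in unnormalised]


# Assumed model class: each entry returns P(observation = 1) at timestep t.
models = [
    lambda t: 0.9,                          # almost always 1
    lambda t: 0.1,                          # almost always 0
    lambda t: 0.9 if t % 2 == 0 else 0.1,   # alternates, so its prediction depends on t
]
weights = [1 / 3, 1 / 3, 1 / 3]             # uniform prior over the models

for t, obs in enumerate([1, 0, 1, 0]):
    weights = update(models, weights, t, obs)
```

After these four observations nearly all of the probability sits on the alternating model, so `predict_map(models, weights, 4)` and `predict_mixture(models, weights, 4)` give similar answers. The two strategies differ most early on, when several models still carry substantial weight.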
Examples
- Rocks of different types
- Lights flashing at different speeds