Bayesian Prediction

Background

Make sure you understand the Introduction to Bayesian statistics first. Now we'll derive "Bayes' Rule".

  • Venn diagram
  • By the definition of conditional probability, P(A|B) = \frac{P(A \& B)}{P(B)} and P(B|A) = \frac{P(A \& B)}{P(A)}, so P(A|B)P(B) = P(B|A)P(A), and therefore (provided P(B) > 0) P(A|B) = \frac{P(A)P(B|A)}{P(B)}
  • The above holds when A is a set of worlds or models, so that P(A) is a probability. It also holds when A is defined over a continuous space, so that P(A) is a density rather than a probability.
  • If the set A is parameterised, then we can do this for a whole function at the same time. For example, let A stand for "Location is x" and B stand for "the last observation was O" (see diagram p 199 of Probabilistic Robotics)
    • Then for each value of x, P(Location is x | last observation was O) = P(Location is x (regardless of the last observation)) P(last observation was O | Location is x) / P(last observation would have been O)
    • Note that P(last observation would have been O) does not depend on x, so it acts as a normalising constant - ignore it for the moment.
    • Note that P(Location is x | last observation was O), P(Location is x (regardless of the last observation)) and P(last observation was O | Location is x) are all functions of x - we could do the calculation for each value of x in parallel.
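    • For example, if the two functions being multiplied are Gaussian-shaped, f_1(x) = C_1e^{-(x-\mu_1)^2} and f_2(x) = C_2e^{-(x-\mu_2)^2}, then f_1(x) \times f_2(x) = C_1e^{-(x-\mu_1)^2} \times C_2e^{-(x-\mu_2)^2} = C_1C_2e^{-[(x-\mu_1)^2+(x-\mu_2)^2]}, which is again Gaussian-shaped in x and centred at (\mu_1+\mu_2)/2.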
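
The pointwise calculation over x described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-D grid of locations, the Gaussian-shaped prior and likelihood, their centres (4.0 and 6.0) and widths, and the helper names gaussian_shape, normalise and bayes_update are all hypothetical and not taken from Probabilistic Robotics.

```python
import math


def gaussian_shape(x, mu, sigma):
    """Unnormalised Gaussian bump centred on mu (hypothetical helper)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def bayes_update(prior, likelihood):
    """Multiply prior and likelihood at every x, then renormalise.

    The renormalisation plays the role of dividing by
    P(last observation would have been O), which does not depend on x.
    """
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    return normalise(unnormalised)


# Assumed discretisation: candidate locations 0.0, 0.1, ..., 10.0.
xs = [i * 0.1 for i in range(101)]

# Assumed prior: P(Location is x), regardless of the last observation.
prior = normalise([gaussian_shape(x, 4.0, 1.0) for x in xs])

# Assumed likelihood: P(last observation was O | Location is x).
likelihood = [gaussian_shape(x, 6.0, 1.0) for x in xs]

# Posterior: P(Location is x | last observation was O), for every x at once.
posterior = bayes_update(prior, likelihood)

# The most probable location after the update.
best_x = xs[max(range(len(xs)), key=lambda i: posterior[i])]
print(best_x)
```

Because the two bumps here have the same width, the posterior peaks midway between their centres, which matches the product-of-exponentials calculation above.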

Prediction

  • The goal of Bayesian prediction is to predict the next observation given the sequence of observations seen so far.
  • We assume a class of models, M, which includes the true model, t.
    • Each model will define the probability of each observation at each timestep.
  • Define a probability distribution over those models: our subjective probability that each model is the true model.
    • Initially this could be a uniform distribution, or some sort of complexity prior (see Occam's razor) that makes more complex models less likely.
  • At any point there are two obvious ways to make a prediction about the next observation:
    • Choose the most likely model and predict what it predicts (this is fast), or
    • Use a weighted sum of the predictions of all the models, weighting each model by its current probability (this is more accurate).
    • Note that models might make different predictions at different times.
  • When we see an observation, we then need to update the probability distribution over the models given that observation. Bayes' rule is used.
    • P(m|o_t) = \frac{P(m)P(o_t|m)}{P(o_t)}, but the observation probability P(o_t) is the same for every model m, so P(m|o_t) \propto P(m)P(o_t|m), and you can renormalise to return that to a probability distribution.
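
The update-and-predict loop above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the model class (three coin-like models, each giving a fixed probability that the next observation is 1), the observation sequence and all function names are assumptions made for the example. For simplicity each model here makes the same prediction at every timestep, although in general a model's prediction may vary over time.

```python
def likelihood(bias, observation):
    """P(o_t | m) for a hypothetical model m that emits 1 with probability bias."""
    return bias if observation == 1 else 1.0 - bias


def update(weights, biases, observation):
    """Bayes' rule: P(m | o_t) is proportional to P(m) P(o_t | m); then renormalise."""
    unnormalised = [w * likelihood(b, observation)
                    for w, b in zip(weights, biases)]
    total = sum(unnormalised)  # this is P(o_t), the same for every model
    return [u / total for u in unnormalised]


def predict_most_likely(weights, biases):
    """Fast option: use only the prediction of the most probable model."""
    best = max(range(len(biases)), key=lambda i: weights[i])
    return biases[best]


def predict_weighted(weights, biases):
    """More accurate option: average the predictions, weighted by P(m)."""
    return sum(w * b for w, b in zip(weights, biases))


# Assumed model class M: each model's probability that the observation is 1.
biases = [0.2, 0.5, 0.8]

# Uniform prior over the models.
posterior = [1.0 / len(biases)] * len(biases)

# Assumed observation sequence.
observations = [1, 1, 0, 1]
for o in observations:
    posterior = update(posterior, biases, o)

map_prediction = predict_most_likely(posterior, biases)
mixture_prediction = predict_weighted(posterior, biases)
print(posterior, map_prediction, mixture_prediction)
```

The two predictions differ because the weighted sum still gives some weight to the less probable models, whereas the most-likely-model rule discards them entirely.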

Examples

  • Rocks of different types
  • Lights flashing at different speeds