Representing Probability Distributions
Note: the material here can be viewed in two ways:
- Ways to combine simple distributions into complex distributions
- Ways to decompose complex distributions into simple distributions
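Remember: all probability distributions are just functions from sets (events) to real numbers.
- Almost all the probability distributions we use are defined by a function from individual elements of the set to real numbers; the probability of a set is then the sum (or integral) over its elements.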
Three broad classes
Tabular forms
In general, one way to represent a probability distribution is as a big table of values. For any element of the appropriate universal set, we have an entry in our table/array. This only works for discrete domains (for continuous domains we'd have an infinite table). It is also inefficient for large domains.
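As a minimal sketch of the tabular form, the table can be a dictionary with one entry per element of the universal set. The fair die, the variable `die`, and the helper `prob` are illustrative assumptions, not something from these notes.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die stored as a table
# mapping each outcome to its probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(table, event):
    """Probability of an event (a set of outcomes): sum the matching table entries."""
    return sum(table[x] for x in event if x in table)


# A table is a valid distribution if its entries are non-negative and sum to 1.
assert all(p >= 0 for p in die.values())
assert sum(die.values()) == 1

p_even = prob(die, {2, 4, 6})  # probability of the set {2, 4, 6}
```

The table needs one entry per outcome, which is why this representation becomes impractical as the domain grows.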
Closed forms
If you can show that your probability distribution must have a particular form, then you can represent it using just the parameters for that form. We'll see later that sometimes we can show this (see Conjugate prior). In general this type of representation is much more efficient than the tabular form, so it is often used as an approximation even when the true distribution does not exactly have that form.
- The uniform distribution.
- Normal or Gaussian distribution:
  - Approximately right when you're combining many different sources of noise
  - Has two parameters: the mean, $\mu$, and the standard deviation, $\sigma$. (It is common to use the variance, $\sigma^2$, instead of the standard deviation.)
- Poisson distribution:
  - Used when items occur with a given average density and you want to know how likely it is that a particular number of them fall in a given region of space.
  - Has one parameter, $\lambda$, the expected number of items in that region.
- Dirichlet distribution (and the related Beta distribution)
  - Used when you want a distribution over distributions, e.g. you have a die of unknown weighting.
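To make "just the parameters" concrete, here is a rough sketch of two of the closed forms above using only the Python standard library. The function names `normal_pdf` and `poisson_pmf`, the argument name `lam` (for the Poisson parameter), and the numeric values are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def poisson_pmf(k, lam):
    """Probability of exactly k items when the expected number of items is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Illustrative values only.
peak = normal_pdf(0.0, mu=0.0, sigma=1.0)            # density at the mean
total = sum(poisson_pmf(k, 3.0) for k in range(60))  # should be very close to 1
```

Either distribution is fully described by one or two numbers, whereas a table would need an entry for every possible value.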
Random Samples
You can represent a probability distribution using a set of samples from that distribution. This representation is necessarily approximate. The denser the samples in a region of the space, the higher the probability of that region. (See Monte Carlo method, and Dieter Fox's animations.)
- Performs poorly with small numbers of samples
- Can represent any distribution
- Works best with peaked distributions (and less well when distributions are fairly flat/uniform)
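A small sketch of the sample-based representation: the distribution is stored only as a list of draws, and the probability of a region is estimated by the fraction of samples that land in it. The standard normal target, the sample count, and the helper `estimate_prob` are assumptions made for this example.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical target: a standard normal, represented only by its samples.
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]


def estimate_prob(samples, lo, hi):
    """Estimate P(lo <= X < hi) as the fraction of samples in that interval."""
    return sum(lo <= s < hi for s in samples) / len(samples)


p_center = estimate_prob(samples, -1.0, 1.0)  # dense region, true value about 0.683
p_tail = estimate_prob(samples, 3.0, 4.0)     # sparse region, true value about 0.0013
```

The tail estimate rests on only a handful of samples, which illustrates why the representation performs poorly when samples are scarce.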
Combinations
Independence
If there is no relationship between two variables, then their joint probability distribution factors into a product of the individual distributions.
Consider the heights of people in the room. We'd expect a Gaussian distribution. Now consider the heights of pairs of people in this room. We now have two variables in our probability distribution, the height of person A and the height of person B: a 2D distribution. But there is no relationship here, so we'd expect that P(x, y) = P(X = x & Y = y) = P(x).P(y).
In contrast, consider the heights of married couples. In that case we might expect that people tend to marry others of similar height, so the joint distribution would not factor.
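The contrast between the two cases can be sketched with small tables. All the numbers below are made up, and the coarse height bins and the helpers `marginals` and `factorises` are hypothetical names introduced for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal distribution over coarse height bins.
p_x = {"short": Fraction(1, 4), "medium": Fraction(1, 2), "tall": Fraction(1, 4)}
p_y = dict(p_x)

# Random pairs: the joint table is just the product of the marginals.
joint_indep = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

# Couples (made-up numbers): partners tend to fall in the same height bin.
joint_dep = {(x, y): Fraction(0) for x, y in product(p_x, p_y)}
joint_dep.update({
    ("short", "short"): Fraction(3, 16),
    ("short", "medium"): Fraction(1, 16),
    ("medium", "short"): Fraction(1, 16),
    ("medium", "medium"): Fraction(3, 8),
    ("medium", "tall"): Fraction(1, 16),
    ("tall", "medium"): Fraction(1, 16),
    ("tall", "tall"): Fraction(3, 16),
})


def marginals(joint):
    """Sum the joint table over each variable in turn."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py


def factorises(joint):
    """True if P(x, y) = P(x).P(y) for every entry of the table."""
    px, py = marginals(joint)
    return all(p == px[x] * py[y] for (x, y), p in joint.items())


indep_ok = factorises(joint_indep)  # True: the table is a product
dep_ok = factorises(joint_dep)      # False: same marginals, but no factorisation
```

Both tables have the same marginals, but only the independent one can be stored as two 3-entry tables instead of one 9-entry table.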
- Building the multivariate normal distribution
  - Two parameters: the mean vector $\mu$ and the covariance matrix $\Sigma$
  - If the variables are independent
    - If all the variances are equal then you get a circular/spherical Gaussian
      - The covariance matrix is $\Sigma = cI$
    - If the variances are not equal then you get an axis-parallel elliptical Gaussian
      - The covariance matrix is diagonal, with the entries being the individual variances
  - If the variables are correlated (i.e. not independent)
    - Not true for general probability distributions, but for the multivariate normal this is equivalent to an affine transform of a spherical Gaussian.
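The affine-transform remark can be checked numerically. This sketch draws from a spherical 2D Gaussian, applies x = A z + mu, and compares the sample covariance with A Aᵀ. The matrix `A`, the offset `mu`, and the helper names are assumptions picked for illustration; the standard library is used instead of a numerical package to keep the example self-contained.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical parameters, chosen only for illustration.
mu = (1.0, -2.0)
A = ((2.0, 0.0),
     (1.5, 0.5))  # the resulting covariance is A A^T = [[4.0, 3.0], [3.0, 2.5]]


def sample_correlated():
    """Draw z from a spherical unit Gaussian and return the affine image A z + mu."""
    z0, z1 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x0 = mu[0] + A[0][0] * z0 + A[0][1] * z1
    x1 = mu[1] + A[1][0] * z0 + A[1][1] * z1
    return x0, x1


def sample_mean_and_cov(points):
    """Sample mean vector and (biased) sample covariance matrix of 2D points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    c00 = sum((p[0] - m0) ** 2 for p in points) / n
    c01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / n
    c11 = sum((p[1] - m1) ** 2 for p in points) / n
    return (m0, m1), ((c00, c01), (c01, c11))


xs = [sample_correlated() for _ in range(50_000)]
mean_hat, cov_hat = sample_mean_and_cov(xs)
```

The non-zero off-diagonal entry of `cov_hat` is the correlation introduced by the transform; with a diagonal `A` it would vanish and the Gaussian would be axis-parallel.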
Conditional Independence
(See Conditional independence)
Two events (models, sets of models), A and B, are conditionally independent given C if they are independent when conditioned on C: P(A, B | C) = P(A & B | C) = P(A | C).P(B | C). This means that P(A, B, C) = P(C).P(A, B | C) = P(C).P(A | C).P(B | C).
One common example of conditional independence is 'common cause correlation'. For example, ice-cream sales and drowning deaths are correlated, but they are conditionally independent given the weather.
Like full independence, conditional independence allows a probability distribution to be factored. The standard representation is a Graphical model. (See also http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html.)
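The ice-cream example can be worked through with a small factored table. Every probability below is invented for the sketch, and the variable names (`p_w`, `p_i_given_w`, `p_d_given_w`) and helpers are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Made-up numbers. W = weather, I = "ice-cream sales are high", D = "a drowning occurs".
p_w = {"hot": F(1, 2), "cold": F(1, 2)}
p_i_given_w = {"hot": F(4, 5), "cold": F(1, 5)}    # P(I = True | W)
p_d_given_w = {"hot": F(3, 10), "cold": F(1, 10)}  # P(D = True | W)


def bern(p_true, value):
    """Probability of a True/False value given the probability of True."""
    return p_true if value else 1 - p_true


# Factored joint: P(W, I, D) = P(W).P(I | W).P(D | W)
joint = {
    (w, i, d): p_w[w] * bern(p_i_given_w[w], i) * bern(p_d_given_w[w], d)
    for w, i, d in product(p_w, (True, False), (True, False))
}


def prob(pred):
    """Probability of the set of outcomes (w, i, d) that satisfy pred."""
    return sum(p for k, p in joint.items() if pred(*k))


# Marginally, I and D are correlated: P(I, D) differs from P(I).P(D).
p_i = prob(lambda w, i, d: i)
p_d = prob(lambda w, i, d: d)
p_id = prob(lambda w, i, d: i and d)


def cond_indep_given(w0):
    """Check P(I, D | W = w0) = P(I | W = w0).P(D | W = w0).

    For two binary variables, checking the (True, True) case is sufficient.
    """
    pw = prob(lambda w, i, d: w == w0)
    pid = prob(lambda w, i, d: w == w0 and i and d) / pw
    pi = prob(lambda w, i, d: w == w0 and i) / pw
    pd = prob(lambda w, i, d: w == w0 and d) / pw
    return pid == pi * pd
```

With these numbers P(I, D) = 13/100 while P(I).P(D) = 1/10, so the two are dependent overall, yet `cond_indep_given` holds for both kinds of weather. The factored form stores 1 + 2 + 2 free parameters instead of the 7 needed for a full table over three binary variables.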
