Since there is no difference between the joint events (A,B) and (B,A), P(A,B) = P(B,A).
From definition 2, it follows that P(A,B) = P(A|B)P(B) and P(B,A) = P(B|A)P(A).
Therefore, P(A|B)P(B) = P(B|A)P(A). Dividing both sides by P(B) (assuming P(B) > 0) yields the famous Bayes' Rule, which states:
P(A|B) = P(B|A)P(A) / P(B)
Bayes' Rule was first formulated by the Reverend Thomas Bayes in his "An Essay towards Solving a Problem in the Doctrine of Chances", published in 1763.
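The derivation above can be checked numerically. The sketch below assumes a small, made-up joint distribution over two binary events A and B (the four probabilities are arbitrary and not from the text) and uses Python's `fractions` module for exact arithmetic. It confirms that both factorizations of the joint probability agree and that Bayes' Rule holds for every combination of outcomes.

```python
from fractions import Fraction

# Hypothetical joint distribution P(A, B) over two binary events.
# Keys are (a, b) outcome pairs; the four probabilities sum to 1.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}

def p_a(a):
    """Marginal P(A = a), obtained by summing the joint over B."""
    return sum(p for (x, _), p in joint.items() if x == a)

def p_b(b):
    """Marginal P(B = b), obtained by summing the joint over A."""
    return sum(p for (_, y), p in joint.items() if y == b)

def p_a_given_b(a, b):
    """Conditional P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
    return joint[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    """Conditional P(B = b | A = a) = P(A = a, B = b) / P(A = a)."""
    return joint[(a, b)] / p_a(a)

for a in (True, False):
    for b in (True, False):
        # P(A|B)P(B) and P(B|A)P(A) both recover the joint P(A,B).
        lhs = p_a_given_b(a, b) * p_b(b)
        rhs = p_b_given_a(b, a) * p_a(a)
        assert lhs == rhs == joint[(a, b)]
        # Bayes' Rule: P(A|B) = P(B|A)P(A) / P(B)
        assert p_a_given_b(a, b) == p_b_given_a(b, a) * p_a(a) / p_b(b)

print(p_a_given_b(True, True))  # prints 1/4, i.e. (1/10) / (4/10)
```

Exact fractions are used so that the equalities can be asserted directly, without worrying about floating-point rounding.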
To see how this rule works in practice, consider the following (purely fabricated) example:
Only 1 in 1000 adults is afflicted with a rare disease for which a diagnostic test has been developed. The test is such that, when an individual actually has the disease, it gives a positive result 99% of the time, while an individual without the disease will show a positive result only 2% of the time. What is the probability that an individual has a positive result? What is the probability that an individual with a positive result has the disease?
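Sounds complex, doesn't it? But we can solve it using Bayes' Rule. Let D be the event that an individual has the disease and + the event of a positive test result, so that P(D) = 0.001, P(+|D) = 0.99 and P(+|not D) = 0.02. A positive result can come from either an infected or a healthy individual, so
P(+) = P(+|D)P(D) + P(+|not D)P(not D) = 0.99 × 0.001 + 0.02 × 0.999 = 0.02097
That is, about 2.1% of individuals test positive. Applying Bayes' Rule:
P(D|+) = P(+|D)P(D) / P(+) = 0.00099 / 0.02097 ≈ 0.047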
Hence there is only about a 5% chance that an individual with a positive result actually has the disease.
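The arithmetic of this example can be reproduced in a few lines. The helper `bayes_posterior` below is a hypothetical name introduced only for illustration. It first applies the law of total probability to get P(+), then Bayes' Rule to get P(D|+), using the rates stated in the example.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return (P(E), P(H | E)) for a binary hypothesis H and evidence E."""
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    # Bayes' Rule: P(H|E) = P(E|H)P(H) / P(E)
    posterior = p_evidence_given_h * prior / p_evidence
    return p_evidence, posterior

# Rates from the example: 1 in 1000 prevalence, 99% true-positive rate,
# 2% false-positive rate.
p_positive, p_disease_given_positive = bayes_posterior(0.001, 0.99, 0.02)
print(f"P(positive)           = {p_positive:.5f}")                # 0.02097
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")  # 0.0472
```

The low posterior comes from the tiny prior: the 2% false-positive rate, applied to the 999 healthy people in every 1000, produces far more positives than the single infected person does.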
"So what?" I hear someone say. "So it can shift some probabilities around. What does this have to do with machine learning?"
Well, it is useful because it allows us to reverse a conditional probability, which we need to do when data is easy to gather in one direction but not the other. For example, it is relatively easy to estimate the probability of a person getting a positive result given that he/she has the disease (just apply the same test to patients who are known to be infected). However, measuring the probability of a person having the disease given a positive result might be difficult, as we would need another test to confirm whether that person is infected or not. Bayes' Rule provides a simple means of calculating the reverse condition. This process of calculating the probability of some random variable given certain observed values (or evidence) is called probabilistic inference, and it is exactly what every artificial agent tries to do: learn and respond to different situations based on the given evidence.
It is this kind of inference that forms the basis for our first machine learning technique, the Naive Bayes Classifier.