Bayesian Probability

A probability is a degree of confidence in a belief, that is, how much we anticipate some event occurring.

An event with Bayesian probability of .6 (or 60%) should be interpreted as stating “With confidence 60%, this event contains the true outcome”, whereas a frequentist interpretation would view it as stating “Over 100 trials, we should observe event X approximately 60 times.”

We can update our confidence in a belief to reflect new evidence using Bayes’ theorem, which can be derived by writing the joint probability of a model and some evidence in two ways and then dividing by P(evidence):

P(model, evidence) = P(model | evidence) * P(evidence) = P(evidence | model) * P(model)
P(model | evidence) = P(model) * P(evidence | model) / P(evidence)

For example, if we have a test for a hypothesis that can have a negative or a positive outcome, we can use this rule to update our belief in this hypothesis depending on the test outcome.
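
As a minimal sketch in Python (the prior and the test accuracies below are made-up numbers, only there to show the update mechanics):

def bayes_update(prior, p_pos_given_model, p_pos_given_not_model):
    # P(model | +) = P(+ | model) * P(model) / P(+),
    # with P(+) expanded via the law of total probability.
    p_pos = p_pos_given_model * prior + p_pos_given_not_model * (1 - prior)
    return p_pos_given_model * prior / p_pos

# Hypothetical test: 30% prior, 75% true positive rate, 10% false positive rate.
print(bayes_update(0.30, 0.75, 0.10))  # ~0.76 after a positive result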

Odds formulation and likelihood ratio

Bayes’ theorem can be written in terms of odds, which is the ratio of the probability of an event to the probability of its complement, Odds(A) = P(A) : P(¬A), so

P(A|E) : P(¬A|E) = P(A) * P(E|A) : P(¬A) * P(E|¬A) = Odds(A) * (P(E|A) / P(E|¬A))
# or
Odds(model | evidence) = PriorOdds(model) * LikelihoodRatio(evidence | model)

The likelihood ratio is the ratio of the probability of the evidence under the model to the probability of the evidence under the alternative hypothesis.
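
In code, the odds update is just a multiplication of the prior odds by the likelihood ratio; a minimal sketch (the function names below are ours, not standard terminology):

def posterior_odds(prior_odds, likelihood_ratio):
    # Odds(model | evidence) = PriorOdds(model) * LikelihoodRatio(evidence | model)
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    # Odds of a : b, passed here as the single number a / b,
    # correspond to a probability of a / (a + b).
    return odds / (1 + odds)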

For example, suppose we want to know whether a coin is fair. We have no prior belief either way, so the prior odds are 1:1. Let’s say we flip the coin 10 times and get 10 heads. And let’s also assume that, if the coin is unfair, the probability of heads is 60%. So we’ve got:

PriorOdds(coin is fair) = 1:1
LikelihoodRatio(10 heads | fair) = 0.5^10 : 0.6^10 ~ 1 : 6.2
Odds(fair | 10 heads) = 1:1 * 1:6.2 = 1:6.2

which means a probability of 1/7.2 = 14% that the coin is fair. You can check this result using the probability formulation of Bayes’ theorem too.
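
For instance, checking it in Python with the probability formulation, using a 50% prior that the coin is fair:

# P(fair | 10 heads) with a 50% prior on fairness,
# P(heads | fair) = 0.5 and P(heads | unfair) = 0.6.
p_data_fair = 0.5 ** 10
p_data_unfair = 0.6 ** 10
posterior_fair = p_data_fair * 0.5 / (p_data_fair * 0.5 + p_data_unfair * 0.5)
print(posterior_fair)  # ~0.139, i.e. about 14%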

The likelihood ratio can be interpreted as the strength of the evidence - how much the evidence slides the prior odds. Interestingly, the likelihood ratio has a much stronger “falsification” effect than “confirmation” effect.

If we observe some positive evidence X for a theory A, the likelihood ratio P(X|A) : P(X|¬A) can be at most ~1/P(X|¬A), since P(X|A) cannot exceed 1 - the strength of the evidence remains inversely proportional to how well the alternative hypotheses predict it. So the strength of positive evidence depends on that particular evidence not being predicted by any competing hypothesis.

On the other hand, if we observe some negative evidence Y for A - evidence that A says should almost never happen - then P(Y|A) is close to zero and the likelihood ratio will be close to zero no matter how small P(Y|¬A) is. Or, equivalently, negative evidence for A is positive evidence for ¬A, so the likelihood ratio for ¬A will be P(Y|¬A) divided by a number close to zero.

So negative evidence has a much stronger falsification effect than positive evidence has a confirmation effect - it annihilates priors.
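
A small numerical illustration of this asymmetry (all the probabilities below are invented for the example):

# Positive evidence: even if A predicts X perfectly, the likelihood ratio
# is capped at 1 / P(X | not-A).
p_x_given_a = 1.0
p_x_given_not_a = 0.2
print(p_x_given_a / p_x_given_not_a)   # at most 5, however well A predicts X

# Negative evidence: if A says Y should almost never happen and Y is observed,
# the likelihood ratio is ~0 and wipes out even an extreme prior.
p_y_given_a = 1e-6
p_y_given_not_a = 0.3
prior_odds_a = 1000.0
print(prior_odds_a * p_y_given_a / p_y_given_not_a)  # ~0.003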

Example

Suppose that the prior prevalence of breast cancer in a demographic is 1%. Suppose that we, as doctors, have a repertoire of three independent tests for breast cancer.

Our first test, test A, a mammography, has a likelihood ratio of 80%/9.6% = 8.33. That is, it has a true positive probability of 80% and a false positive probability of 9.6%.

The second test, test B, has a likelihood ratio of 18.0 (for example, from 90% versus 5%); and the third test, test C, has a likelihood ratio of 3.5 (which could be from 70% versus 20%, or from 35% versus 10%; it makes no difference). Suppose a patient gets a positive result on all three tests. What is the probability the patient has breast cancer?

For each test:

P(cancer | +) = P(+ | cancer) * P(cancer) / P(+)
P(cancer | +) = P(+ | cancer) * P(cancer) / ( P(+ | cancer) * P(cancer) + P(+ | ¬cancer) * P(¬cancer) )

If it were just test A:

P(cancer | +) = 80% * 1% / ( 80% * 1% + 9.6% * 99% ) = 7.8%

In terms of odds, this likelihood ratio is:

80 : 9.6 = 25 : 3

The prior odds of having cancer are 1:99. So the odds of having cancer given a positive result are:

1 x 25 : 99 x 3 = 25 : 297

Or, equivalently, 7.8%. For all 3 tests combined:

P(cancer | all 3) = (80% * 90% * 70%) * 1% / ( (80% * 90% * 70%) * 1% + (9.6% * 5% * 20%) * 99%) = 84%

Note that we could have replaced 70%, 20% by 35%, 10% without changing the result. In terms of odds:

18 : 1 (test B)
3.5 : 1 = 7 : 2 (test C)
1 × 25 × 18 × 7 : 99 × 3 × 1 × 2 = 3150 : 594

Which gives us a probability of 3150 / (3150 + 594) = 84%.
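
The same arithmetic, written out in Python with the numbers above:

prior = 0.01
tests = [(0.80, 0.096), (0.90, 0.05), (0.70, 0.20)]  # (P(+|cancer), P(+|no cancer)) for A, B, C

# Probability form: the tests are independent, so the likelihoods multiply.
p_all_cancer = 0.80 * 0.90 * 0.70
p_all_healthy = 0.096 * 0.05 * 0.20
posterior = p_all_cancer * prior / (p_all_cancer * prior + p_all_healthy * (1 - prior))
print(posterior)  # ~0.84

# Odds form: start from 1:99 and multiply by each test's likelihood ratio.
odds = prior / (1 - prior)
for true_pos, false_pos in tests:
    odds *= true_pos / false_pos
print(odds / (1 + odds))  # ~0.84 again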

Bayesianism vs Frequentism

The frequentist interpretation:

When we say the coin has a 50% probability of being heads after this flip, we mean that there’s a class of events similar to this coin flip, and across that class, coins come up heads about half the time. That is, the frequency of the coin coming up heads is 50% inside the event class, which might be “all other times this particular coin has been tossed” or “all times that a similar coin has been tossed”, and so on.

The Bayesian interpretation:

Uncertainty is in the mind, not the environment. If I flip a coin and slap it against my wrist, it’s already landed either heads or tails. The fact that I don’t know whether it landed heads or tails is a fact about me, not a fact about the coin. The claim “I think this coin is heads with probability 50%” is an expression of my own ignorance, and 50% probability means that I’d bet at 1 : 1 odds (or better) that the coin came up heads.

The difference is more apparent when discussing ideas. A frequentist will not assign probability to an idea; either it is true or false and it cannot be true 6 times out of 10.

(source: LessWrong)