
Statistics Review

Bayes' Theorem

\[ P(A\;|\;B) = \frac{P(A,B)}{P(B)} \]

Notice that if \(A\) and \(B\) are independent, then

\[ P(A\;|\;B) = \frac{P(A,B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A) \]
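The definition and the independence property are easy to check numerically. A minimal sketch in Python, using a toy joint distribution of two fair, independent coin flips (this example is an assumption for illustration, not from the notes):

```python
from fractions import Fraction

# Toy joint distribution: two fair, independent coin flips,
# so every outcome (a, b) has probability 1/4.
joint = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

def marginal_b(b):
    """P(B=b), obtained by summing the joint over all values of A."""
    return sum(p for (_, b_), p in joint.items() if b_ == b)

def cond(a, b):
    """P(A=a | B=b) = P(A,B) / P(B), the definition above."""
    return joint[(a, b)] / marginal_b(b)

# Independence: conditioning on B leaves the marginal of A unchanged.
marginal_a = sum(p for (a_, _), p in joint.items() if a_ == 1)
print(cond(1, 0) == marginal_a)  # True: P(A=1 | B=0) = P(A=1) = 1/2
```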

Joint Probability

Joint probability refers to the analysis of the interaction between two or more random variables. Taking the example from the Lecture-2 slides, we start from the following count table (rows \(w\), columns \(d\)).

w    d1   d2   d3   d4
A    10   10   10   10
B    10   10   10    0
C    10   10    0    0
D     0    0    0    1

Each cell in the table below represents the joint probability \(P(w,d)\), obtained by dividing the corresponding count by the grand total (\(91\)).

w      d1     d2     d3     d4     P(w)
A      0.11   0.11   0.11   0.11   0.44
B      0.11   0.11   0.11   0.00   0.33
C      0.11   0.11   0.00   0.00   0.22
D      0.00   0.00   0.00   0.01   0.01
P(d)   0.33   0.33   0.22   0.12   1.00
The row marginal \(P(w)\) can also be written via the law of total probability over the documents:

\[ P(w) = P(w \mid d_1)P(d_1) + P(w \mid d_2)P(d_2) + P(w \mid d_3)P(d_3) + P(w \mid d_4)P(d_4) \]

Assuming \(w\) and \(d\) are independent, \(P(w \mid d_i) = P(w)\), so the sum collapses:

\[ P(w) = P(w)\big(P(d_1) + P(d_2) + P(d_3) + P(d_4)\big) = P(w) \]
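The two tables and the marginalization step can be reproduced directly from the counts. A sketch in Python using exact fractions (the grand total of 91 comes from summing the count table):

```python
from fractions import Fraction

# Count table from the lecture example: rows are words A-D,
# columns are documents d1-d4.
counts = {
    "A": [10, 10, 10, 10],
    "B": [10, 10, 10, 0],
    "C": [10, 10, 0, 0],
    "D": [0, 0, 0, 1],
}
total = sum(sum(row) for row in counts.values())  # 91

# Joint probabilities P(w, d): each count divided by the grand total.
joint = {(w, d): Fraction(c, total)
         for w, row in counts.items()
         for d, c in enumerate(row)}

# Marginals: P(w) sums over documents, P(d) sums over words.
p_w = {w: sum(joint[w, d] for d in range(4)) for w in counts}
p_d = {d: sum(joint[w, d] for w in counts) for d in range(4)}

# Law of total probability: P(w) = sum_i P(w | d_i) P(d_i).
for w in counts:
    assert p_w[w] == sum(joint[w, d] / p_d[d] * p_d[d] for d in range(4))

print(float(p_w["A"]), float(p_d[3]))  # about 0.44 and 0.12
```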

Conditional probability example

Consider the distribution of the four binary random variables below.

[Figure: probability distribution diagram]

From the image, we immediately derive the table of individual (marginal) probabilities:

     A     B     C     D
=0   0.7   0.9   0.84  0.9
=1   0.3   0.1   0.16  0.1

With a little bit of work, we can also derive the conditional probability tables. Let's do it step by step for A and B.

A   B   \(P(A \mid B)\)
0   0   ?
0   1   ?
1   0   ?
1   1   ?

Notice that \(A=1\) whenever \(B=1\). That means that \(P(A=1 \mid B=1)=1\). It also means that it is impossible to have \(A=0\) when \(B=1\), which translates to \(P(A=0 \mid B=1) = 0\).

A   B   \(P(A \mid B)\)
0   0   ?
0   1   0
1   0   ?
1   1   1

We can also compute \(P(A=0 \mid B=0)\). We know the value \(P(A=0,B=0)\) from the distribution diagram: \(P(A=0,B=0)=0.7\). Combining this with the definition of conditional probability, we obtain

\[ \begin{align} P(A=0 \mid B=0) &= \frac{P(A=0,B=0)}{P(B=0)} \\[1em] &= \frac{0.7}{0.9} \\[1em] &= \frac{7}{9} \end{align} \]

Updating our probability table:

A   B   \(P(A \mid B)\)
0   0   7/9
0   1   0
1   0   ?
1   1   1

We can follow a similar path to compute \(P(A=1 \mid B=0)\). But we can also use the following identity:

For a fixed conditioning event, the sum of the conditional probabilities over all values of the variable on the left of the \(\mid\) equals \(1\).

That is, \(P(A=0 \mid B=0) + P(A=1 \mid B=0) = 1\). Therefore,

A   B   \(P(A \mid B)\)
0   0   7/9
0   1   0
1   0   2/9
1   1   1
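The completed table can be derived mechanically from the pairwise joint distribution. A sketch in Python; the joint values below are the ones from the notes (\(P(A{=}0,B{=}0)=0.7\), \(P(A{=}0,B{=}1)=0\) since \(A=1\) whenever \(B=1\), \(P(A{=}1,B{=}1)=P(B{=}1)=0.1\), and \(P(A{=}1,B{=}0)=0.3-0.1=0.2\)):

```python
from fractions import Fraction

# Pairwise joint distribution P(A, B) from the notes.
joint_ab = {
    (0, 0): Fraction(7, 10),
    (0, 1): Fraction(0),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(1, 10),
}

def p_b(b):
    """Marginal P(B=b), summing the joint over A."""
    return sum(p for (_, b_), p in joint_ab.items() if b_ == b)

def cond(a, b):
    """P(A=a | B=b) via the definition of conditional probability."""
    return joint_ab[a, b] / p_b(b)

for a in (0, 1):
    for b in (0, 1):
        print(f"P(A={a} | B={b}) = {cond(a, b)}")
# Reproduces the table: 7/9, 0, 2/9, 1.
```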

We can obtain all the remaining probability tables by repeating the previous steps for the other pairs of random variables:

A   C   \(P(A \mid C)\)
0   0   11/14
0   1   1/4
1   0   3/14
1   1   3/4

A   D   \(P(A \mid D)\)
0   0   0.7
0   1   0.7
1   0   0.3
1   1   0.3

B   A   \(P(B \mid A)\)
0   0   1
0   1   2/3
1   0   0
1   1   1/3

B   C   \(P(B \mid C)\)
0   0   0.9
0   1   0.9
1   0   0.1
1   1   0.1

B   D   \(P(B \mid D)\)
0   0   0.9
0   1   0.9
1   0   0.1
1   1   0.1

C   A   \(P(C \mid A)\)
0   0   33/35
0   1   3/5
1   0   2/35
1   1   2/5

I am not going to list the remaining tables because the relations are very simple: the remaining variable pairs are independent, so each conditional probability equals the corresponding marginal.
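Independence is visible in the tables we already have: for example, the \(P(A \mid D)\) column repeats the marginal of \(A\) for both values of \(D\). A quick check in Python:

```python
from fractions import Fraction

# P(A | D) as listed in the table above.
p_a_given_d = {(0, 0): Fraction(7, 10), (0, 1): Fraction(7, 10),
               (1, 0): Fraction(3, 10), (1, 1): Fraction(3, 10)}

# Marginal of A from the individual-probabilities table.
p_a = {0: Fraction(7, 10), 1: Fraction(3, 10)}

# Independence: P(A=a | D=d) = P(A=a) for every a and d.
assert all(p == p_a[a] for (a, d), p in p_a_given_d.items())
print("A and D are independent")
```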

Conditional probability identities

  1. \(\sum_{i}{P(A=a_i \mid B=b)} = 1\)
  2. \(\sum_{i}{P(A=a \mid B=b_i)P(B=b_i)} = P(A=a)\)
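Both identities can be verified on the \(P(A,B)\) joint distribution from the example above. A sketch:

```python
from fractions import Fraction

# Joint P(A, B) from the conditional-probability example above.
joint_ab = {(0, 0): Fraction(7, 10), (0, 1): Fraction(0),
            (1, 0): Fraction(2, 10), (1, 1): Fraction(1, 10)}

p_a = {a: sum(p for (a_, _), p in joint_ab.items() if a_ == a) for a in (0, 1)}
p_b = {b: sum(p for (_, b_), p in joint_ab.items() if b_ == b) for b in (0, 1)}

def cond(a, b):
    """P(A=a | B=b)."""
    return joint_ab[a, b] / p_b[b]

# Identity 1: for each fixed b, the conditionals over a sum to 1.
assert all(sum(cond(a, b) for a in (0, 1)) == 1 for b in (0, 1))

# Identity 2: total probability recovers the marginal P(A=a).
assert all(sum(cond(a, b) * p_b[b] for b in (0, 1)) == p_a[a] for a in (0, 1))

print("both identities hold")
```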

Remarks

  1. We cannot define a distribution by assigning arbitrary values to the conditional probabilities. To be consistent, the conditional probabilities must respect the identities above.
  2. On the other hand, we can define a probability distribution by giving the joint probabilities of all outcomes. That is, if my universe has the random variables \(A,B,C\) and I have the value of \(P(A,B,C)\) for every possible assignment of \(A,B,C\), then I have a probability distribution.