
Statistics Review

Bayes' Theorem

\[ P(A\;|\;B) = \frac{P(A,B)}{P(B)} \]

Notice that if \(A\) and \(B\) are independent, then

\[ P(A\;|\;B) = \frac{P(A,B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A) \]
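The definition and the independence property are easy to check numerically. A minimal sketch in Python, using a toy joint distribution of two fair, independent coin flips (this example is an assumption for illustration, not from the notes):

```python
from fractions import Fraction

# Toy joint distribution: two fair, independent coin flips,
# so every outcome (a, b) has probability 1/4.
joint = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

def marginal_b(b):
    """P(B=b), obtained by summing the joint over all values of A."""
    return sum(p for (_, b_), p in joint.items() if b_ == b)

def cond(a, b):
    """P(A=a | B=b) = P(A,B) / P(B), the definition above."""
    return joint[(a, b)] / marginal_b(b)

# Independence: conditioning on B leaves the marginal of A unchanged.
marginal_a = sum(p for (a_, _), p in joint.items() if a_ == 1)
print(cond(1, 0) == marginal_a)  # True: P(A=1 | B=0) = P(A=1) = 1/2
```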

Joint Probability

Joint probability refers to the analysis of the interaction between two or more random variables. Taking the example from the Lecture-2 slides, we start from the following count table (rows \(w\), columns \(d\)).

w    d1   d2   d3   d4
A    10   10   10   10
B    10   10   10    0
C    10   10    0    0
D     0    0    0    1

Each cell in the table below represents the joint probability \(P(w,d)\), obtained by dividing the corresponding count by the grand total (\(91\)).

w      d1     d2     d3     d4     P(w)
A      0.11   0.11   0.11   0.11   0.44
B      0.11   0.11   0.11   0.00   0.33
C      0.11   0.11   0.00   0.00   0.22
D      0.00   0.00   0.00   0.01   0.01
P(d)   0.33   0.33   0.22   0.12   1.00
The row marginal \(P(w)\) can also be written via the law of total probability over the documents:

\[ P(w) = P(w \mid d_1)P(d_1) + P(w \mid d_2)P(d_2) + P(w \mid d_3)P(d_3) + P(w \mid d_4)P(d_4) \]

Assuming \(w\) and \(d\) are independent, \(P(w \mid d_i) = P(w)\), so the sum collapses:

\[ P(w) = P(w)\big(P(d_1) + P(d_2) + P(d_3) + P(d_4)\big) = P(w) \]
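The two tables and the marginalization step can be reproduced directly from the counts. A sketch in Python using exact fractions (the grand total of 91 comes from summing the count table):

```python
from fractions import Fraction

# Count table from the lecture example: rows are words A-D,
# columns are documents d1-d4.
counts = {
    "A": [10, 10, 10, 10],
    "B": [10, 10, 10, 0],
    "C": [10, 10, 0, 0],
    "D": [0, 0, 0, 1],
}
total = sum(sum(row) for row in counts.values())  # 91

# Joint probabilities P(w, d): each count divided by the grand total.
joint = {(w, d): Fraction(c, total)
         for w, row in counts.items()
         for d, c in enumerate(row)}

# Marginals: P(w) sums over documents, P(d) sums over words.
p_w = {w: sum(joint[w, d] for d in range(4)) for w in counts}
p_d = {d: sum(joint[w, d] for w in counts) for d in range(4)}

# Law of total probability: P(w) = sum_i P(w | d_i) P(d_i).
for w in counts:
    assert p_w[w] == sum(joint[w, d] / p_d[d] * p_d[d] for d in range(4))

print(float(p_w["A"]), float(p_d[3]))  # about 0.44 and 0.12
```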

Conditional probability example

Consider the distribution of the four binary random variables below.

[Figure: probability distribution diagram]

From the image, we immediately derive the table of individual (marginal) probabilities:

     A     B     C     D
=0   0.7   0.9   0.84  0.9
=1   0.3   0.1   0.16  0.1

With a little bit of work, we can also derive the conditional probability tables. Let's do it step by step for A and B.

A   B   \(P(A \mid B)\)
0   0   ?
0   1   ?
1   0   ?
1   1   ?

Notice that \(A=1\) whenever \(B=1\). That means that \(P(A=1 \mid B=1)=1\). It also means that it is impossible to have \(A=0\) when \(B=1\), which translates to \(P(A=0 \mid B=1) = 0\).

A   B   \(P(A \mid B)\)
0   0   ?
0   1   0
1   0   ?
1   1   1

We can also compute \(P(A=0 \mid B=0)\). We know the value \(P(A=0,B=0)\) from the distribution diagram: \(P(A=0,B=0)=0.7\). Combining this with the definition of conditional probability, we obtain

\[ \begin{align} P(A=0 \mid B=0) &= \frac{P(A=0,B=0)}{P(B=0)} \\[1em] &= \frac{0.7}{0.9} \\[1em] &= \frac{7}{9} \end{align} \]

Updating our probability table:

A   B   \(P(A \mid B)\)
0   0   7/9
0   1   0
1   0   ?
1   1   1

We can follow a similar path to compute \(P(A=1 \mid B=0)\). But we can also use the following identity:

For a fixed conditioning event, the sum of the conditional probabilities over all values of the variable on the left of the \(\mid\) equals \(1\).

That is, \(P(A=0 \mid B=0) + P(A=1 \mid B=0) = 1\). Therefore,

A   B   \(P(A \mid B)\)
0   0   7/9
0   1   0
1   0   2/9
1   1   1
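The completed table can be derived mechanically from the pairwise joint distribution. A sketch in Python; the joint values below are the ones from the notes (\(P(A{=}0,B{=}0)=0.7\), \(P(A{=}0,B{=}1)=0\) since \(A=1\) whenever \(B=1\), \(P(A{=}1,B{=}1)=P(B{=}1)=0.1\), and \(P(A{=}1,B{=}0)=0.3-0.1=0.2\)):

```python
from fractions import Fraction

# Pairwise joint distribution P(A, B) from the notes.
joint_ab = {
    (0, 0): Fraction(7, 10),
    (0, 1): Fraction(0),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(1, 10),
}

def p_b(b):
    """Marginal P(B=b), summing the joint over A."""
    return sum(p for (_, b_), p in joint_ab.items() if b_ == b)

def cond(a, b):
    """P(A=a | B=b) via the definition of conditional probability."""
    return joint_ab[a, b] / p_b(b)

for a in (0, 1):
    for b in (0, 1):
        print(f"P(A={a} | B={b}) = {cond(a, b)}")
# Reproduces the table: 7/9, 0, 2/9, 1.
```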

We can obtain all the remaining probability tables by repeating the previous steps for the other pairs of random variables:

A   C   \(P(A \mid C)\)
0   0   11/14
0   1   1/4
1   0   3/14
1   1   3/4

A   D   \(P(A \mid D)\)
0   0   0.7
0   1   0.7
1   0   0.3
1   1   0.3

B   A   \(P(B \mid A)\)
0   0   1
0   1   2/3
1   0   0
1   1   1/3

B   C   \(P(B \mid C)\)
0   0   0.9
0   1   0.9
1   0   0.1
1   1   0.1

B   D   \(P(B \mid D)\)
0   0   0.9
0   1   0.9
1   0   0.1
1   1   0.1

C   A   \(P(C \mid A)\)
0   0   33/35
0   1   3/5
1   0   2/35
1   1   2/5

I am not going to list the remaining tables because the relations are very simple: the remaining variable pairs are independent, so each conditional probability equals the corresponding marginal.
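Independence is visible in the tables we already have: for example, the \(P(A \mid D)\) column repeats the marginal of \(A\) for both values of \(D\). A quick check in Python:

```python
from fractions import Fraction

# P(A | D) as listed in the table above.
p_a_given_d = {(0, 0): Fraction(7, 10), (0, 1): Fraction(7, 10),
               (1, 0): Fraction(3, 10), (1, 1): Fraction(3, 10)}

# Marginal of A from the individual-probabilities table.
p_a = {0: Fraction(7, 10), 1: Fraction(3, 10)}

# Independence: P(A=a | D=d) = P(A=a) for every a and d.
assert all(p == p_a[a] for (a, d), p in p_a_given_d.items())
print("A and D are independent")
```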

Conditional probability identities

  1. \(\sum_{i}{P(A=a_i \mid B=b)} = 1\)
  2. \(\sum_{i}{P(A=a \mid B=b_i)P(B=b_i)} = P(A=a)\)
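Both identities can be verified on the \(P(A,B)\) joint distribution from the example above. A sketch:

```python
from fractions import Fraction

# Joint P(A, B) from the conditional-probability example above.
joint_ab = {(0, 0): Fraction(7, 10), (0, 1): Fraction(0),
            (1, 0): Fraction(2, 10), (1, 1): Fraction(1, 10)}

p_a = {a: sum(p for (a_, _), p in joint_ab.items() if a_ == a) for a in (0, 1)}
p_b = {b: sum(p for (_, b_), p in joint_ab.items() if b_ == b) for b in (0, 1)}

def cond(a, b):
    """P(A=a | B=b)."""
    return joint_ab[a, b] / p_b[b]

# Identity 1: for each fixed b, the conditionals over a sum to 1.
assert all(sum(cond(a, b) for a in (0, 1)) == 1 for b in (0, 1))

# Identity 2: total probability recovers the marginal P(A=a).
assert all(sum(cond(a, b) * p_b[b] for b in (0, 1)) == p_a[a] for a in (0, 1))

print("both identities hold")
```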

Remarks

  1. We cannot define a distribution by assigning arbitrary values to the conditional probabilities. To be consistent, the conditional probabilities must respect the identities above.
  2. On the other hand, we can define a probability distribution by giving the joint probabilities of all outcomes. That is, if my universe has the random variables \(A,B,C\) and I have the value of \(P(A,B,C)\) for every possible assignment of \(A,B,C\), then I have a probability distribution.