Bangda Sun

Practice makes perfect

Probability Review (1)

Probability Quick Review & Cheatsheet (1): Probability Axioms, Set Calculation, Counting Methods, Conditional Probability, Random Variables and their Characteristics.

Only the most fundamental probability theory, no measure theory.

1. Preliminary

  • Basic Units:

    • Experiment: Any real or hypothetical process whose possible outcomes can be identified ahead of time.
    • Sample Space: the set \(S\) of all possible outcomes of an experiment.
    • Event: A well-defined set of possible outcomes of an experiment, i.e. any subset of \(S\), usually denoted by capital letters such as \(A, B, E, F\).
  • Set Theory:

    • Union / Intersection / Complementation.
    • Mutually exclusive (disjoint) sets: the intersection of the two sets is the empty set, \(AB = \varnothing\).
    • Basic Laws:
      • Commutative laws: \(E\cup F = F\cup E\), \(EF=F E\).
      • Associative laws: \((E\cup F)\cup G = E\cup (F\cup G)\), \((E F)G = E(F G)\).
      • Distributive laws: \((E\cup F) G = (E G)\cup(FG)\), \((E F)\cup G = (E\cup G)(F\cup G)\).
      • De Morgan’s laws: \((E\cup F)^{c}=E^{c}F^{c}\), \((EF)^{c} = E^{c}\cup F^{c}\).
  • Probability Definition

    • Classical: given \(n\) equally likely outcomes of which \(m\) are favorable, the probability of the desired event is \(m/n\).

    • Relative Frequency: repeat the trial a large number of times under the same conditions; the probability of an event is approximately the ratio of the number of occurrences of the event to the total number of trials.

    • Axiom: consider an experiment with sample space \(S\); for each event \(E\), the probability \(P(E)\) is defined and satisfies:

      (1) \(0 \leq P(E) \leq 1\).

      (2) \(P(S) = 1\).

      (3) for mutually exclusive events \(E_{1}, E_{2}, \cdots\),

      \[
      P\left(\bigcup^{\infty}_{i=1}E_{i}\right) = \sum^{\infty}_{i=1}P(E_{i}).
      \]

  • Propositions of Probability

    • For empty set / event, \(P(\varnothing) = 0\).

    • For a finite sequence of mutually exclusive events \(E_{1}, E_{2}, \cdots, E_{n}\),

      \[
      P\left(\bigcup^{n}_{i=1}E_{i}\right) = \sum^{n}_{i=1}P(E_{i}).
      \]

    • If \(E\subset F\), then \(P(E)\leq P(F)\).

    • \(P(E\cup F) = P(E) + P(F) - P(EF)\).
  • Sum of Geometric Series

Suppose there is a geometric series \(a, ar, ar^{2}, \cdots, ar^{n-1}\), where \(r\neq 1\); the sum of the first \(n\) terms is

\[
a + ar + ar^{2} + \cdots + ar^{n-1} = \sum^{n-1}_{k=0}ar^{k} = a\left(\frac{1-r^{n}}{1-r}\right).
\]
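
A quick numerical check of the closed form (a minimal Python sketch; the values of \(a\), \(r\), \(n\) below are arbitrary illustrations):

```python
# Numerical check of the finite geometric sum formula (illustrative values).
a, r, n = 2.0, 0.5, 10
direct = sum(a * r**k for k in range(n))
closed_form = a * (1 - r**n) / (1 - r)
print(direct, closed_form)  # both 3.99609375
```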

2. Counting Methods

Counting methods are useful for calculating probabilities of equally likely outcomes, such as coin tossing, dice rolling, and card games.

  • Sum Rule

The sum rule applies to tasks that cannot be done at the same time, i.e. mutually exclusive alternatives: the total number of ways is the sum of the counts of the alternatives.

  • Product Rule

The product rule applies to successive tasks with different options at each stage: the total number of ways is the product of the counts at each stage.

  • Permutations

An ordered arrangement of the objects in a set. The number of different permutations of \(n\) distinct objects is \(n\) factorial, \(n!\). If \(m\) of the objects are indistinguishable, divide by \(m!\) to compensate.
The number of circular permutations of \(n\) distinct objects is \((n-1)!\).

  • Combinations

An unordered selection of \(r\) objects from a set of \(n\) distinct objects; the number of such combinations is denoted by
\[
{n \choose r } = \frac{n!}{r!(n-r)!}.
\]

  • Binomial Theorem

\[
(x + y)^{n} = \sum^{n}_{k=0}{n \choose k}x^{k}y^{n-k}.
\]

Two applications:
(1) the number of distinguishable ways to arrange \(a\) identical red balls and \(b\) identical black balls in a row is
\[
{a + b \choose a} = {a + b \choose b}.
\]
(2) the number of ways to distribute \(n\) indistinguishable objects into \(k\) distinguishable sets (think of \(n\) balls separated by \(k-1\) separators) is
\[
{n+k-1 \choose n} = {n+k-1 \choose k-1}.
\]
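
These counting building blocks are all available in Python's standard library; a minimal sketch (the numbers are arbitrary illustrations):

```python
import math

# Basic counting quantities (arbitrary illustrative values).
n, r = 5, 2
print(math.factorial(n))      # permutations of 5 distinct objects: 120
print(math.factorial(n - 1))  # circular permutations of 5 objects: 24
print(math.comb(n, r))        # combinations, 5 choose 2: 10

# Application (2): n indistinguishable balls into k distinguishable boxes.
n_balls, k_boxes = 4, 3
print(math.comb(n_balls + k_boxes - 1, n_balls))  # 15
```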

Two famous problems:

(1) Birthday Problem
(2) Matching Problem
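
For famous problem (1), the birthday problem, a minimal computational sketch (assuming 365 equally likely, independent birthdays):

```python
# Birthday problem: P(at least two of k people share a birthday),
# assuming 365 equally likely, independent birthdays.
def birthday(k, days=365):
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(birthday(23))  # ~0.507, already above 1/2 with only 23 people
```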

3. Conditional Probabilities

For conditional probabilities, the sample space is reduced by the given additional information. The conditional probability of \(E\) given \(F\) is denoted by \(P(E|F)\); if \(P(F) > 0\), then

\[
P(E|F) = \frac{P(EF)}{P(F)}.
\]

Three rules for conditional probabilities:

(1) Multiplication Rule (Chain Rule):

\[
P(E_{1}E_{2}\cdots E_{n}) = P(E_{n}|E_{1}E_{2}\cdots E_{n-1})\cdots P(E_{2}|E_{1})P(E_{1}).
\]

(2) Law of Total Probability

If \(F_{1}, F_{2}, \cdots, F_{n}\) are mutually exclusive events such that \(\cup^{n}_{i=1}F_{i} = S\), then
\[
P(E) = \sum^{n}_{i=1}P(EF_{i}) = \sum^{n}_{i=1}P(E|F_{i})P(F_{i}).
\]

(3) Bayes Formula

If \(F_{1}, F_{2}, \cdots, F_{n}\) are mutually exclusive events such that \(\cup^{n}_{i=1}F_{i} = S\), then

\[
P(F_{i}|E) = \frac{P(E|F_{i})P(F_{i})}{\sum^{n}_{j=1} P(E|F_{j}) P(F_{j})}.
\]
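
A minimal numerical sketch of the law of total probability and Bayes formula (the partition and conditional probabilities below are made-up illustrative numbers):

```python
# Partition F_1, F_2, F_3 of S with made-up probabilities, and made-up P(E | F_i).
P_F = [0.5, 0.3, 0.2]
P_E_given_F = [0.1, 0.4, 0.8]

# Law of total probability: P(E) = sum_i P(E | F_i) P(F_i).
P_E = sum(pe * pf for pe, pf in zip(P_E_given_F, P_F))

# Bayes formula: P(F_i | E).
P_F_given_E = [pe * pf / P_E for pe, pf in zip(P_E_given_F, P_F)]
print(P_E, P_F_given_E)  # 0.33, [0.1515..., 0.3636..., 0.4848...]
```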

  • Independent Events

Two events \(E\) and \(F\) are statistically independent if \(P(EF) = P(E)P(F)\).

Two theorems of independent events:

(1) If \(E\) and \(F\) are independent, then \((E, F^{c})\), \((E^{c}, F)\), \((E^{c}, F^{c})\) are also independent pairs. This extends to the general case: \(E_{1}, E_{2}, \cdots, E_{n}\) are independent if for every subset of size \(2\leq r\leq n\) of these events, \(P(E_{i_{1}}E_{i_{2}}\cdots E_{i_{r}}) = P(E_{i_{1}})\cdots P(E_{i_{r}})\).

(2) If events \(E_{1}, E_{2}, \cdots, E_{n}\) are independent and \(P(E_{i}) = p_{i}\), the probability that at least one of the events occurs is \(1 - (1 - p_{1})(1 - p_{2})\cdots (1 - p_{n})\).
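
Theorem (2) in a one-line sketch (the event probabilities are made up):

```python
import math

# P(at least one of n independent events occurs) = 1 - prod(1 - p_i).
p = [0.1, 0.2, 0.3]  # made-up event probabilities
print(1 - math.prod(1 - pi for pi in p))  # 1 - 0.9*0.8*0.7 = 0.496
```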

4. Random Variables

A random variable \(X\) is a function that maps the sample space \(S\) to the real numbers, \(X: S\rightarrow R\). The distribution of a random variable \(X\) is the collection of probabilities of the form \(P(X\in C)\) for all sets \(C\) of real numbers such that \(\{X\in C\}\) is an event.

  • Probability Mass Function / Probability Density Function

For discrete random variables, the PMF gives the probability that the random variable takes the value \(x\); for continuous random variables, the PDF is the derivative of the CDF. The probability of taking any single point is 0, while probabilities over intervals can be positive:
\[
P(a\leq X\leq b) = P(X\leq b) - P(X\leq a).
\]

  • Cumulative Distribution Function

The CDF gives the probability that a random variable is less than or equal to \(x\), \(F(x) = P(X\leq x)\). It is a right-continuous and non-decreasing function, i.e.

\[
\lim_{x\rightarrow a^{+}} F(x) = F(a).
\]

Some properties of the CDF:

\[
P(a < X \leq b) = F(b) - F(a), ~P(X < b) = F(b^{-}).
\]
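
A quick check of the first identity with a standard normal (a sketch assuming scipy is available):

```python
from scipy.stats import norm

# P(a < X <= b) = F(b) - F(a) for X ~ N(0, 1).
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))  # ~0.6827, the familiar 68% within one standard deviation
```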

  • Quantile Function

The quantile function is the inverse of the CDF. For \(p\in (0, 1]\), \(F^{-1}(p) = \inf\{x: F(x)\geq p\}\), i.e. the quantile function returns the minimum \(x\) such that \(F(x)\geq p\).
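
In scipy the quantile function is exposed as `ppf`; a minimal sketch with a standard normal:

```python
from scipy.stats import norm

p = 0.975
x = norm.ppf(p)        # quantile function (inverse CDF)
print(x, norm.cdf(x))  # ~1.96, 0.975 (the CDF of the quantile recovers p)
```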

  • Joint Distributed Random Variables

The joint CDF is defined as \(F_{XY}(a, b) = P(X\leq a, Y\leq b)\) for \(a, b\in R\),
\[
\begin{aligned}
F_{XY}(-\infty, b) =& \lim_{a\rightarrow -\infty}P(X\leq a, Y\leq b) = 0 \\
F_{XY}(a, -\infty) =& \lim_{b\rightarrow -\infty}P(X\leq a, Y\leq b) = 0 \\
F_{XY}(a, \infty) =& \lim_{b\rightarrow \infty} P(X\leq a, Y\leq b) = P(X\leq a) = F_{X}(a) \\
F_{XY}(\infty, \infty) =& \lim_{a\rightarrow \infty,\ b\rightarrow \infty} P(X\leq a, Y\leq b) = 1,
\end{aligned}
\]

also,

\[
P(a_{1}<X\leq a_{2}, b_{1}< Y\leq b_{2}) = F(a_{2}, b_{2}) - F(a_{2}, b_{1}) - F(a_{1}, b_{2}) + F(a_{1}, b_{1}),
\]

The joint PMF and PDF are defined as

\[
\begin{aligned}
p_{XY}(x, y) =& P(X=x, Y=y) \\
f_{XY}(x, y) =& \frac{\partial^{2}}{\partial x\partial y} F_{XY}(x, y).
\end{aligned}
\]

The conditional PDF (PMF) and CDF are defined as
\[
\begin{aligned}
f_{X|Y}(x|y) =& \frac{f_{XY}(x, y)}{f_{Y}(y)} \\
F_{X|Y}(x|y) =& P(X\leq x|Y=y).
\end{aligned}
\]

  • Independent Random Variables

\(X\) and \(Y\) are independent if and only if for any sets of real numbers \(A\) and \(B\),
\[
P(X\in A, Y\in B) = P(X\in A)P(Y\in B).
\]

  • Sum of Independent Random Variables

If \(X\) and \(Y\) are independent, then the PDF of \(Z = X+Y\) is the convolution

\[
f_{Z}(z) = \int^{\infty}_{-\infty} f_{X}(z-y)f_{Y}(y) dy.
\]
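
A numerical sketch of the convolution formula with independent \(X, Y \sim \text{Uniform}(0, 1)\), whose sum has the triangular density on \([0, 2]\) (assuming numpy is available):

```python
import numpy as np

# Uniform(0, 1) density.
def f_uniform(t):
    return np.where((t >= 0) & (t <= 1), 1.0, 0.0)

# Evaluate f_Z(z) = integral of f_X(z - y) f_Y(y) dy with a simple Riemann sum.
grid = np.linspace(-1.0, 3.0, 4001)
dy = grid[1] - grid[0]

def f_Z(z):
    return np.sum(f_uniform(z - grid) * f_uniform(grid)) * dy

print(f_Z(0.5), f_Z(1.0), f_Z(1.5))  # ~0.5, ~1.0, ~0.5 (triangular density on [0, 2])
```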

  • Distribution of Function of Random Variable

The discrete case is simple: the PMF of \(Y = g(X)\) is obtained by summing \(p_{X}(x)\) over all \(x\) with \(g(x) = y\).

For the continuous case, suppose \(g\) is a strictly monotone and differentiable function; then the PDF of \(Y = g(X)\) is given by

\[
f_{Y}(y) = f_{X}\left( g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y) \right|.
\]

Example: \(Y=X^{2}\),

\[
\begin{aligned}
F_{Y}(y) =& P(Y\leq y) = P(X^{2}\leq y) = P(-\sqrt{y}\leq X\leq \sqrt{y}) \\
=& F_{X}(\sqrt{y}) - F_{X}(-\sqrt{y}),
\end{aligned}
\]
\[
f_{Y}(y) = \frac{1}{2\sqrt{y}}f_{X}(\sqrt{y}) + \frac{1}{2\sqrt{y}}f_{X}(-\sqrt{y}), \quad y > 0.
\]
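
A quick sanity check of this example with \(X\sim N(0, 1)\): the resulting \(f_{Y}\) should match the chi-square density with 1 degree of freedom (a sketch assuming scipy is available):

```python
import numpy as np
from scipy.stats import norm, chi2

# f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 sqrt(y)), with X ~ N(0, 1).
y = 1.5
f_Y = (norm.pdf(np.sqrt(y)) + norm.pdf(-np.sqrt(y))) / (2 * np.sqrt(y))
print(f_Y, chi2.pdf(y, df=1))  # both ~0.1539
```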

  • Joint Distribution of Function of Random Variables

In the multivariate case, the derivative in the transformation formula is replaced by the absolute value of the determinant of the Jacobian matrix of the inverse transformation.

  • Distribution Function of Maximum and Minimum of Random Sample

For i.i.d. random variables \(X_{1},\cdots, X_{n}\) with common CDF \(F_{X}\), the distributions of \(Y_{n} = \max(X_{1},\cdots, X_{n})\) and \(Y_{1} = \min(X_{1},\cdots, X_{n})\) are

\[
\begin{aligned}
G_{n}(y) =& P(Y_{n}\leq y) = P(X_{1}\leq y,\cdots, X_{n}\leq y) = F_{X}^{n}(y) \\
G_{1}(y) =& P(Y_{1}\leq y) = 1 - P(Y_{1}>y) = 1 - P(X_{1}> y, \cdots, X_{n}>y) = 1 - (1 - F_{X}(y))^{n}.
\end{aligned}
\]
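
A simulation sketch for i.i.d. \(\text{Uniform}(0, 1)\) variables, where \(F_{X}(y) = y\) on \([0, 1]\) (assuming numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000
samples = rng.uniform(size=(reps, n))

y = 0.7
print((samples.max(axis=1) <= y).mean(), y**n)            # max: ~0.168 vs F^n(y)
print((samples.min(axis=1) <= y).mean(), 1 - (1 - y)**n)  # min: ~0.9976 vs 1 - (1 - F(y))^n
```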

5. Characteristics of Random Variables

  • Expected Values / Expectations

Expectation measures the center of the probability distribution. If the defining sum or integral diverges, the expectation is regarded as not existing.

\[
E(X) = \sum_{k}x_{k}p(x_{k}) \text{ or } E(X) = \int^{\infty}_{-\infty}xf(x)dx.
\]

The expectation of function of random variable:

\[
E(g(X)) = \sum_{k}g(x_{k})p(x_{k}) \text{ or } E(g(X)) = \int^{\infty}_{-\infty}g(x)f(x)dx.
\]

Properties of expectation:

(1) If \(a\) and \(b\) are constants, then \(E(aX + b) = aE(X) + b\).
(2) If there exists a constant \(a\) such that \(P(X\geq a) = 1\), then \(E(X)\geq a\).

Some theorems of expectation:

(1) If \(X\) is a random variable that takes non-negative integer values, then
\[
E(X) = \sum^{\infty}_{i=1}P(X\geq i).
\]
(2) If \(X\) is a non-negative random variable with CDF \(F(x)\), then
\[
E(X) = \int^{\infty}_{0}(1 - F(x))dx.
\]
(3) If \(X_{1},X_{2},\cdots,X_{n}\) are \(n\) random variables such that \(E(X_{i})\) is finite, then
\[
E(X_{1}+X_{2}+\cdots+X_{n}) = E(X_{1}) + E(X_{2}) + \cdots + E(X_{n}).
\]
(4) If \(X\) and \(Y\) are independent, then for functions \(h\) and \(g\),
\[
E[h(X)g(Y)] = E[h(X)]E[g(Y)].
\]
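
A small numerical check of theorem (1) with a geometric random variable (success probability \(p\), supported on \(1, 2, \cdots\), so \(P(X\geq i) = (1-p)^{i-1}\) and \(E(X) = 1/p\)); the infinite sum is truncated at a large cutoff:

```python
# Check E(X) = sum_{i >= 1} P(X >= i) for a geometric random variable.
p = 0.3
tail_sum = sum((1 - p) ** (i - 1) for i in range(1, 200))  # truncated tail sum
print(tail_sum, 1 / p)  # both ~3.3333
```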

  • Variance

Variance is a measure of spread of probability distribution.

\[
Var(X) = E\left[\left(X - E(X)\right)^{2}\right] = E\left(X^{2}\right) - E(X)^{2}.
\]

If \(a\) and \(b\) are constants, then \(Var(aX + b) = a^{2}Var(X)\).

  • Covariance and Correlation

Covariance measures the strength of the linear relationship between two random variables. Correlation is the standardized version of covariance.

\[
Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).
\]
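
A simulation sketch of the covariance identity (the linear model for \(Y\) below is a made-up illustration with true \(Cov(X, Y) = 0.5\)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = 0.5 * x + rng.standard_normal(100_000)  # Cov(X, Y) = 0.5 by construction

# E(XY) - E(X)E(Y) vs numpy's sample covariance.
print(np.mean(x * y) - np.mean(x) * np.mean(y), np.cov(x, y)[0, 1])
```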

  • Conditional Expectation and Variance

The conditional expectation of \(X\) given \(Y=y\) is

\[
E(X|Y=y) = \sum_{x}xP(X=x|Y=y) = \sum_{x}xp_{X|Y}(x|y) \text{ or } E(X|Y=y) = \int^{\infty}_{-\infty}xf_{X|Y}(x|y)dx.
\]

Notice that \(E(X|Y=y)\) is a function of \(y\). Furthermore,

\[
E(X) = E(E(X|Y)), \quad Var(X) = E(Var(X|Y)) + Var(E(X|Y)).
\]
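
A simulation sketch of both identities with a made-up hierarchical model, \(Y\sim \text{Poisson}(3)\) and \(X|Y=y \sim N(y, 1)\), so that \(E(X|Y) = Y\) and \(Var(X|Y) = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(3, size=200_000)
x = rng.normal(loc=y, scale=1.0)

print(x.mean(), y.mean())      # E(X) vs E(E(X|Y)) = E(Y) = 3
print(x.var(), 1.0 + y.var())  # Var(X) vs E(Var(X|Y)) + Var(E(X|Y)) = 1 + 3 = 4
```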