Bangda Sun

Practice makes perfect

Probability Review (1)

Probability Quick Review & Cheatsheet (1): Probability Axioms, Set Calculation, Counting Methods, Conditional Probability, Random Variables and their Characteristics.

Only the most fundamental probability theory, no measure theory.

1. Preliminary

  • Basic Units:

    • Experiment: Any real or hypothetical process whose possible outcomes can be identified ahead of time.
    • Sample Space: the set \(S\) of all possible outcomes of an experiment.
    • Event: A well-defined set of possible outcomes of an experiment, i.e. any subset of \(S\), usually denoted by capital letters such as \(A, B, E, F\).
  • Set Theory:

    • Union / Intersection / Complementation.
    • Mutually exclusive (disjoint) sets: the intersection of the two sets is the empty set, \(AB = \varnothing\).
    • Basic Laws:
      • Commutative laws: \(E\cup F = F\cup E\), \(EF=F E\).
      • Associative laws: \((E\cup F)\cup G = E\cup (F\cup G)\), \((E F)G = E(F G)\).
      • Distributive laws: \((E\cup F) G = (E G)\cup(FG)\), \((E F)\cup G = (E\cup G)(F\cup G)\).
      • De Morgan’s laws: \((E\cup F)^{c}=E^{c}F^{c}\), \((EF)^{c} = E^{c}\cup F^{c}\).
  • Probability Definition

    • Classical: given \(n\) equally likely outcomes of which \(m\) are favorable, the probability of the desired event is \(m/n\).

    • Relative Frequency: repeat the trial a large number of times under the same conditions; the probability of an event is approximately the ratio of the number of occurrences of the event to the total number of trials.

    • Axiom: consider an experiment with sample space \(S\); for each event \(E\), the probability \(P(E)\) is defined and satisfies:

      (1) \(0 \leq P(E) \leq 1\).

      (2) \(P(S) = 1\).

      (3) for mutually exclusive events \(E_{1}, E_{2}, \cdots\),

      \[
      P\left(\bigcup^{\infty}_{i=1}E_{i}\right) = \sum^{\infty}_{i=1}P(E_{i}).
      \]

  • Propositions of Probability

    • For empty set / event, \(P(\varnothing) = 0\).

    • For a finite sequence of mutually exclusive events \(E_{1}, E_{2}, \cdots, E_{n}\),

      \[
      P\left(\bigcup^{n}_{i=1}E_{i}\right) = \sum^{n}_{i=1}P(E_{i}).
      \]

    • If \(E\subset F\), then \(P(E)\leq P(F)\).

    • \(P(E\cup F) = P(E) + P(F) - P(EF)\).
  • Sum of Geometric Series

Suppose there is a geometric series \(a, ar, ar^{2}, \cdots, ar^{n-1}\), where \(r\neq 1\); the sum of the first \(n\) terms is

\[
a + ar + ar^{2} + \cdots + ar^{n-1} = \sum^{n-1}_{k=0}ar^{k} = a\left(\frac{1-r^{n}}{1-r}\right).
\]
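
A quick numerical check of the closed form (a minimal Python sketch; the values of \(a\), \(r\), \(n\) below are arbitrary illustrations):

```python
# Numerical check of the finite geometric sum formula (illustrative values).
a, r, n = 2.0, 0.5, 10
direct = sum(a * r**k for k in range(n))
closed_form = a * (1 - r**n) / (1 - r)
print(direct, closed_form)  # both 3.99609375
```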

2. Counting Methods

Counting methods are useful for calculating probabilities of equally likely outcomes, such as coin tossing, dice rolling, and card games.

  • Sum Rule

The sum rule applies to tasks that cannot be done at the same time, i.e. mutually exclusive alternatives: the total number of ways is the sum of the counts of the alternatives.

  • Product Rule

The product rule applies to successive tasks with different options at each stage: the total number of ways is the product of the counts at each stage.

  • Permutations

An ordered arrangement of the objects in a set. The number of different permutations of \(n\) distinct objects is \(n\) factorial, \(n!\). If \(m\) of the objects are indistinguishable, divide by \(m!\) to compensate.
The number of circular permutations of \(n\) distinct objects is \((n-1)!\).

  • Combinations

An unordered selection of \(r\) objects from a set of \(n\) distinct objects; the number of such combinations is denoted by
\[
{n \choose r } = \frac{n!}{r!(n-r)!}.
\]

  • Binomial Theorem

\[
(x + y)^{n} = \sum^{n}_{k=0}{n \choose k}x^{k}y^{n-k}.
\]

Two applications:
(1) the number of distinguishable ways to arrange \(a\) identical red balls and \(b\) identical black balls in a row is
\[
{a + b \choose a} = {a + b \choose b}.
\]
(2) the number of ways to distribute \(n\) indistinguishable objects into \(k\) distinguishable sets (think of \(n\) balls separated by \(k-1\) separators) is
\[
{n+k-1 \choose n} = {n+k-1 \choose k-1}.
\]
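
These counting building blocks are all available in Python's standard library; a minimal sketch (the numbers are arbitrary illustrations):

```python
import math

# Basic counting quantities (arbitrary illustrative values).
n, r = 5, 2
print(math.factorial(n))      # permutations of 5 distinct objects: 120
print(math.factorial(n - 1))  # circular permutations of 5 objects: 24
print(math.comb(n, r))        # combinations, 5 choose 2: 10

# Application (2): n indistinguishable balls into k distinguishable boxes.
n_balls, k_boxes = 4, 3
print(math.comb(n_balls + k_boxes - 1, n_balls))  # 15
```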

Two famous problems:

(1) Birthday Problem
(2) Matching Problem
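
For famous problem (1), the birthday problem, a minimal computational sketch (assuming 365 equally likely, independent birthdays):

```python
# Birthday problem: P(at least two of k people share a birthday),
# assuming 365 equally likely, independent birthdays.
def birthday(k, days=365):
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(birthday(23))  # ~0.507, already above 1/2 with only 23 people
```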

3. Conditional Probabilities

For conditional probabilities, the sample space is reduced by the given additional information. The conditional probability of \(E\) given \(F\) is denoted by \(P(E|F)\); if \(P(F) > 0\), then

\[
P(E|F) = \frac{P(EF)}{P(F)}.
\]

Three rules for conditional probabilities:

(1) Multiplication Rule (Chain Rule):

\[
P(E_{1}E_{2}\cdots E_{n}) = P(E_{n}|E_{1}E_{2}\cdots E_{n-1})\cdots P(E_{2}|E_{1})P(E_{1}).
\]

(2) Law of Total Probability

If \(F_{1}, F_{2}, \cdots, F_{n}\) are mutually exclusive events such that \(\cup^{n}_{i=1}F_{i} = S\), then
\[
P(E) = \sum^{n}_{i=1}P(EF_{i}) = \sum^{n}_{i=1}P(E|F_{i})P(F_{i}).
\]

(3) Bayes Formula

If \(F_{1}, F_{2}, \cdots, F_{n}\) are mutually exclusive events such that \(\cup^{n}_{i=1}F_{i} = S\), then

\[
P(F_{i}|E) = \frac{P(E|F_{i})P(F_{i})}{\sum^{n}_{j=1} P(E|F_{j}) P(F_{j})}.
\]
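
A minimal numerical sketch of the law of total probability and Bayes formula (the partition and conditional probabilities below are made-up illustrative numbers):

```python
# Partition F_1, F_2, F_3 of S with made-up probabilities, and made-up P(E | F_i).
P_F = [0.5, 0.3, 0.2]
P_E_given_F = [0.1, 0.4, 0.8]

# Law of total probability: P(E) = sum_i P(E | F_i) P(F_i).
P_E = sum(pe * pf for pe, pf in zip(P_E_given_F, P_F))

# Bayes formula: P(F_i | E).
P_F_given_E = [pe * pf / P_E for pe, pf in zip(P_E_given_F, P_F)]
print(P_E, P_F_given_E)  # 0.33, [0.1515..., 0.3636..., 0.4848...]
```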

  • Independent Events

Two events \(E\) and \(F\) are statistically independent if \(P(EF) = P(E)P(F)\).

Two theorems of independent events:

(1) If \(E\) and \(F\) are independent, then \((E, F^{c})\), \((E^{c}, F)\), \((E^{c}, F^{c})\) are also independent pairs. This extends to the general case: \(E_{1}, E_{2}, \cdots, E_{n}\) are independent if for every subset of size \(2\leq r\leq n\) of these events, \(P(E_{i_{1}}E_{i_{2}}\cdots E_{i_{r}}) = P(E_{i_{1}})\cdots P(E_{i_{r}})\).

(2) If events \(E_{1}, E_{2}, \cdots, E_{n}\) are independent and \(P(E_{i}) = p_{i}\), the probability that at least one of the events occurs is \(1 - (1 - p_{1})(1 - p_{2})\cdots (1 - p_{n})\).
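
Theorem (2) in a one-line sketch (the event probabilities are made up):

```python
import math

# P(at least one of n independent events occurs) = 1 - prod(1 - p_i).
p = [0.1, 0.2, 0.3]  # made-up event probabilities
print(1 - math.prod(1 - pi for pi in p))  # 1 - 0.9*0.8*0.7 = 0.496
```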

4. Random Variables

A random variable \(X\) is a function that maps the sample space \(S\) to the real numbers, \(X: S\rightarrow R\). The distribution of a random variable \(X\) is the collection of probabilities of the form \(P(X\in C)\) for all sets \(C\) of real numbers such that \(\{X\in C\}\) is an event.

  • Probability Mass Function / Probability Density Function

For discrete random variables, the PMF gives the probability that the random variable takes the value \(x\); for continuous random variables, the PDF is the derivative of the CDF. The probability of taking any single point is 0, while probabilities over intervals can be positive:
\[
P(a\leq X\leq b) = P(X\leq b) - P(X\leq a).
\]

  • Cumulative Distribution Function

The CDF gives the probability that a random variable is less than or equal to \(x\), \(F(x) = P(X\leq x)\). It is a right-continuous and non-decreasing function, i.e.

\[
\lim_{x\rightarrow a^{+}} F(x) = F(a).
\]

Some properties of the CDF:

\[
P(a < X \leq b) = F(b) - F(a), ~P(X < b) = F(b^{-}).
\]
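
A quick check of the first identity with a standard normal (a sketch assuming scipy is available):

```python
from scipy.stats import norm

# P(a < X <= b) = F(b) - F(a) for X ~ N(0, 1).
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))  # ~0.6827, the familiar 68% within one standard deviation
```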

  • Quantile Function

The quantile function is the inverse of the CDF. For \(p\in (0, 1]\), \(F^{-1}(p) = \inf\{x: F(x)\geq p\}\), i.e. the quantile function returns the minimum \(x\) such that \(F(x)\geq p\).
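
In scipy the quantile function is exposed as `ppf`; a minimal sketch with a standard normal:

```python
from scipy.stats import norm

p = 0.975
x = norm.ppf(p)        # quantile function (inverse CDF)
print(x, norm.cdf(x))  # ~1.96, 0.975 (the CDF of the quantile recovers p)
```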

  • Joint Distributed Random Variables

The joint CDF is defined as \(F_{XY}(a, b) = P(X\leq a, Y\leq b)\) for \(a, b\in R\),
\[
\begin{aligned}
F_{XY}(-\infty, b) =& \lim_{a\rightarrow -\infty}P(X\leq a, Y\leq b) = 0 \\
F_{XY}(a, -\infty) =& \lim_{b\rightarrow -\infty}P(X\leq a, Y\leq b) = 0 \\
F_{XY}(a, \infty) =& \lim_{b\rightarrow \infty} P(X\leq a, Y\leq b) = P(X\leq a) = F_{X}(a) \\
F_{XY}(\infty, \infty) =& \lim_{a\rightarrow \infty,\ b\rightarrow \infty} P(X\leq a, Y\leq b) = 1,
\end{aligned}
\]

also,

\[
P(a_{1}<X\leq a_{2}, b_{1}< Y\leq b_{2}) = F(a_{2}, b_{2}) - F(a_{2}, b_{1}) - F(a_{1}, b_{2}) + F(a_{1}, b_{1}),
\]

The joint PMF and PDF are defined as

\[
\begin{aligned}
p_{XY}(x, y) =& P(X=x, Y=y) \\
f_{XY}(x, y) =& \frac{\partial^{2}}{\partial x\partial y} F_{XY}(x, y).
\end{aligned}
\]

The conditional PDF (PMF) and CDF are defined as
\[
\begin{aligned}
f_{X|Y}(x|y) =& \frac{f_{XY}(x, y)}{f_{Y}(y)} \\
F_{X|Y}(x|y) =& P(X\leq x|Y=y).
\end{aligned}
\]

  • Independent Random Variables

\(X\) and \(Y\) are independent if and only if for any sets of real numbers \(A\) and \(B\),
\[
P(X\in A, Y\in B) = P(X\in A)P(Y\in B).
\]

  • Sum of Independent Random Variables

If \(X\) and \(Y\) are independent, then the PDF of \(Z = X+Y\) is the convolution

\[
f_{Z}(z) = \int^{\infty}_{-\infty} f_{X}(z-y)f_{Y}(y) dy.
\]
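
A numerical sketch of the convolution formula with independent \(X, Y \sim \text{Uniform}(0, 1)\), whose sum has the triangular density on \([0, 2]\) (assuming numpy is available):

```python
import numpy as np

# Uniform(0, 1) density.
def f_uniform(t):
    return np.where((t >= 0) & (t <= 1), 1.0, 0.0)

# Evaluate f_Z(z) = integral of f_X(z - y) f_Y(y) dy with a simple Riemann sum.
grid = np.linspace(-1.0, 3.0, 4001)
dy = grid[1] - grid[0]

def f_Z(z):
    return np.sum(f_uniform(z - grid) * f_uniform(grid)) * dy

print(f_Z(0.5), f_Z(1.0), f_Z(1.5))  # ~0.5, ~1.0, ~0.5 (triangular density on [0, 2])
```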

  • Distribution of Function of Random Variable

The discrete case is simple: the PMF of \(Y = g(X)\) is obtained by summing \(p_{X}(x)\) over all \(x\) with \(g(x) = y\).

For the continuous case, suppose \(g\) is a strictly monotone and differentiable function; then the PDF of \(Y = g(X)\) is given by

\[
f_{Y}(y) = f_{X}\left( g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y) \right|.
\]

Example: \(Y=X^{2}\),

\[
\begin{aligned}
F_{Y}(y) =& P(Y\leq y) = P(X^{2}\leq y) = P(-\sqrt{y}\leq X\leq \sqrt{y}) \\
=& F_{X}(\sqrt{y}) - F_{X}(-\sqrt{y}),
\end{aligned}
\]
\[
f_{Y}(y) = \frac{1}{2\sqrt{y}}f_{X}(\sqrt{y}) + \frac{1}{2\sqrt{y}}f_{X}(-\sqrt{y}), \quad y > 0.
\]
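
A quick sanity check of this example with \(X\sim N(0, 1)\): the resulting \(f_{Y}\) should match the chi-square density with 1 degree of freedom (a sketch assuming scipy is available):

```python
import numpy as np
from scipy.stats import norm, chi2

# f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 sqrt(y)), with X ~ N(0, 1).
y = 1.5
f_Y = (norm.pdf(np.sqrt(y)) + norm.pdf(-np.sqrt(y))) / (2 * np.sqrt(y))
print(f_Y, chi2.pdf(y, df=1))  # both ~0.1539
```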

  • Joint Distribution of Function of Random Variables

In the multivariate case, the derivative in the transformation formula is replaced by the absolute value of the determinant of the Jacobian matrix of the inverse transformation.

  • Distribution Function of Maximum and Minimum of Random Sample

For i.i.d. random variables \(X_{1},\cdots, X_{n}\) with common CDF \(F_{X}\), the distributions of \(Y_{n} = \max(X_{1},\cdots, X_{n})\) and \(Y_{1} = \min(X_{1},\cdots, X_{n})\) are

\[
\begin{aligned}
G_{n}(y) =& P(Y_{n}\leq y) = P(X_{1}\leq y,\cdots, X_{n}\leq y) = F_{X}^{n}(y) \\
G_{1}(y) =& P(Y_{1}\leq y) = 1 - P(Y_{1}>y) = 1 - P(X_{1}> y, \cdots, X_{n}>y) = 1 - (1 - F_{X}(y))^{n}.
\end{aligned}
\]
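
A simulation sketch for i.i.d. \(\text{Uniform}(0, 1)\) variables, where \(F_{X}(y) = y\) on \([0, 1]\) (assuming numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000
samples = rng.uniform(size=(reps, n))

y = 0.7
print((samples.max(axis=1) <= y).mean(), y**n)            # max: ~0.168 vs F^n(y)
print((samples.min(axis=1) <= y).mean(), 1 - (1 - y)**n)  # min: ~0.9976 vs 1 - (1 - F(y))^n
```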

5. Characteristics of Random Variables

  • Expected Values / Expectations

Expectation measures the center of the probability distribution. If the defining sum or integral diverges, the expectation is regarded as not existing.

\[
E(X) = \sum_{k}x_{k}p(x_{k}) \text{ or } E(X) = \int^{\infty}_{-\infty}xf(x)dx.
\]

The expectation of function of random variable:

\[
E(g(X)) = \sum_{k}g(x_{k})p(x_{k}) \text{ or } E(g(X)) = \int^{\infty}_{-\infty}g(x)f(x)dx.
\]

Properties of expectation:

(1) If \(a\) and \(b\) are constants, then \(E(aX + b) = aE(X) + b\).
(2) If there exists a constant \(a\) such that \(P(X\geq a) = 1\), then \(E(X)\geq a\).

Some theorems of expectation:

(1) If \(X\) is a random variable that takes non-negative integer values, then
\[
E(X) = \sum^{\infty}_{i=1}P(X\geq i).
\]
(2) If \(X\) is a non-negative random variable with CDF \(F(x)\), then
\[
E(X) = \int^{\infty}_{0}(1 - F(x))dx.
\]
(3) If \(X_{1},X_{2},\cdots,X_{n}\) are \(n\) random variables such that \(E(X_{i})\) is finite, then
\[
E(X_{1}+X_{2}+\cdots+X_{n}) = E(X_{1}) + E(X_{2}) + \cdots + E(X_{n}).
\]
(4) If \(X\) and \(Y\) are independent, then for functions \(h\) and \(g\),
\[
E[h(X)g(Y)] = E[h(X)]E[g(Y)].
\]
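
A small numerical check of theorem (1) with a geometric random variable (success probability \(p\), supported on \(1, 2, \cdots\), so \(P(X\geq i) = (1-p)^{i-1}\) and \(E(X) = 1/p\)); the infinite sum is truncated at a large cutoff:

```python
# Check E(X) = sum_{i >= 1} P(X >= i) for a geometric random variable.
p = 0.3
tail_sum = sum((1 - p) ** (i - 1) for i in range(1, 200))  # truncated tail sum
print(tail_sum, 1 / p)  # both ~3.3333
```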

  • Variance

Variance is a measure of spread of probability distribution.

\[
Var(X) = E\left[\left(X - E(X)\right)^{2}\right] = E\left(X^{2}\right) - E(X)^{2}.
\]

If \(a\) and \(b\) are constants, then \(Var(aX + b) = a^{2}Var(X)\).

  • Covariance and Correlation

Covariance measures the strength of the linear relationship between two random variables. Correlation is the standardized version of covariance.

\[
Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).
\]
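
A simulation sketch of the covariance identity (the linear model for \(Y\) below is a made-up illustration with true \(Cov(X, Y) = 0.5\)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = 0.5 * x + rng.standard_normal(100_000)  # Cov(X, Y) = 0.5 by construction

# E(XY) - E(X)E(Y) vs numpy's sample covariance.
print(np.mean(x * y) - np.mean(x) * np.mean(y), np.cov(x, y)[0, 1])
```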

  • Conditional Expectation and Variance

The conditional expectation of \(X\) given \(Y=y\) is

\[
E(X|Y=y) = \sum_{x}xP(X=x|Y=y) = \sum_{x}xp_{X|Y}(x|y) \text{ or } E(X|Y=y) = \int^{\infty}_{-\infty}xf_{X|Y}(x|y)dx.
\]

Notice that \(E(X|Y=y)\) is a function of \(y\). Furthermore,

\[
E(X) = E(E(X|Y)), \quad Var(X) = E(Var(X|Y)) + Var(E(X|Y)).
\]
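
A simulation sketch of both identities with a made-up hierarchical model, \(Y\sim \text{Poisson}(3)\) and \(X|Y=y \sim N(y, 1)\), so that \(E(X|Y) = Y\) and \(Var(X|Y) = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(3, size=200_000)
x = rng.normal(loc=y, scale=1.0)

print(x.mean(), y.mean())      # E(X) vs E(E(X|Y)) = E(Y) = 3
print(x.var(), 1.0 + y.var())  # Var(X) vs E(Var(X|Y)) + Var(E(X|Y)) = 1 + 3 = 4
```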