Summary on sampling algorithms in simulation: Inverse Transformation, Acceptance - Rejection Algorithm, Importance Sampling, MCMC.
1. Inverse Transform Method
If \(X\) is a continuous random variable with CDF \(F\), then \(F(X)\sim Unif(0, 1)\). That means \(X\) can be simulated by drawing random samples \(U\sim Unif(0, 1)\) and feeding into \(F^{-1}\).
This method works for distribution with invertible CDF, and in general works for discrete distribution too, as the CDF functions of discrete distributions are step functions.
2. Acceptance - Rejection Algorithm
Rejection sampling can be used when the CDF is not invertible.
Find a distribution \(g(x)\) that we can directly sample from, then scale it with a constant such that \(Mg(x) \geq f(x)\) for all \(x\), \(Mg(x)\) is also known as an envelope. Then sample \(x\sim g\) from and \(u\sim Unif(0, 1)\), calculate \(f(x)/Mg(x)\). Finally accept \(x\) if \(u \leq f(x)/Mg(x)\), otherwise reject it.
3. Importance Sampling
This method is used to estimate the expectation of expectation of the distribution, where the distribution is not very easy for generating random samples. The goal is to estimate:
\[
E_{p}\left[f(X)\right] = \int f(x)p(x)dx \approx \frac{1}{n}\sum^{n}_{i=1} f\left(x_{i}\right).
\]
where \(p(x)\) is hard to simulate.
Instead of generating samples from \(p(x)\), we can generating samples from an entirely different distribution \(q(x)\):
\[
E_{p}\left[f(X)\right] = \int f(x)p(x)dx = \int f(x)\frac{p(x)}{q(x)}q(x)dx = E_{q}\left[\frac{f(x)p(x)}{q(x)}\right].
\]
Then we can compute \(f(x)p(x)/q(x)\) and estimate \(E_{p}\left[f(X)\right]\) by
\[
E_{p}\left[f(X)\right] \approx \frac{1}{n}\sum^{n}_{i=1}f\left(x_{i}\right)\frac{p\left(x_{i}\right)}{q\left(x_{i}\right)}.
\]
4. MCMC (Markov Chain Monte Carlo)
If the target distribution is the stationary distribution of a Markov chain, we can draw the random samples from the target distribution by running the Markov chain, this is how MCMC works.
For a Markov chain with transition matrix \(P\), the stationary distribution is \(\pi\). The definition of stationary indicates that for each two states \(i, j\) there is \(\pi(i)P(i, j) = \pi(j)P(j, i)\) (denote \(P(i, j) = P(X=j|X=i)\), the probability from state \(i\) to state \(j\)), therefore \(\pi P = \pi\). Where \(\pi\) is the target distribution, and we need to find \(P\) to draw samples.
It is not easy to get \(P\) directly. Assume there is another transition matrix \(Q\), where \(\pi(i)Q(i, j) \neq \pi(j)Q(j, i)\), and we introduce \(a(i, j)\) such that \(\pi(i)Q(i, j)a(i, j) = \pi(j)Q(j, i)a(j, i)\). \(a\) is known as rejection kernel* in \([0, 1]\). We can construct \(a\) by setting
\[
a(i, j) = \pi(j)Q(j, i), ~a(j, i) = \pi(i)Q(i, j).
\]
Finally let \(P(i, j) = Q(i, j)a(i, j)\).
Therefore the general framework of MCMC is:
- set up \(Q\), \(\pi\), stationary threshold \(n_{1}\) and required sample size \(n_{2}\);
- initialize states \(x_{0}\) randomly;
- for step \(t\) from \(0\) to \(n_{1} + n_{2} - 1\): (1) sample \(x\) from \(Q(x|x_{t})\); (2) generate \(u\sim Unif(0, 1)\); (3) if \(u < a(x_{t}, x) = \pi(x)Q(x, x_{t})\), accept the transition, i.e. \(x_{t+1} = x\); (3) otherwise set \(x_{t+1} = x_{t}\);
- output samples \(\left(x_{n_{1}}, x_{n_{1}+1}, \cdots, x_{n_{1}+n_{2}-1}\right)\).
4.1 Metropolis - Hastings Method
Used to solve high rejection rate problem. Set the rejection kernel as
\[
a(i, j) = \min\left(1, \frac{\pi(j)Q(j, i)}{\pi(i)Q(i, j)}\right).
\]
4.2 Gibbs Sampling
No rejection needed, the sampling is done per-coordinate: sample on one dimension and fix all other dimensions. Gibbs sampling cannot be used to sample univariate distribution.
Gibbs sampling originated from the observation that the transition between two points satisfies the stationary condition when all dimensions are fixed except one (full condition).