Time Series Models Overview (2) - State Space Model and Kalman Filter


1. Introduction

In Time Series Models Overview (1), I briefly went through the ARIMA family and used its most general case, SARIMAX, in my work. In practice I use SARIMAX() in statsmodels to build the model. When I tried the simplest model - just one lag term - and compared the results with LinearRegression() in sklearn, the two were quite different, which confused me. So recently I dug in and tried to reproduce the results without using statsmodels. It turns out that the SARIMAX model is implemented as a state space model and the parameters are estimated by MLE via the Kalman filter. In this post I will go through it from a statistical perspective.
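
As a quick illustration of that comparison, here is a minimal sketch on simulated data (the series, seed, and coefficient 0.7 are made up for illustration):

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.linear_model import LinearRegression

# simulate a toy AR(1) series: y_t = 0.7 * y_{t-1} + noise
rng = np.random.default_rng(42)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# AR(1) via SARIMAX: state space model, estimated by MLE via Kalman filter
sarimax_fit = SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
print("SARIMAX AR(1) coefficient:", sarimax_fit.params[0])

# AR(1) via plain least squares: regress y_t on y_{t-1}
ols = LinearRegression().fit(y[:-1].reshape(-1, 1), y[1:])
print("OLS lag-1 coefficient:", ols.coef_[0])
```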

2. State Space Model

State space models introduce a “latent” variable, a.k.a. the “state”, into the model. Here I only list the form I use - the linear Gaussian state space model (more general forms can be found in many resources). Take the simplest example: the state space model version of \(AR(1)\) is

\[
\begin{aligned}
Z_{t} &= \psi Z_{t-1} + \epsilon_{t}, \\
Y_{t} &= Z_{t} + \eta_{t},
\end{aligned}
\]

The first equation is called the system equation, which describes the transition of the states; the second is called the observation equation, which describes how the observations are generated. Here \(Z_{t}\) is the state variable - you can clearly see that the transition of the state is an \(AR(1)\) model - and \(\epsilon_{t}\) is the noise term, a.k.a. the system noise. \(Y_{t}\) is the observed variable, since the state variable is assumed to be latent and unobservable, and \(\eta_{t}\) is the observation noise. Assume \(\epsilon_{t}\sim N(0, \sigma_{\epsilon}^{2})\) and \(\eta_{t}\sim N(0, \sigma_{\eta}^{2})\).

The advantage of the state space form is its flexibility: it can represent both linear and non-linear models, and it extends easily to multiple inputs/outputs.
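
As a concrete illustration, here is a minimal simulation sketch of this model (the values \(\psi = 0.8\), \(\sigma_{\epsilon} = 0.5\), \(\sigma_{\eta} = 1.0\) are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, psi, sigma_eps, sigma_eta = 200, 0.8, 0.5, 1.0

# system equation: Z_t = psi * Z_{t-1} + eps_t
z = np.zeros(T)
for t in range(1, T):
    z[t] = psi * z[t - 1] + rng.normal(scale=sigma_eps)

# observation equation: Y_t = Z_t + eta_t (only y is observable)
y = z + rng.normal(scale=sigma_eta, size=T)
```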

3. Kalman Filter

To estimate the state space model we need the Kalman filter, which recursively computes the estimate of \(Z_{t}\) conditional on the observation series \(Y_{1},Y_{2},\cdots, Y_{t}\). The Kalman filter gives the optimal estimate of the unobservable state variable under the assumption that the state space model is linear Gaussian.

Some notations:

\[
\begin{aligned}
Z_{t|t-1} &= E\left(Z_{t}|Y_{1},\cdots, Y_{t-1}\right), \\
\sigma_{Z, t|t-1}^{2} &= Var\left(Z_{t}|Y_{1},\cdots, Y_{t-1}\right), \\
Z_{t|t} &= E\left(Z_{t}|Y_{1},\cdots, Y_{t-1}, Y_{t}\right), \\
Y_{t|t-1} &= E\left(Y_{t}|Y_{1},\cdots, Y_{t-1}\right),
\end{aligned}
\]

with initial values denoted by \(Z_{0|0}\) and \(\sigma_{Z, 0|0}^{2}\). Based on the state space model we have

\[
\begin{aligned}
Z_{t|t-1} &= \psi Z_{t-1|t-1}, \\
\sigma_{Z, t|t-1}^{2} &= \psi^2 \sigma_{Z, t-1|t-1}^{2} + \sigma_{\epsilon}^{2}, \\
Y_{t|t-1} &= Z_{t|t-1}.
\end{aligned}
\]

The key step is to update the state estimate after we get a new observation \(Y_{t}\) at time \(t\). From a Bayesian perspective, we start with a prediction (the prior), then obtain an updated prediction (the posterior) after observing new data. This is exactly what the Kalman filter does:

\[
\begin{aligned}
Z_{t|t} &= Z_{t|t-1} + K_{t}\left(Y_{t} - Y_{t|t-1}\right), \\
\sigma_{Z, t|t}^{2} &= \sigma_{Z, t|t-1}^{2} - K_{t}^{2}\left(\sigma_{Z,t|t-1}^{2} + \sigma_{\eta}^2\right).
\end{aligned}
\]

Here \(K_{t}\) is called Kalman gain, in this case it is

\[
K_{t} = \frac{\sigma_{Z,t|t-1}^{2}}{\sigma_{Z,t|t-1}^{2} + \sigma_{\eta}^{2}}.
\]
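
Plugging the gain into the update equations (and using \(Y_{t|t-1} = Z_{t|t-1}\)), the update can be rewritten as

\[
\begin{aligned}
Z_{t|t} &= (1 - K_{t}) Z_{t|t-1} + K_{t} Y_{t}, \\
\sigma_{Z, t|t}^{2} &= (1 - K_{t})\sigma_{Z, t|t-1}^{2}.
\end{aligned}
\]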

So you can see that the updated state estimate is a weighted average of the prior \(Z_{t|t-1}\) - the conditional expectation of the current state given all historical information - and the new observation \(Y_{t}\), with the Kalman gain as the weight on the observation: the larger the observation noise \(\sigma_{\eta}^{2}\), the smaller the gain, and the less the filter trusts the new observation.
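
Here is a minimal implementation sketch of the filter for this AR(1) model (the function name and the default initial values are my own choices):

```python
import numpy as np

def kalman_filter(y, psi, sigma_eps2, sigma_eta2, z0=0.0, p0=1.0):
    """Scalar Kalman filter for the AR(1) state space model above.

    z0 and p0 are the initial values Z_{0|0} and sigma^2_{Z,0|0}.
    Returns the filtered means Z_{t|t}, the one-step-ahead predicted
    means Z_{t|t-1}, and the predicted variances sigma^2_{Z,t|t-1}.
    """
    T = len(y)
    z_filt = np.zeros(T)
    z_pred = np.zeros(T)
    p_pred = np.zeros(T)
    z, p = z0, p0
    for t in range(T):
        # prediction step
        z_pred[t] = psi * z
        p_pred[t] = psi ** 2 * p + sigma_eps2
        # update step: Y_{t|t-1} = Z_{t|t-1}, gain weights the innovation
        K = p_pred[t] / (p_pred[t] + sigma_eta2)
        z = z_pred[t] + K * (y[t] - z_pred[t])
        p = (1.0 - K) * p_pred[t]
        z_filt[t] = z
    return z_filt, z_pred, p_pred
```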

4. Maximum Likelihood Estimation

To estimate the parameters of the model (\(\psi\), together with \(\sigma_{\epsilon}^{2}\) and \(\sigma_{\eta}^{2}\)), we need MLE. The likelihood function of \(Y_{1},\cdots, Y_{T}\) is

\[
L(y_{1},\cdots,y_{T}) = f(y_{1})\prod^{T}_{t=2}f(y_{t}|y_{1},\cdots,y_{t-1}).
\]

And \(Y_{t}|Y_{1},\cdots,Y_{t-1}\) follows a normal distribution with mean \(Y_{t|t-1}\) and variance \(\sigma^{2}_{Z,t|t-1} + \sigma^{2}_{\eta}\). In practice we usually take the logarithm; the log likelihood function is

\[
\ell(y_{1},\cdots, y_{T}) \propto -\frac{T}{2}\log 2\pi - \frac{1}{2}\sum^{T}_{t=1}\log\left(\sigma^{2}_{Z,t|t-1} + \sigma^{2}_{\eta}\right) - \frac{1}{2}\sum^{T}_{t=1}\frac{\left(y_{t} - z_{t|t-1}\right)^2}{\sigma^{2}_{Z,t|t-1} + \sigma^{2}_{\eta}}.
\]

To get the estimate:

  • initialize \(Z_{0|0}\) and \(\sigma_{Z, 0|0}^{2}\);
  • evaluate the log likelihood function using the Kalman filter;
  • find the parameters that maximize the log likelihood (equivalently, minimize the negative log likelihood) - a sketch is given below.
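
Putting it together, a minimal MLE sketch (reusing the kalman_filter function and the simulated series y from the sketches above; parameterizing the variances on the log scale is just a convenience to keep them positive):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, y):
    psi, log_sigma_eps2, log_sigma_eta2 = params
    sigma_eps2, sigma_eta2 = np.exp(log_sigma_eps2), np.exp(log_sigma_eta2)
    _, z_pred, p_pred = kalman_filter(y, psi, sigma_eps2, sigma_eta2)
    s = p_pred + sigma_eta2  # Var(Y_t | Y_1, ..., Y_{t-1})
    return 0.5 * np.sum(np.log(2 * np.pi * s) + (y - z_pred) ** 2 / s)

# starting values x0 are arbitrary guesses
result = minimize(neg_log_likelihood, x0=[0.5, 0.0, 0.0], args=(y,))
print("estimated psi:", result.x[0])
```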

The advantage of using the Kalman filter here is that it gives optimal estimates of the state variable, and it can also handle missing data by changing the update rule. More theoretical proofs can be found in many Kalman filter and state space model resources.