The Kalman Filter is just the Bayes Filter applied to a linear-Gaussian setting.

Settings

First we have a state transition:

$$x_t = A_t x_{t-1} + B_t u_t + \epsilon_t$$

The noise $\epsilon_t$'s mean is $0$ and its covariance is $R_t$.

Then we have the measurements:

$$z_t = C_t x_t + \delta_t$$

Again, the measurement noise $\delta_t$ has mean $0$ and covariance $Q_t$.
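
A toy instantiation I find helpful (my own example, not from any source): a 1-D random walk observed directly, i.e. $A_t = 1$, $B_t = 0$, $C_t = 1$, so that

$$x_t = x_{t-1} + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, r), \qquad z_t = x_t + \delta_t, \quad \delta_t \sim \mathcal{N}(0, q).$$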

The algorithm
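
For reference, this is the standard Kalman filter algorithm, in the notation of Probabilistic Robotics:

$$
\begin{aligned}
1:&\ \textbf{Kalman\_filter}(\mu_{t-1}, \Sigma_{t-1}, u_t, z_t)\\
2:&\quad \bar\mu_t = A_t \mu_{t-1} + B_t u_t\\
3:&\quad \bar\Sigma_t = A_t \Sigma_{t-1} A_t^T + R_t\\
4:&\quad K_t = \bar\Sigma_t C_t^T \left(C_t \bar\Sigma_t C_t^T + Q_t\right)^{-1}\\
5:&\quad \mu_t = \bar\mu_t + K_t \left(z_t - C_t \bar\mu_t\right)\\
6:&\quad \Sigma_t = (I - K_t C_t)\,\bar\Sigma_t\\
7:&\quad \textbf{return}\ \mu_t, \Sigma_t
\end{aligned}
$$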

Here lines 2 and 3 are calculating $\overline{bel}(x_t)$, i.e. $(\bar\mu_t, \bar\Sigma_t)$, while lines 5 and 6 are calculating $bel(x_t)$, i.e. $(\mu_t, \Sigma_t)$.
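
For concreteness, here is a minimal NumPy sketch of a single predict / update step, written straight from the listing above; the function name and signature are my own, not from the book:

```python
import numpy as np

def kalman_step(mu, Sigma, u, z, A, B, C, R, Q):
    """One Kalman filter step, following lines 2-6 of the listing above."""
    # Prediction (lines 2-3): push the previous belief through the motion model.
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R
    # Kalman gain (line 4).
    S = C @ Sigma_bar @ C.T + Q            # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)
    # Measurement update (lines 5-6).
    innovation = z - C @ mu_bar
    mu_new = mu_bar + K @ innovation
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new
```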

My own intuition

  • $\bar\mu_t = A_t \mu_{t-1} + B_t u_t$: that's easy, it comes straight from the state transition formula.
  • $\bar\Sigma_t = A_t \Sigma_{t-1} A_t^T + R_t$: the original covariance goes through the linear formula; since the covariance is quadratic in the state, $A_t$ must be multiplied in twice. Then add $R_t$ for the state transition noise (a one-line check follows this list).
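
The one-line check, using the independence of $x_{t-1}$ and $\epsilon_t$:

$$\operatorname{Cov}\!\left(A_t x_{t-1} + B_t u_t + \epsilon_t\right) = A_t \Sigma_{t-1} A_t^T + R_t.$$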

On Kalman Gain

Kalman Gain: Let's write this as

$$K_t = \frac{\bar\Sigma_t C_t^T}{C_t \bar\Sigma_t C_t^T + Q_t}.$$

That's strange (and nobody seems to point that out), because obviously, to make sense of it as a fraction, we need another $C_t$ there in the numerator.

Let's just say, $K_t$ is, intuitively,

$$K_t = C_t^{-1}\,\frac{C_t \bar\Sigma_t C_t^T}{C_t \bar\Sigma_t C_t^T + Q_t}.$$

Of course there's no guarantee that $C_t$ is invertible. But let's just keep it that way.
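
To see that the fraction reading is consistent: whenever $C_t$ is invertible the intuitive form collapses back to the real gain, and in the scalar case with $c_t = 1$ it is literally a ratio of variances (the scalar specialization is my own addition):

$$C_t^{-1}\,\frac{C_t \bar\Sigma_t C_t^T}{C_t \bar\Sigma_t C_t^T + Q_t} = \bar\Sigma_t C_t^T \left(C_t \bar\Sigma_t C_t^T + Q_t\right)^{-1} = K_t, \qquad k_t = \frac{\bar\sigma_t^2}{\bar\sigma_t^2 + q_t}.$$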

Now onto the next formula, the mean update. We compute the innovation: the difference between the “real measurement” and the “expected measurement”, $z_t - C_t \bar\mu_t$. Recall that $z_t = C_t x_t + \delta_t$. If we just multiply $C_t^{-1}$ to both sides, we get $C_t^{-1} z_t = x_t + C_t^{-1} \delta_t$: the measurement, pulled back into state space, is itself a noisy estimate of the state. The $C_t^{-1}$ hiding inside $K_t$ does exactly that pull-back, and the fraction decides how much of the innovation we accept. So here we are,

$$\mu_t = \bar\mu_t + K_t\,(z_t - C_t \bar\mu_t).$$

Voilà!
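
A quick numeric check in the scalar case (the numbers are made up): take $\bar\mu_t = 0$, $\bar\sigma_t^2 = 4$, $c_t = 1$, $q_t = 1$, and a measurement $z_t = 2$. Then

$$k_t = \frac{4}{4 + 1} = 0.8, \qquad \mu_t = 0 + 0.8\,(2 - 0) = 1.6, \qquad \sigma_t^2 = (1 - 0.8)\cdot 4 = 0.8,$$

so the posterior mean sits much closer to the measurement, and the variance shrinks.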

For the last one,

$$\Sigma_t = (I - K_t C_t)\,\bar\Sigma_t.$$

Why? If $K_t = 0$, that means the measurement noise is too large, and we rely only on the state update: $\Sigma_t = \bar\Sigma_t$. If $K_t = C_t^{-1}$, the measurement is so good we just need that, and we are super certain: $\Sigma_t = 0$.
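
Both extremes can be read directly off the gain formula:

$$Q_t \to \infty \ \Rightarrow\ K_t \to 0, \qquad\qquad Q_t \to 0 \text{ (and } C_t \text{ invertible)} \ \Rightarrow\ K_t \to \bar\Sigma_t C_t^T \left(C_t \bar\Sigma_t C_t^T\right)^{-1} = C_t^{-1}.$$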


The derivation

The following notes are from Gauss-Markov Models, supplementary material from 16-831 (F14). It's clearer than the version in the book Probabilistic Robotics.

Say we have a Gaussian random vector $x \sim \mathcal{N}(\mu, \Sigma)$. It can be described either in moment parameters $(\mu, \Sigma)$ or in natural (information) parameters $(\eta, \Lambda)$, with $\Lambda = \Sigma^{-1}$ and $\eta = \Sigma^{-1} \mu$.

Linear transformation, $y = Ax + b$, is easier with the moment parameterization: $y \sim \mathcal{N}(A\mu + b,\ A\Sigma A^T)$.
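
This is exactly what the prediction step uses: with $A = A_t$, $b = B_t u_t$, and the independent noise $\epsilon_t \sim \mathcal{N}(0, R_t)$ added on top,

$$\overline{bel}(x_t) = \mathcal{N}\!\left(A_t \mu_{t-1} + B_t u_t,\ \ A_t \Sigma_{t-1} A_t^T + R_t\right),$$

which is lines 2 and 3 of the algorithm.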

Conditioning, getting $p(x \mid y)$ from the joint distribution $p(x, y)$, is easier with the natural parameterization. Note that conditioning basically starts from the joint distribution and then treats $y$ as “known”.
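
Concretely (this is the standard Gaussian conditioning identity, with my own partition notation): if the joint natural parameters are partitioned as

$$\eta = \begin{pmatrix} \eta_x \\ \eta_y \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} \Lambda_{xx} & \Lambda_{xy} \\ \Lambda_{yx} & \Lambda_{yy} \end{pmatrix},$$

then conditioning on an observed $y$ is just a read-off:

$$\Lambda_{x \mid y} = \Lambda_{xx}, \qquad \eta_{x \mid y} = \eta_x - \Lambda_{xy}\, y.$$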

If we want to multiply two Gaussian likelihood functions, say with natural parameters $(\eta_1, \Lambda_1)$ and $(\eta_2, \Lambda_2)$, then the posterior can be simply computed by adding them:

$$\eta = \eta_1 + \eta_2, \qquad \Lambda = \Lambda_1 + \Lambda_2.$$
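
In the Kalman setting, the likelihood $p(z_t \mid x_t)$ contributes natural parameters $C_t^T Q_t^{-1} z_t$ and $C_t^T Q_t^{-1} C_t$, so the measurement update in information form is just (this is the standard information-filter update, spelled out here for concreteness)

$$\eta_t = \bar\eta_t + C_t^T Q_t^{-1} z_t, \qquad \Lambda_t = \bar\Lambda_t + C_t^T Q_t^{-1} C_t.$$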

Now with these in mind, we can set up a Gauss-Markov model. I'm too tired now to repeat the stuff in the PDF, but the general idea is that we do it in two steps: one prediction / rollup, one conditioning.

The former computes $\overline{bel}(x_t) = p(x_t \mid z_{1:t-1}, u_{1:t})$ and relies on the linear-transformation rule above. The latter is $bel(x_t) = p(x_t \mid z_{1:t}, u_{1:t})$ and relies on the previous formula (conditioning on $z_t$, i.e. multiplying in the likelihood). The first step is easier in moment parameters, while the latter is easier in natural ones.
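
In Bayes filter notation (as in Probabilistic Robotics), the two steps are

$$\overline{bel}(x_t) = \int p(x_t \mid x_{t-1}, u_t)\, bel(x_{t-1})\, dx_{t-1}, \qquad bel(x_t) \propto p(z_t \mid x_t)\, \overline{bel}(x_t).$$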

If we use the Sherman-Morrison-Woodbury formula to convert the natural parameterization back to the moment one, we get our familiar Kalman filter, which is an algorithm, not the underlying probabilistic model.
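
For the covariance, the conversion is the identity (standard, spelled out here as a check)

$$\Sigma_t = \left(\bar\Sigma_t^{-1} + C_t^T Q_t^{-1} C_t\right)^{-1} = \bar\Sigma_t - \bar\Sigma_t C_t^T \left(C_t \bar\Sigma_t C_t^T + Q_t\right)^{-1} C_t \bar\Sigma_t = (I - K_t C_t)\,\bar\Sigma_t,$$

which is exactly line 6 of the algorithm.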

The "information" is Fisher information

The precision matrix $\Lambda = \Sigma^{-1}$ in the natural parameterization is the Fisher information about the state carried by the current belief. The measurement update adds the Fisher information from the new observation. This is why the update is additive in natural parameters: Fisher information from independent observations adds. In the linear-Gaussian setting the Kalman filter attains the Cramér-Rao bound: $\mu_t$ is the minimum-variance unbiased estimator, and $\Sigma_t^{-1}$ is the total accumulated Fisher information.
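
A quick check of the claim for the measurement model (my own verification): $\log p(z_t \mid x_t) = -\tfrac{1}{2}(z_t - C_t x_t)^T Q_t^{-1} (z_t - C_t x_t) + \text{const}$, so the Fisher information it carries about $x_t$ is

$$\mathcal{I}(x_t) = -\mathbb{E}\!\left[\nabla_{x_t}^2 \log p(z_t \mid x_t)\right] = C_t^T Q_t^{-1} C_t,$$

which is exactly the term the information-form measurement update adds to the precision matrix.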