I found the relationship among linear algebra, calculus, and a simple Kalman filter interesting, so I am writing a memo about it.
Given a simple sequence of observed data (x_i), we can assume the observations are equally plausible and apply the least squares method. In this simple setting, the outcome is just the average. You can find this topic in Gilbert Strang's Introduction to Linear Algebra, chapter 4.2.
For example, suppose the observed data is [70 80 120]^T. If these observations are equally plausible, we have
x = 70
x = 80
x = 120.
These equations are strange. They seem to say that x is 70, 80, and 120 simultaneously. These are observed data, so they contain some error, but we are actually observing the same quantity each time. The motivation is to find the best possible estimate we can get from the observations.
Therefore, the system is:
We can use an idea from calculus to find the best x. The idea is this: the observations have errors, so which x minimizes the error? This is called Gauss's least squares method. First, we compute the squared error (E). This is also in Strang's book.
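E = (x-70)^2 + (x-80)^2 + (x-120)^2
E is a quadratic in x, so its graph is a parabola that opens upward. That means it has a minimum point. At that point the tangent has zero slope, so we can find it by setting the derivative to zero:
dE/dx = 2(x-70) + 2(x-80) + 2(x-120) = 0
x = (70 + 80 + 120)/3 = 90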
You can see this is actually the average. The best x is the average, which fits my intuition. However, I had never thought about in what sense the average is the best. In this case the variance is relatively small, so my intuition works. If the variance becomes larger, my intuition stops working. I once wrote about this in my blog ``A 6σ Woman.'' The average is the best here in the least squares sense.
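As a quick numerical check of the calculus argument, here is a minimal Python sketch. The function names squared_error and error_slope are my own illustrative choices, not something from Strang's book. It evaluates E at the average and at nearby points, and confirms that the slope of E is zero at the average.

```python
from statistics import mean

# The three observations from the example above.
observations = [70, 80, 120]


def squared_error(x, data):
    """Sum of squared differences between a candidate x and each observation."""
    return sum((x - d) ** 2 for d in data)


def error_slope(x, data):
    """Derivative of the squared error with respect to x: dE/dx = 2 * sum(x - d)."""
    return 2 * sum(x - d for d in data)


best_x = mean(observations)

print(best_x)                               # 90
print(error_slope(best_x, observations))    # 0, the tangent is flat here
print(squared_error(best_x, observations))  # 1400
print(squared_error(89, observations))      # 1403, slightly worse
print(squared_error(91, observations))      # 1403, slightly worse
```

Moving x away from 90 in either direction increases E, which is what "the average is best in the least squares sense" means for this data.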
Next, let's find the best x using linear algebra.