Why A^{T}A is invertible? (4) Linear Algebra

Appendix A: null space and column space
We use the null space for the proof, therefore, I will explain the null space a bit. If you know about the null space, of course you can skip this entry.

The null space of a matrix A is a set of non-zero vector x that satisfies Ax = 0.
Let me show you an example square matrix A that has null space.
When x \neq 0 , following x is a solution.
Therefore, this x is a null space of A. When an a is an scalar, ax, a \neq 0 are also the solutions. It means these are also null space. In this example, the matrix is singular (and square). If the matrix is not singular, the solution must be 0 only. Because, if a square matrix is not singular, there is the inverse,
Therefore, x = 0. In this case, we say there is no null space.

Let me show you another example, but this time a rectangle matrix A that has null space.
The solution is the same as the last example.

By the way, these are all about definition of null space. I could finish the story about null space at this point, however, I would like to think about the meaning of null space a bit more. I would like to talk about the existence of the inverse matrix, so let's talk about square matrices. Because a rectangle matrix can be interpreted as a matrix that has too many equations or too less equations, there is no inverse for such matrices.

null space is related with a matrix is singular or not, that means the matrix has independent columns or not. Because, if we write down an A with column vector, and multiply with x,

Therefore, Ax is a linear combination of the column vectors a_i with coefficients x_i. If this result is 0, x is a null space. This is one aspect of null space.

When we look at the columns of Ax = b, b is the linear combination of column vector of A. A space consists of column vectors is called column space. If we have n independent columns in n-dimensional space, we could express any point in the n-dimensional space. This means we have a solution for any b and we have the inverse. If A has the inverse, the linear combination of the column vector becomes 0 iff x= 0. This also means there is no null space (= or only 0 exists).

Now we see the null space, column space, and the existence of the inverse. Do you think you have now a bit clearer view about the relationship among them?


Thanks to Marc D. who listened my explanation and gave me comments.


Why A^{T}A is invertible? (3) Linear Algebra

By the way, one of my friend told me my blog is totally ununderstandable. If you share my friend's opinion, and if something I could improve, please let me know. Thanks!

Explanation from Linear combination and basis transformation

Let me explain this from another point of view. This explanation doesn't need null space, but, it lacks a rigorous proof. However, we have a proof by null space and so I think this is correct. For me this is a more intuitive explanation.

From the independence of the columns of A, the independent vectors will be organized as the following. You can also see the how the resulting columns are computed.
Since A^T is a linear operator,
This becomes 0 when A^{T} a_* degenerates. However, this is also the same form of basis transformation. Each basis is each column. Because, A^{T}'s row are independent (since A's column are independent, transposing A exchanges columns with rows.) Each independent row vector is projected to the each independent columns. The column vector's independence is preserved unless A^{T} a_* degenerates.

Unfortunately, I have no proof of degeneration. But, what happens is an independent basis is transformed into another independent basis. I don't see how this independence breaks under this condition.

By the way, this matrix is called metric tensor. My favorite explanation of this is in the book ``Mathematical sciences of graphics'' by Koukichi Sugihara, Chapter 1. (I have the first edition, the third printing. This has a typo in Equation 1.20, page 19. But, the introducing equations are correct and only the result equation has a typo.)

Next I would like to have an appendix, explanation about null space, column space, and existence of inverse. This might not be necessary, so the story about A^T A ended here.


Why A^{T}A is invertible? (2) Linear Algebra

Why A^{T}A has the inverse

Let me explain why A^{T}A has the inverse, if the columns of A are independent. First, if a matrix is n by n, and all the columns are independent, then this is a square full rank matrix. Therefore, there is the inverse. So, the problem is when A is a m by n, rectangle matrix.  Strang's explanation is based on null space. Null space and column space are the fundamental of the linear algebra. This explanation is simple and clear. However, when I was a University student, I did not recall the explanation of the null space in my linear algebra class. Maybe I was careless. I regret that...

Explanation based on null space

This explanation is based on Strang's book. Column space and null space are the main characters. Let's start with this explanation.

Assume where

x is in the null space of A.  The matrices (A^{T} A) and A share the null space as the following:
This means, if x is in the null space of A, x is also in the null space of A^{T} A. If x = 0 is only the possible case, A^{T} A has the inverse since the column space spans the whole dimension of A.

If we can multiply the inverse of A^{T} from the left, but, A is a rectangle matrix, therefore there is no inverse of it. Instead, we can multiply x^T from the left.

x^{T} A^{T} is a row vector, Ax is a column vector. The multiplication of them are an inner product.If Ax = b, then x^{T} A^{T} = (A x)^{T} = b^{T}, b^{T} b = 0. (Note, the last 0 is not a vector, a scalar) Inner product of the identical vectors become 0 if and only if 0. Since the inner product is \sum (b_i)^2 = 0 (squared sum = 0).

This is a nice explanation. We can also use the independence of A's columns, this concludes null space has only 0. A^{T}A shares the null space with A, this means A^{T}A's columns are also independent. Also, A^{T}A is a square matrix. Then, A^{T}A is a full rank square matrix. The columns of A are independent, but, it doesn't span the m-dimensional space since A is a rectangle matrix. Instead, the columns of A^{T}A span the n-dimensional space. Therefore, there is the inverse.

I would like to add one point. Assume B where A \neq B,

Therefore, I first thought B and A share the null space. It's wrong. Because,
This means only two vectors: a = (x^{T} B) and b = (A x) are perpendicular. It doesn't mean (x^{T} B) = 0. The transpose of this is the following.

We actually don't know x^{T} B is 0. Therefore, we don't know x is in the left null space of B or not.  A and A^{T}A share the nulls pace, but, given arbitrary B, B and BA usually don't share the null space. In the Strang's book, this is not mentioned. Maybe it is too obvious, but, I misunderstand it at the first time.

Next time, I will explain this another point of view.


Why A^{T}A is invertible? (1) Linear Algebra

We can see matrix multiplication A^{T}A frequently, for example, in the least square method. If matrix A has independent columns, this matrix has the following great properties: it is square, symmetric, and has the inverse. I am interested in why A^{T}A has this properties and would like to explain that. The explanation is based on Gilbert Strang's Introduction of Linear Algebra, 4th edition. The explanation uses null space and quadric form, it is an elegant explanation. But, if I just follow the Strang's explanation, I only need to write: see Strang book. So, I would like to add a geometric explanation. Although, I don't have a rigid proof, it is just a little bit intuitive for me. Yet, I hope you can enjoy that. Even if my explanation is bad, you can still see the Strang's explanation.

Properties of A^{T}A

We can see matrix multiplication A^{T}A frequently. We can see this pattern in a projection operation, least square computation (Actually, those two are the same).

In general, we can assume the following assumptions.

  • When A is not a square matrix and would like to know the least square solution. In this case, the matrix is m by n and m > n. This means, the number of equations is larger than the unknowns. When this happens? For example, you observe some signals or take some statistics, like take a photograph to analyze something. You sometimes take samples exact the same, but the result might differ because of observation error. In this case, you have more samples than the parameters. If m < n case, you can still take more samples. So, m by n is a general case.
  • We could make the columns of A independent. If not, we can trash such columns. If we don't know how many parameters are in the model and remove too many columns, we might have a wrong answer. This is rather a technical issue, it is just possible. This just says, we can do a kind of best effort.

If the second assumption, independent columns, is true, A^{T}A has the following nice properties:

  • Square
  • Symmetric
  • Existence of inverse

Therefore, this matrix is useful.

I would like to stress that the column independence of A is necessary. Otherwise, there is no inverse of A^{T}A. You can easily see the following.

It is rather easy to see why A^{T}A is square matrix, since [n by m] [m by n] is [n by n] because of the matrix multiplication rule.

You can also see the following shows the symmetric property.

Where A^{{T}^{T}} = A, transpose of transpose is the original matrix. The question is existence of the inverse.