2012-12-25

Authors in a Markov matrix: Which author do people find most inspiring? (23)


Again, at which station am I?

In the last section, I explained eigenanalysis using the change of inhabitants between two cities. Now I would like to extend this method to more than two objects. Let's go back to the Berlin S-Bahn station graph of Figure 7.
Graph example 2. Each node is a train station.

Just as people move between cities, people can move between stations. Let's assume that a person chooses the next station with equal probability. Under this assumption, how people move around is defined by the connections between the stations. Namely, the probability that a person stays at each station depends on the topology of the train stations. We can generate the matrix that represents how people move around from the station adjacency matrix. If a station has two connections, the probability that each connection is used is \(\frac{1}{2}\), since we assume a person chooses the next station with equal probability. If a station has three connections, the probability that each connection is used is \(\frac{1}{3}\). (In the adjacency matrix below, the diagonal entries are one, so staying at the current station counts as one of the choices.) This is simply because we assumed so. We could of course use another model, but I would like to stick to this assumption here. This is achieved by normalizing each column vector of the adjacency matrix by its \(L_1\) norm. (It is a bit of a detail, but we use the \(L_1\) norm here since the entries are probabilities and each column must sum to one.)
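Written as a formula, the normalization can be sketched as follows. The symbols \(A\) for the adjacency matrix and \(M\) for the resulting Markov matrix are my own labels for this sketch. Since all entries of \(A\) are zero or one, the \(L_1\) norm of a column is just the sum of its entries:
\begin{eqnarray*}
 M_{ij} = \frac{A_{ij}}{\sum_{k} A_{kj}}
\end{eqnarray*}
For example, a column with four ones has the sum 4, so each nonzero entry of that column becomes \(\frac{1}{4} = 0.25\).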
\begin{eqnarray*}
 \left[
  \begin{array}{cccc}
   1 & 1 & 0 & 0 \\
   1 & 1 & 1 & 1 \\
   0 & 1 & 1 & 0 \\
   0 & 1 & 0 & 1 \\
  \end{array}
 \right]
 \rightarrow
 \left[
  \begin{array}{cccc}
   0.5 & 0.25 & 0   & 0 \\
   0.5 & 0.25 & 0.5 & 0.5 \\
   0   & 0.25 & 0.5 & 0 \\
   0   & 0.25 & 0   & 0.5 \\
  \end{array}
 \right]
\end{eqnarray*}
This is the Markov matrix of the S-Bahn generated from the adjacency matrix.
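The column normalization can also be written as a small program. The following is only a minimal sketch in plain Python (not octave, which is used in the session that follows); the names `adjacency`, `column_normalize`, and `markov` are my own, and the sketch assumes that every station has at least one connection, so no column sums to zero.

```python
# Adjacency matrix of the four-station example, copied from the text.
# The ones on the diagonal mean that staying at the current station
# is one of the possible choices.
adjacency = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]


def column_normalize(matrix):
    """Divide every column by its L1 norm so that each column sums to one.

    Assumption: no column is all zeros (every station has a connection).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # L1 norm of each column: the sum of the absolute values of its entries.
    col_norms = [
        sum(abs(matrix[r][c]) for r in range(n_rows)) for c in range(n_cols)
    ]
    return [
        [matrix[r][c] / col_norms[c] for c in range(n_cols)]
        for r in range(n_rows)
    ]


markov = column_normalize(adjacency)
for row in markov:
    print(row)
```

The printed rows should match the matrix on the right-hand side above. Normalizing columns (not rows) fits the convention used here, where the matrix is multiplied from the left onto a column vector of probabilities.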

Let's compute the probability of staying at each station with octave.
octave:10> Mb = [0.5 0.25 0 0; 0.5 0.25 0.5 0.5;
 0 0.25 0.5 0; 0 0.25 0 0.5];
octave:11> [L D] = eig(Mb)
L =
2.88e-01  8.16e-01 -3.77e-1  1.25e-01
-8.66e-1 -4.83e-16 -7.55e-1 -3.25e-16
2.88e-01 -4.08e-01 -3.77e-1 -7.61e-01
2.88e-01 -4.08e-01 -3.77e-1  6.35e-01
D =
Diagonal Matrix
  -0.2500       0       0       0
        0  0.5000       0       0
        0       0  1.0000       0
        0       0       0  0.5000
octave:12> x1 = L(:,3)/ sum(L(:,3))
x1 =
   0.20000
   0.40000
   0.20000
   0.20000
Here I use the third column vector as the eigenvector, since the third eigenvalue is one. Remember, the eigenvalue one is the important one here. For a modern computer, multiplying a matrix of this small size 1000 times is also feasible, so we can compute that as well.
octave:16> Mb^1000 * w
   0.20000
   0.40000
   0.20000
   0.20000
After 1000 steps from the starting distribution w, the probabilities of people staying at each station are 0.2, 0.4, 0.2, and 0.2. Whatever distribution of people we start from, the distribution becomes these numbers after sufficiently many steps. However, we already knew this from the eigenanalysis. If you recall the topology of the stations (Figure 7), the second station, Alexanderplatz, has twice as many people as each of the other stations. In this example, Alexanderplatz is a hub station. We can thus compute how many people stay at Alexanderplatz compared to the other stations.
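The claim that the starting distribution does not matter can be tried out directly. The following is a minimal sketch in plain Python, not the octave session above; the function names `step` and `iterate` and the three starting distributions are my own illustrative choices (they are not the vector w used above).

```python
# Markov matrix of the four-station example, copied from the text.
markov = [
    [0.5, 0.25, 0.0, 0.0],
    [0.5, 0.25, 0.5, 0.5],
    [0.0, 0.25, 0.5, 0.0],
    [0.0, 0.25, 0.0, 0.5],
]


def step(matrix, vector):
    """One step of the Markov chain: the matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def iterate(matrix, start, n_steps):
    """Apply the matrix n_steps times to the starting distribution."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(matrix, dist)
    return dist


# Three illustrative starting distributions (assumptions for this sketch):
# everybody at station 1, everybody at station 4, and an even spread.
starts = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.25, 0.25, 0.25, 0.25],
]

results = [iterate(markov, s, 1000) for s in starts]
for s, r in zip(starts, results):
    print(s, "->", [round(p, 5) for p in r])

# The eigenvector for eigenvalue one should be left unchanged by one step.
stationary = [0.2, 0.4, 0.2, 0.2]
print(step(markov, stationary))
```

All three runs should end up at approximately 0.2, 0.4, 0.2, 0.2. This is expected because the other eigenvalues (0.5, 0.5, and -0.25) have absolute values smaller than one, so their contributions shrink with every step.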

Finally, we have seen all the theoretical background of author network analysis, using station connections as the example. Then, what is the relationship between station connections and authors' importance? We have seen that both are represented by a graph. Once we have a graph representation of any relationship, the problems become the same mathematical entity: finding which station is important by its visiting probability, and finding which author is important by the visiting probability on the web. The human interpretations differ, but the mathematical representation is the same. It is similar to how \(2+3=5\) can be interpreted with any quantity: milk, time, or Euros. We can add 2 liters of milk and 3 liters of milk, which is 5 liters in total; we can add 2 hours and 3 hours, which is 5 hours in total; and we can add 2 Euros and 3 Euros, which is 5 Euros in total. It is just a human interpretation to see a graph as a station network or an author network.

I know some people don't like this ``abstraction'' that removes some concrete meaning, since it feels somewhat inhuman. I understand this. There is a similar operation that converts all values to money. I personally don't like converting every value to one global uniform value, ``money,'' removing all the other subtle values. Once everything is converted to this abstract value, you can add and subtract it: the value of your life and the value of your children are converted to some amount of money, and you can compute how much milk is equivalent to them. This is an extreme example that doesn't work. At least I don't want to exchange my life for some amount of milk. But if we use this abstraction properly, we can apply many ideas to solve many problems. We can even use some ideas to solve a completely new problem that no one has ever encountered. This makes mathematics an important tool.

I believe that it is important to have a sense of balance about how much abstraction we can use to solve a problem. If mathematicians ignore some substance of the problem, their abstraction no longer represents the problem. It is the same problem as assuming that the value of a person's life can be replaced with the value of milk through money. Though I do like to drink milk every day. I think that solving a problem using mathematics is a human activity. The humanity of the (applied) mathematician becomes important. I believe that mathematics and humanity are related. Personally, it is quite interesting to me that the quality of mathematics seems to be correlated with the quality of the person who does the mathematics.
