Authors in a Markov matrix Part 2 (7) Experimental results: Which author do people find most inspiring?

We have seen the eigenanalysis results for the authors in Wikipedia. I found it interesting just to go through the result tables and think about why each person is there. For instance, I was surprised that Winston Churchill ranks so high among the English authors. But a few of my friends pointed out to me that he is the only person who was both a Nobel laureate in literature and the prime minister of Britain. In the following articles, I would like to discuss the results.


Matrix rank

Table 3 shows that the matrix is not full rank, even after we removed rank-sink pages and pages with only outgoing links. This means there are some disconnected groups: inside each group the pages are connected by links, but there are no links between the groups. It would be interesting to analyze these groups, but this is future work.

Japanese Wikipedia template bias

Our first PageRank result for the Japanese Wikipedia surprised us, because Sōseki Natume, Ryūnosuke Akutagawa, Yukio Mishima, and Ōgai Mori all ranked below 100. The German Wikipedia result and the English Wikipedia result have some similarity to each other, but there seemed to be no similarity between the Japanese Wikipedia result and the other two. When we looked into the result, we first noticed that the high-ranking authors are all recent ones; specifically, they were all active after 1930. We first thought that recent authors are more actively edited and updated by the Wikipedia writers. Then we found that all the Akutagawa Prize winners have high ranks. The Akutagawa Prize is a prestigious award, but we did not understand why Akutagawa himself falls so far behind these winners. Finally, we found that all the Akutagawa Prize winners have mutual links, as shown in Figure 5: every winner gets incoming links from all the other winners, which makes the winners' PageRank higher. We consider this an artificial bias, since our assumption is that a Wikipedia writer makes a link when the writer thinks there is a relationship, but these award links come from the Wikipedia editing template (Navbox) for Japanese authors. We removed these mutual award links; the result is shown in Table 12.

Figure 5: Award winner cross link bias problem.
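The removal of the mutual award links can be sketched as follows. This is an illustrative Python sketch, not our actual tool; the dict-of-sets graph layout and the page names are assumptions.

```python
def remove_mutual_award_links(links, winners):
    """links: {page: set of outgoing links}; winners: set of award winners.
    Drop a link src -> dst when both pages are winners and the reverse
    link dst -> src also exists (the Navbox-style mutual link)."""
    return {src: {dst for dst in dsts
                  if not (src in winners and dst in winners
                          and src in links.get(dst, set()))}
            for src, dsts in links.items()}

# "A" and "B" are winners linking each other; "A" -> "X" is an ordinary link.
demo = {"A": {"B", "X"}, "B": {"A"}, "X": set()}
print(remove_mutual_award_links(demo, {"A", "B"}))
# {'A': {'X'}, 'B': set(), 'X': set()}
```

The one-way link to the non-winner "X" survives; only the winner-to-winner mutual links are dropped.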

For readers who are interested in this Akutagawa Prize mutual-link effect, we show the PageRank result that includes the mutual links in Table 13 (Note 1). With this bias, ranks 1 through 101 are all filled with Akutagawa Prize winners, and the first non-winner, Mishima Yukio, finally shows up at rank 102.

After post-processing, only the following eight Akutagawa Prize winners remain in the top 40: 大江健三郎 (Ōe Kenzaburō), 松本清張 (Matumoto Seichō), 吉行淳之介 (Yoshiyuki Jyunnosuke), 開高健 (Kaikō Takeshi), 丸谷才一 (Maruya Saiichi), 古井由吉 (Furui Yoshikichi), 石原慎太郎 (Ishihara Shintarō), 安岡章太郎 (Yasuoka Shōtarō).

Figure 6 shows the adjacency matrices with post-processing (top), without post-processing (middle), and the difference between the two (bottom). The middle figure shows a kind of regular pattern: this pattern is the mutual award links. The difference makes the regularity clear, though it is not completely regular, since there are several other mutual-linking award biases (e.g., the Mainichi Genjyutu award).

Figure 6: Adjacency matrices. Japanese authors in ja.wikipedia.org. Top: Removed Navbox bias, Middle: No postprocessing, Bottom: difference (middle - top)

Table 13: Japanese author rank result with Navbox. We think this Navbox causes a bias.

(Note 1): In Table 13, 赤瀬川原平 (Akasegawa Genpei) won the prize under his pen name 尾辻克彦 (Otuji Katuhiko).

In the next article, I would like to discuss a Category problem.


Authors in a Markov matrix Part 2 (6): The result of Japanese authors

The result of Japanese authors

Table 10: Japanese author rank result in de wikipedia.

Table 11: Japanese author rank result in en wikipedia.

Table 12: Japanese author rank result in ja wikipedia.
Next, we will discuss the results.

Authors in a Markov matrix Part 2 (5): The result of English authors

The result of English authors

Table 7: English author rank result in de wikipedia.

Table 8: English author rank result in en wikipedia.
Table 9: English author rank result in ja wikipedia. (Our PageRank implementation found only 29 valid authors.)

Authors in a Markov matrix Part 2 (4): The result of German authors

In this and the following two articles, we will show the PageRank (eigenanalysis) results.

The result of German authors

Table 4: German author rank result in de wikipedia. The ``en'' column is the rank result in en wikipedia.

Table 5: German author rank result in en wikipedia.

Table 6: German author rank result in ja wikipedia. (Our PageRank implementation found only 31 valid authors.)


Authors in a Markov matrix Part 2 (3) Experimental results: Which author do people find most inspiring?

In this article, I will explain our implementation and what the adjacency matrix looks like.


We implemented the following programs:

  • Link_Vector_Extractor: Generates the author vector from the data.
  • Graph_Extractor: Generates the adjacency matrix from the data and the author vector.
  • Page_Rank: Computes the PageRank.
  • Remapper: Re-maps the author vector according to the PageRank result.

Our experiments were run on the following machine. CPU: Intel(R) Core(TM) DUO CPU P8400, 2 cores; OS: 64-bit Linux, Kubuntu 12.04. We used the following software: Python 2.7.3, Beautiful Soup 4.0.2, Matlab R2006a, Octave 3.2.4.

Adjacency matrix

Figures 2, 3, and 4 show the adjacency matrices. In these figures, each blue point represents a connection.
Figure 2: Adjacency matrices. Top to bottom: German authors in de.wikipedia.org, en.wikipedia.org, ja.wikipedia.org.

Figure 3: Adjacency matrices. Top to bottom: English authors in de.wikipedia.org, en.wikipedia.org, ja.wikipedia.org.
Figure 4: Adjacency matrices. Top to bottom: Japanese authors in de.wikipedia.org, en.wikipedia.org, ja.wikipedia.org.
In Figure 2, the German authors in en.wikipedia.org show a regular pattern. We have not checked what exactly causes this, but we think it is the same kind of template bias that will be discussed later (Note 1). The German authors in en.wikipedia.org have another peculiarity: a higher number of links compared with the other Wikipedia data, as shown in Table 2. We count a link only if it connects two entries of the author vector, not every link in the page, since only those links become connections in the adjacency matrix. For example, we did not count links to other language pages, self links, or links to authors not in the author vector. Therefore, if a page has links to a non-author person, we did not count those links.
Table 2: Matrix size, number of non-zero elements, and the average number of links between authors. Wiki ``en'' means en.wikipedia.org.

We removed rank-sink nodes according to the PageRank algorithm [1]. We also removed the links related to the rank-sink pages, which removes the so-called dangling links (Section 2.7 in [2]), since links referring to a single rank-sink node are dangling links. Additionally, we removed pages that have only outgoing links. The original PageRank paper [2] does not mention nodes that have only outgoing links, although such nodes do not make sense in the PageRank concept. It is easy to imagine why the paper did not mention them: since the original paper targets the web, it is hard to determine that a node is not linked from any page, because all pages would have to be considered. This is practically impossible on the web, so it is natural that the original paper did not consider outgoing-link-only nodes. However, our author vector has a limited size and we know all valid pages, so we can remove these nodes. The link normalization value of the PageRank calculation depends on whether these links are considered or not, which means it may affect the absolute value of the PageRank. However, we are not interested in the absolute value of the PageRank; we are interested in the relative value, that is, whose influence is larger. The PageRank paper also mentions that this normalization does not have a large effect, since its authors are likewise interested in relative influence.
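As an illustration of this pruning, here is a small Python sketch (not our actual tool; the dict-of-sets graph layout and node names are assumptions). It repeatedly removes nodes with no outgoing links (rank sinks) and nodes with no incoming links (outgoing-link-only nodes), dropping the now-dangling links as it goes:

```python
def prune_graph(links):
    """links: {node: set of nodes it links to}. Returns a pruned copy
    with all rank-sink and outgoing-link-only nodes removed."""
    # keep only links whose target is a known node
    g = {n: set(t for t in ts if t in links) for n, ts in links.items()}
    changed = True
    while changed:
        changed = False
        has_incoming = set()
        for ts in g.values():
            has_incoming |= ts
        # rank sinks (no outgoing links) and source-only nodes (no incoming)
        dead = {n for n, ts in g.items() if not ts or n not in has_incoming}
        if dead:
            changed = True
            g = {n: ts - dead for n, ts in g.items() if n not in dead}
    return g

demo = {"A": {"B"}, "B": {"A", "C"}, "C": set(), "D": {"A"}}
print(sorted(prune_graph(demo)))   # ['A', 'B']: "C" (sink) and "D" (source only) are gone
```

The loop is necessary because removing one node can turn its neighbors into new sinks or sources.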

This operation reduces the matrix size; the resulting sizes are shown in Table 3.
Table 3: Adjacency matrices: original size, reduced size, and its rank.

Finally, in the next article, I will show you who the most important author is (in a Wikipedia sense).

Note 1: When I wrote this article in my blog, my friend Joerg M. pointed out this possibility (2012-12-29). Thanks.

Authors in a Markov matrix Part 2 (2) Experimental results: Which author do people find most inspiring?


The Wikipedia data we used is shown in Table 1. Fortunately, every Wikipedia has a list of authors for each language; we used this list as the root page and downloaded the author pages identified by it. To keep the server load low, we downloaded at a rate of one page per 15 seconds.
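As a sketch of this download loop (this is not the original script; the function names are illustrative), the pacing logic looks like the following. The fetch function is injected so the pacing is separate from the network code; with a real fetcher, this waits 15 seconds between pages.

```python
import time

def download_pages(urls, fetch, delay_sec=15):
    """fetch(url) -> page content; sleeps delay_sec between requests
    so the server is not overloaded."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_sec)   # be gentle to the server
        pages[url] = fetch(url)
    return pages

# Illustrative run with a dummy fetcher and no delay:
pages = download_pages(["p1", "p2"], fetch=lambda u: "<html>" + u, delay_sec=0)
print(pages["p2"])   # <html>p2
```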

The page ``石原慎太郎'' in the Japanese Wikipedia was the only compressed page; we expanded it when we ran our analysis tool. We had a choice of root pages; for instance, we used Liste_britischer_Schriftsteller for the English author list in the German Wikipedia instead of Liste_englischsprachiger_Schriftsteller. The root pages we chose are listed in Table 1. There is no particular reason these lists had to be chosen; they are simply our choice in this experiment. All the files were downloaded on 2012-5-30.

Table 1: Experimental data set


Authors in a Markov matrix Part 2 (1) Experimental results: Which author do people find most inspiring?

This is Part 2 of the article: the experimental results. Up to the last article, I talked about the question, ``Which author do people find most inspiring?'' From now on, I would like to talk about an answer.

Analyzing relationships between authors

Author graph generation method

We apply eigenanalysis to Japanese, English, and German authors to find out which author people find most inspiring in literature, in the sense of author network topology. First we need to generate an author graph that represents the relationships between authors. Of course we could generate such a graph by hand, i.e., by researching a lot of documents about authors. However, the number of famous Japanese authors may be more than 1000. This is just our Sunday hobby project; we do not have enough time to do that.

Fortunately, nowadays we can use crowd knowledge. The natural solution seems to be to use the information in Wikipedia: we can generate an adjacency matrix from Wikipedia's link structure and then apply eigenanalysis to analyze the relationships between authors.

Assumption of this experiment

We assume the link structure of author pages in Wikipedia represents the relationships between authors.
This is a debatable assumption. Recall the first question from Part 1 of this article: ``What is the relationship between authors?'' We define the relationships between authors to be given by the link structure of Wikipedia. Our intuition behind this assumption is the following: when a Wikipedia writer made a link between authors, the writer thought there was some relationship between those authors. If this assumption cannot be accepted, the following experiment has no meaning; that is why we sometimes say ``in the sense of the Wikipedia link structure'' in this article. So far, we believe this is a good method to find the relationships between authors, and we do not have a better idea for tackling this problem. When a better method is found, we can discuss this assumption again.

Based on this assumption, we will construct an adjacency matrix based on the link structure of Wikipedia and analyze it by eigenanalysis.
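As a small illustration of this construction (a sketch with an assumed data layout, not our actual Graph_Extractor), the adjacency matrix can be built from the author list and each page's outgoing links. Here column j holds the outgoing links of author j, which matches the column normalization used later:

```python
def adjacency_matrix(authors, links):
    """A[i][j] = 1 if author j's page links to author i's page.
    Links to pages outside the author list and self links are skipped."""
    index = {name: i for i, name in enumerate(authors)}
    n = len(authors)
    A = [[0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            if src in index and dst in index and src != dst:
                A[index[dst]][index[src]] = 1   # column = source page
    return A

# Hypothetical example data:
authors = ["Goethe", "Schiller", "Kleist"]
links = {"Goethe": ["Schiller"], "Schiller": ["Goethe", "Kleist"]}
print(adjacency_matrix(authors, links))
# [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
```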

The advantages and disadvantages of this method are as follows.

Advantages:

  1. Data size: We can use relatively large digital data
  2. Correctness: Wikipedia pages are public and some review has been done
  3. Quality: We can expect some meaning in the link structure, since these pages are made by humans

Disadvantages:

  1. Error possibility: There could be errors in the link structure
  2. Wikipedia writer bias: Some Wikipedia writers may introduce a bias depending on their preferences
  3. Wikipedia edit guideline bias: Wikipedia's editing guidelines may cause some kind of bias

The most attractive advantage for us is the availability of large data. If we tried to construct an adjacency matrix of Japanese authors by hand, we would need to read a huge amount of literature and extract the relationships; or, if we were fortunate, we might find a book describing the author relationships, but we would still need to convert the data into a digitally processable form.

Disadvantage 1 cannot be avoided with any data source, though Wikipedia may have more errors than an academically reviewed source. Disadvantage 2 is part of the nature of Wikipedia; we cannot avoid this kind of bias. However, another part of Wikipedia's nature is that more than one person writes a page, so we hope this bias is not severe. Disadvantage 3 needs some explanation. What is the edit guideline bias? The Wikipedia edit guideline recommends adding certain specific links to a page, and this may cause a bias; we will see such an example in the results section. Admittedly, the definition of bias is itself a difficult problem: even if we think there is a bias in the pages, others may not see one. We need to be careful when adjusting for bias, because the adjustment re-interprets the Wikipedia data; in a sense, it is a filter of the observer, and we might introduce a bias of our own while changing Wikipedia's link structure in the name of removing bias. Still, this is our hobby project, and as long as we document what operations we performed on the link structure data, we think it is fine. Whenever we altered the link structure, we will mention it.

Here we can see that all the disadvantages are a kind of link connection error. Such errors are difficult to detect if we have only one source of data. (What the correct connections are is, of course, beyond this article: we have defined the connections to be those in Wikipedia.) But do we have only one data source? No, we have several: Wikipedia provides the same kind of data in other languages. For example, Japanese authors are also listed in the English Wikipedia. Of course, the Japanese author data is richer in the Japanese Wikipedia than in the other languages. Moreover, many Japanese author pages in the English Wikipedia might simply be translated from the Japanese Wikipedia, which suggests that the data are not independent. If an English page were an exact translation of the corresponding Japanese page, the same link errors would be there and we could not detect them. However, as far as we can see, these data are not all exact translations; they are only partially dependent. We should keep this dependency in mind, but we think we can still use these data for validation to some extent.

Authors in a Markov matrix: Which author do people find most inspiring? (24)

Google PageRank

PageRank [2] is a method to compute a ranking of web pages. Sergey Brin and Lawrence Page modeled the adjacency relationships of web pages via links and analyzed the importance of web pages using eigenanalysis. They proposed to use the eigenvector as a criterion for web search. Soon after, they established a company called ``Google'' that used this method in its search engine. As their paper notes, the basic idea had already been proposed by Jon Kleinberg, who considered that the web environment can be analyzed by eigenanalysis. Eigenanalysis itself has been studied since the 18th century and is fundamental in linear algebra. So is PageRank not new, and were these two just lucky? I don't think so. Maybe many people thought of using eigenanalysis for web search at that time, but these two really implemented this simple and solid idea. Also, the size of the web is huge, and it is not trivial to compute the eigenvector of the web matrix. I think there is a huge difference between merely thinking about a method and actually implementing it and proving that it works.

In this article, my notation is based on Strang's book [9]; note that the PageRank paper [2] uses a different notation. In [2] (page 4, 2nd paragraph), the eigenvalue \(c\) appears as \(R = cAR\), while I use \(\lambda\) as in \(Mx = \lambda x\); therefore, \(\lambda = \frac{1}{c}\).
In practice, the web graph is not ideal and is quite incomplete, which makes the eigenanalysis hard to compute. For instance, many links point to pages that do not exist, and there are loops of links. The paper describes these problems and explains how to solve them. Their model is based on the following user: a user clicks the links of a web page randomly, but sometimes also jumps to a totally unrelated web page. In principle, the PageRank method scans the web, generates the adjacency matrix, processes dangling links and link loops, adds the random-jump term to the adjacency matrix to obtain a Markov matrix, and computes its eigenvector. It is easy to say ``compute the eigenvector,'' but since the size of the web is huge, they also proposed an algorithm to compute it efficiently.
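The pipeline in the paragraph above (column-stochastic matrix, random-jump term, eigenvector by repeated multiplication) can be sketched in a few lines. This is a hedged illustration, not Brin and Page's implementation; the damping factor 0.85 and the tiny 3-page graph are my own assumptions.

```python
def pagerank(M, d=0.85, iters=100):
    """M: column-stochastic link matrix (list of rows).
    Each step mixes a uniform random jump (weight 1-d) with following
    links (weight d), then repeats until the rank vector settles."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [(1 - d) / n + d * sum(M[i][j] * r[j] for j in range(n))
             for i in range(n)]
    return r

# Tiny example: page 0 -> 1; page 1 -> 0 and 2; page 2 -> 0.
M = [[0, 0.5, 1], [1, 0, 0], [0, 0.5, 0]]
ranks = pagerank(M)
print(max(range(3), key=lambda i: ranks[i]))   # 0 (page 0 has the highest rank)
```

The rank vector stays a probability distribution (it sums to 1) because each column of M sums to 1.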

When I first read this paper a few years ago, I was fascinated. It is elegant, simple, solid, beautiful linear algebra. Moreover, the theory is not only beautiful but also practically useful. I recommend reading the paper; I think you will enjoy it. That impression must have stayed somewhere in my mind: when my friend asked me how to analyze the relationships between writers, I immediately recalled this paper. I hear that today's search engine algorithms are more complex and sophisticated, to adapt to spam and other factors, but I believe this idea is still at the base.

Conclusion of Part 1

In this article, we showed the following:

  • We can formulate the relationships between authors as a graph.
  • We can use mathematical tools, namely graph theory and linear algebra, to analyze the graph.
  • One of the powerful tools of linear algebra is called eigenanalysis. We can apply this tool to analyze the author relationships. This is the same method Google uses for web search: Google uses it to find the most influential pages in order to show the best search results.

However, this is not the complete story. Part 2 of this article will answer the first question, ``Which author inspires people the most?''

See you in Part 2.


Authors in a Markov matrix: Which author do people find most inspiring? (23)

Again, at which station am I?

In the last section, I explained eigenanalysis using the inhabitant changes of two cities. Now I would like to extend this method to more objects instead of two. Let's go back to the Berlin S-Bahn station graph of Figure 7.
Figure 7: Graph example 2. Each node is a train station.

Just as people moved between cities, people can move between stations. Let's assume a person chooses the next station with equal probability. Under this assumption, how people move around is defined by the connections of the stations; namely, the probability of staying at each station depends on the station topology. We can generate the matrix that describes this movement from the station adjacency matrix. If a station is connected to two stations, the probability of using each connection is \(\frac{1}{2}\), since we assume a person chooses the next station with equal probability. If a station is connected to three stations, the probability of each connection is \(\frac{1}{3}\). This is simply because we assumed so; we could of course use another model, but I would like to stick with this assumption here. This is achieved by normalizing each column vector of the adjacency matrix by its \(L_1\) norm. (It is a bit of a detail, but we use the \(L_1\) norm here since the entries are probabilities.)
The adjacency matrix of the stations is
\begin{eqnarray*}
A &=& \left[ \begin{array}{cccc}
1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 \\
\end{array} \right],
\end{eqnarray*}
and normalizing each column by its \(L_1\) norm gives
\begin{eqnarray*}
M_b &=& \left[ \begin{array}{cccc}
0.5 & 0.25 & 0   & 0 \\
0.5 & 0.25 & 0.5 & 0.5 \\
0   & 0.25 & 0.5 & 0 \\
0   & 0.25 & 0   & 0.5 \\
\end{array} \right].
\end{eqnarray*}
This is the Markov matrix of the S-Bahn generated from the adjacency matrix.
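The normalization step can be sketched in a few lines of Python (the article itself uses octave; this is just an equivalent illustration): divide each column of the adjacency matrix by its column sum.

```python
def column_normalize(A):
    """L1-normalize each column of A so every column sums to 1."""
    n = len(A)
    sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / sums[j] for j in range(n)] for i in range(n)]

# The S-Bahn adjacency matrix from above:
A = [[1, 1, 0, 0],
     [1, 1, 1, 1],
     [0, 1, 1, 0],
     [0, 1, 0, 1]]
print(column_normalize(A)[0])   # first row: [0.5, 0.25, 0.0, 0.0]
```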

Let's compute each station's staying probability with octave.
octave:10> Mb =
[0.5 0.25 0 0; 0.5 0.25 0.5 0.5;
 0 0.25 0.5 0; 0 0.25 0 0.5];
octave:11> [L D] = eig(Mb)
L =
2.88e-01  8.16e-01 -3.77e-1  1.25e-01
-8.66e-1 -4.83e-16 -7.55e-1 -3.25e-16
2.88e-01 -4.08e-01 -3.77e-1 -7.61e-01
2.88e-01 -4.08e-01 -3.77e-1  6.35e-01
D =
Diagonal Matrix
  -0.2500       0       0       0
        0  0.5000       0       0
        0       0  1.0000       0
        0       0       0  0.5000
octave:12> x1 = L(:,3)/ sum(L(:,3))
x1 =
   0.20000
   0.40000
   0.20000
   0.20000
Here I use the third column vector as the eigenvector, since the third eigenvalue is one. Remember, the eigenvalue one is the important one here. For a modern computer, 1000 matrix multiplications are also feasible for a matrix of this small size, so we can compute it that way as well.
octave:16> Mb^1000 * w
After 1000 steps, the staying probability at each station is 0.2, 0.4, 0.2, 0.2. Whatever distribution of people we start with, the distribution becomes these numbers after enough steps. However, we already knew this from the eigenanalysis. If you recall the topology of the stations (Figure 7), the second station, Alexanderplatz, has twice as many people as the other stations. In this example, Alexanderplatz is a hub station, and in this way we can compute how many people stay in Alexanderplatz compared with the other stations.
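For readers who want to check this outside octave, here is a small Python sketch of the same power-method experiment; the matrix Mb is copied from above, and the starting distribution is my own arbitrary choice.

```python
def step(M, v):
    """One step of the Markov chain: v' = M v."""
    n = len(M)
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

Mb = [[0.5, 0.25, 0.0, 0.0],
      [0.5, 0.25, 0.5, 0.5],
      [0.0, 0.25, 0.5, 0.0],
      [0.0, 0.25, 0.0, 0.5]]
v = [1.0, 0.0, 0.0, 0.0]        # everyone starts at station 1
for _ in range(1000):
    v = step(Mb, v)
print([round(p, 3) for p in v])   # [0.2, 0.4, 0.2, 0.2]
```

Any starting probability vector gives the same limit, which is exactly what the eigenvector of eigenvalue 1 told us directly.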

Finally, we have seen all the theoretical background of author network analysis, using station connections. So what is the relationship between station connections and authors' importance? We have seen that both are represented by a graph. Once we have a graph representation of any relationship, the problems become the same mathematical entity: finding the most important station in terms of visiting probability, and finding the most important author in terms of visiting probability on the web. The human interpretations differ, but the mathematical representations are the same. It is similar to how \(2+3=5\) can be interpreted for any quantity: milk, time, or euros. We can add 2 liters of milk and 3 liters of milk to get 5 liters; we can add 2 hours and 3 hours to get 5 hours; and we can add 2 euros and 3 euros to get 5 euros. It is just a human interpretation to see a graph as a station network or as an author network.

I know some people dislike this kind of ``abstraction'' that removes concrete meaning, finding it somehow inhuman. I understand this. There is a similar operation that converts all values to money. I personally don't like converting every value into one global uniform value, ``money,'' discarding all the other subtle values. Once you convert to this abstract value, you can add and subtract anything: your life's value and your children's value are converted to some amount of money, and you can compute how much milk is equivalent to them. This is an extreme example where the abstraction does not work; at least, I don't want to exchange my life for some amount of milk. But if we use abstraction properly, we can apply many ideas to many problems; we can even use old ideas to solve a completely new problem that no one has ever encountered. This is what makes mathematics an important tool. I believe it is important to have a sense of balance about how much abstraction to use when solving a problem. If mathematicians ignore some substance of the problem, their abstraction no longer represents the problem; it is the same mistake as assuming a person's life can be replaced with the value of milk through money. (Though I do like to drink milk every day.) I think solving a problem using mathematics is a human activity, so the humanity of the (applied) mathematician becomes important. I believe mathematics and humanity are related: personally, I find it quite interesting that the quality of mathematics seems correlated with the quality of the person who does it.

Authors in a Markov matrix: Which author do people find most inspiring? (22)

How to compute the eigenvectrors?

Here I don't explain the details of the algorithms for computing eigenvectors and eigenvalues, since there is free software available to compute them. I use octave's function eig() in this article.

Let's compute the eigenvectors and eigenvalues of the matrix \(M\).
octave:34> [L V] = eig(M)
L =
   0.83205  -0.70711
   0.55470   0.70711
V =
Diagonal Matrix
   1.00000         0
         0   0.50000
By the way, I told you that an eigenvalue seems equivalent to the matrix. This is correct in spirit; however, one number cannot exactly represent a matrix. If it could, why would we need a matrix at all? A typical matrix has multiple eigenvalues and eigenvectors. But the number of elements of an \(n \times n\) matrix is \(n^2\), while it has only \(n\) eigenvalues, so the eigenvalues are still simpler.

The eigenvalues of matrix \(M\) are 1 and 0.5; the corresponding eigenvectors are (0.83205, 0.55470) and (-0.70711, 0.70711). Here we ignore the eigenvalue 0.5, since its eigenvector does not tell us the convergence value; it is the eigenvector of eigenvalue 1 that tells us how the Markov matrix converges. If you are curious, look up ``matrix diagonalization''. We can compute the case where the total number of inhabitants is 1000.
octave:38> x1 = L(:,1)/ sum(L(:,1))
x1 =
   0.60000
   0.40000
octave:39> x1 * 1000
ans =
   600.00
   400.00
This is exactly the same result we saw before. The difference from the former method is that this one does not need to apply \(M\) many times. With repeated multiplication, we do not even know in advance how many times we need to apply the matrix, but eigenanalysis directly gives us the answer. In this example the matrix was small, so it made no practical difference, especially since we used a computer instead of computing by hand. However, the two methods have totally different efficiency for a large matrix.

In the next section, we will see how to generate the Markov matrix from the adjacency matrix. With this Markov matrix, we can compute the station staying probability, i.e., which station we will most probably end up at, as in the station example. It may sound a bit odd, but this is exactly the same as the authors' network analysis. We will see how in the next articles. Part 1 of this article (the theory) is coming close to its finale.


Authors in a Markov matrix: Which author do people find most inspiring? (21)

Here we don't want to consider the result of applying a matrix to one vector; we would like to consider the property of the matrix itself: how all vectors behave under the matrix, not just one specific vector. Consider the possible combinations of inhabitants of Berlin and Potsdam: even if the total number of inhabitants is fixed at, say, 1,000,000, we can make 1,000,001 different vectors, and we can also change the total to any number. However, as you have already seen, this matrix can be described by two numbers (= one vector). This is why eigenanalysis is such an important idea in linear algebra: in this example, you only need to think about two numbers instead of millions.

Whatever numbers Berlin's and Potsdam's inhabitants start from, enough years later the inhabitants converge to specific numbers.

The vector of inhabitants Berlin 600, Potsdam 400 is a special vector of this matrix, and therefore it has a name: it is called an eigenvector. Figure 13 draws this vector on top of Figure 12; you can see that every initial state converges onto the eigenvector.
Figure 12: Population history of Berlin and Potsdam.
Figure 13: Population history of Berlin and Potsdam with the \(y = \frac{400}{600}x\) line.
An eigenvector \(\vec{x}\) is a special vector of a matrix \(M\), such that
\begin{eqnarray*}
M\vec{x} &=& \lambda \vec{x}.
\end{eqnarray*}
Here \(\lambda\) is a scalar value: it is neither a matrix nor a vector, just a number. This \(\lambda\) also has a name: eigenvalue.

In this example, \(\lambda = 1\), as you can see in the following.
\begin{eqnarray*} M \left[ \begin{array}{c} 600 \\ 400 \\ \end{array} \right] &=& \left[ \begin{array}{c} 600 \\ 400 \\ \end{array} \right] \end{eqnarray*}
An eigenvector can be multiplied by any scalar and it remains an eigenvector.
\begin{eqnarray*} M \left[ \begin{array}{c} 6 \\ 4 \\ \end{array} \right] &=& \left[ \begin{array}{c} 6 \\ 4 \\ \end{array} \right] \end{eqnarray*}
The eigenvalue is also an incredible number to me. If you look at the above equation, it looks as if matrix \(M\) were equal to one number. In this example, the matrix size is 2x2, but even for a 1000x1000 matrix, an eigenvalue is a scalar. That means the essence of this complex matrix is squeezed into a single number. I have trouble picturing a 1000-dimensional matrix, but I have some feeling for a single number.

Here we have forgotten about any specific vector and obtained a property of the matrix itself. Let's see what happens if we apply the matrix to itself repeatedly, instead of to a vector.

octave:2> M
   0.80000   0.30000
   0.20000   0.70000
octave:3> M^2
   0.70000   0.45000
   0.30000   0.55000
octave:4> M^3
   0.65000   0.52500
   0.35000   0.47500
octave:5> M^5
   0.61250   0.58125
   0.38750   0.41875
octave:6> M^10
   0.60039   0.59941
   0.39961   0.40059
octave:7> M^100
   0.60000   0.60000
   0.40000   0.40000

The numbers in the last result are familiar.

In this article, I don't explain the algorithms for computing eigenvectors and eigenvalues; curious readers can refer to [9]. We applied the matrix \(M\) multiple times and saw that it converges. But this only happens when the largest eigenvalue of the matrix is 1, and not all matrices have largest eigenvalue 1. However, a Markov matrix always has largest eigenvalue 1. Curious readers can find out why in Appendix B.

You can often find the story of population change and eigenvectors, since it is a standard example. Still, I found the book [6] interesting; I also like Shiga's book [8] about the eigen problem. My favorite linear algebra book is [9].


[6] Yoshio Kimura, ``Fun of linear algebra for freshman (Daigaku ichinensei no tameno omosiro senkeidaisuu,'' Gendai Suugakusha, 1993

[8] Kouji Shiga, ``30 Lectures of eigen problem (Koyuuchi mondai 30 kou,'' Asakura shoten, 1991

[9] Gilbert Strang, ``Introduction to Linear Algebra, 4th Edition,'' Wellesley-Cambridge Press, 2009

Authors in a Markov matrix: Which author do people find most inspiring? (20)

I sometimes find mathematical insight in great literature or music. Bach's music has a lot of mathematical patterns; haiku and some kinds of lyrics also have a formal pattern. Some works of Souseki Natume also have mathematical insight. These people sublimate such patterns into art. I sometimes feel happy that I can see this aspect. Let me show you one example.

Do you know ``Meijinden (A master's story)'' by Atusi Nakajima? I like his ``Deshi (A student)'' and ``Riryou'', but my favorite story of his is Meijinden. A master of archery wants to become the master of masters. At one point, he finally meets the master of masters. He shoots 20 birds down with one arrow. The master says to him, ``I see you can shoot an arrow. But that is just the shoot of shooting. You seem not to know the shoot of non-shooting,'' and the master shoots 20 birds down without any arrow. The world of shooting is limited if one uses an arrow; what, then, is the substance of shooting? If we want to go beyond one world and look into its substance, sometimes we need to abandon an important but non-substantial component.

A few times I have tried to explain this story to people. It is a Japanese classic based on an old Chinese story. However, people sometimes think I am joking, even though I seriously think this is how new ideas come. In Japanese, mathematics is ``数学 Suu-gaku (= the study of numbers)''. (Some know the puzzle called ``数独 Suu-doku'', ``number alone'', a short form of a phrase meaning the digits must remain single; Suu means number.) When mathematics reached a certain level, it stopped being merely a study of numbers. The master might say: ``I see you know some numbers. But that is just the number of numbering. You seem not to know the number of non-numbering.'' Mathematics started by counting numbers, then found the operations between numbers (addition, subtraction, ...). But at some point, mathematics stopped discussing only numbers and started discussing operators without numbers. That was a big step for mathematics: what can an operator create? When subtraction was invented, people found that some subtractions could not be done; for instance, 3 - 5 had no answer until negative numbers were invented. Some people wondered: can we add any two numbers? Even if the numbers are very large? People started thinking about ``any'' number, which was a great step. People knew division has a similar problem: 3/5 had no answer until fractions were invented; negative numbers were not enough. One then starts to wonder: when will this end? Can we think about all the operators, not just one, just as we think not about one number but about any number? Galois is one of the pioneers of this area, which is why his work is so important. The same happened in computer programming: at first, a function took a number and produced a new number; the next level was to forget the number, take a function, and produce a new function. This idea of meta-programming pushed the concept of programming to the next step.
You might have seen this year's Google doodle about Turing [1]. He asked: if there is a universal computer, a machine that can simulate any computer, what is the limit of this machine? Incredibly, he proved what a computer can and can not do before today's electronic computer was invented. I can not stop telling these stories about how powerful abstraction is. I should get back to the matrix story now.

Let's get back to the inhabitants of Berlin and Potsdam next time.


Authors in a Markov matrix: Which author do people find most inspiring? (19)

In the last article, we saw the number of inhabitants converge to one specific value. I raised a question: does this result depend on the initial state? In other words, are these converging numbers a property of the matrix only, or a property of both the matrix and the initial population vector?

  • Hypothesis 3: If the initial state is different, the result may change. For example, we started with Berlin 900, Potsdam 100. If we start with Berlin 0, Potsdam 1000, the result may not be the same.

We can check this with octave again. Set the initial state to Berlin 0, Potsdam 1000.

octave:10> p = [0 1000]';
octave:11> M * p
octave:12> M^2 * p
octave:13> M^3 * p
octave:14> M^10 * p
octave:15> M^100 * p

Surprisingly, the results are the same even though we changed the initial state. Figure 12 shows many different initial states and how they change over the years. I believe you see a pattern. And finding a pattern is mathematics.
Figure 12: Population history with various initial conditions.
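You can repeat the experiment for any initial state. A quick check in Python with NumPy (a sketch, standing in for the octave session above):

```python
import numpy as np

# The Berlin/Potsdam Markov matrix from the earlier article.
M = np.array([[0.8, 0.3],
              [0.2, 0.7]])

M100 = np.linalg.matrix_power(M, 100)

# Two very different initial states with the same total population.
print(M100 @ np.array([900.0, 100.0]))  # converges to [600, 400]
print(M100 @ np.array([0.0, 1000.0]))   # converges to [600, 400] as well
```

Whatever split of the 1000 people we start with, after 100 years we land on Berlin 600, Potsdam 400.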
By the way, did you realize that Berlin 600 and Potsdam 400 are special numbers? Let's compute how many people move at these numbers.
\begin{eqnarray*} \mbox{Berlin} \rightarrow \mbox{Potsdam} &=& 600 * 0.2 \\ &=& 120 \\ \mbox{Potsdam} \rightarrow \mbox{Berlin} &=& 400 * 0.3\\ &=& 120 \end{eqnarray*}
Once the number of inhabitants reaches these values, the number of people moving from Berlin to Potsdam and from Potsdam to Berlin becomes the same. Therefore, once the cities reach these numbers, the number of inhabitants never changes again. As you see in Figure 12, there is also something that doesn't depend on the initial distribution of the total population. This suggests the property lives in the matrix, not in the population vector. Mathematics first thought about mathematical objects, namely, numbers. Initially, its main concern was how to operate on the numbers, i.e., addition, subtraction, and so on. But at some point, the concern of mathematics moved to the operations themselves. The mathematicians stopped thinking about numbers. Here, how many inhabitants there are was our first concern; the next level is to ask what the matrix is.
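In matrix terms, Berlin 600 / Potsdam 400 is a fixed point of the matrix. A small check in Python with NumPy (a sketch, NumPy standing in for octave):

```python
import numpy as np

M = np.array([[0.8, 0.3],
              [0.2, 0.7]])
p = np.array([600.0, 400.0])

flow_b_to_p = 600 * 0.2  # 120 people leave Berlin
flow_p_to_b = 400 * 0.3  # 120 people leave Potsdam

# The two flows cancel out, so applying M leaves the vector unchanged: M p = p.
print(flow_b_to_p, flow_p_to_b)
print(M @ p)  # stays at [600, 400]
```

In the language of the later articles, this vector is an eigenvector of \(M\) with eigenvalue 1.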

In the next article, I would like to talk about how mathematicians stopped thinking about numbers. It is related to mathematics, but only loosely.

Authors in a Markov matrix: Which author do people find most inspiring? (18)

Let's get back to the question: what does the number of inhabitants look like many years later? Before we perform the calculation, let's make some hypotheses and check them. Why make hypotheses? Because that is the fun of mathematics. If I predict something based on mathematics, I can check it by calculation. If my prediction is correct, that's the fun. It is like a plan succeeding in a chess game.

One thing is clear: the total number stays 1000, since no one is born or dies and these people always stay in one of the two cities.
  • Hypothesis 1: One day, all the inhabitants will live in Berlin, since the ratio of staying in Berlin (0.8) is larger than that of Potsdam (0.7).
This hypothesis does not seem correct. The outgoing people are 20% of Berlin's inhabitants, so when Berlin's population grows, the outgoing amount becomes larger than the incoming amount. For example, when Berlin has 900 inhabitants, 20% of that is 180, while Potsdam has 100 inhabitants and 30% of that is 30, which is much smaller than 180. In fact, Potsdam's population increased in the first and second year.

  • Hypothesis 2: The inhabitants of Potsdam increased \(100 \rightarrow 250 \rightarrow 325\) in these two years. However, at some point the population of Potsdam becomes large enough that it will start to decrease. Some years after that, Berlin's population becomes larger than Potsdam's again. Repeating this, the populations of the two cities may oscillate.

Let's try some computation with octave to check hypothesis 2. (Nowadays I may rely on the computer too much. In the old days, people first ``thought'' and then proved the following...)

octave:5> M^3 * p
octave:6> M^4 * p
octave:7> M^10 * p
octave:8> M^100 * p

It seems there is no oscillation; instead, the number of inhabitants converges to a specific value. 100 years later, Berlin has 600 people and Potsdam has 400 people. Figure 11 shows this change.
Figure 11: Population history of Berlin and Potsdam.
Initially, Berlin has 900 people and Potsdam has 100. The population goes to Berlin 600, Potsdam 400, and then there is no change anymore.
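The year-by-year change can also be scripted (a sketch in Python with NumPy, for readers without octave): Potsdam's population rises toward 400 every single year, so hypothesis 2's oscillation never appears.

```python
import numpy as np

M = np.array([[0.8, 0.3],
              [0.2, 0.7]])
p = np.array([900.0, 100.0])

potsdam = []
for year in range(100):
    p = M @ p               # advance one year
    potsdam.append(p[1])    # record Potsdam's population

# Potsdam: 250, 325, 362.5, ... -> 400; it approaches 400 without overshooting.
print(potsdam[:3], potsdam[-1])
```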

You might notice the intermediate results have fractions. We computed numbers of people, so you might wonder what 600.29 people means. It appears because we assumed the ratio of moving people to be exactly 0.2. This doesn't happen in reality: in reality some people move, and afterwards we can compute what percentage moved. However, the assumption is not complete nonsense. For example, the moving-out ratio or the birth ratio of a city doesn't change drastically within a year, so it is a good guess that next year has the same ratio. Calculus has more sophisticated guessing methods, but we will stick to this guess, which is called a first order approximation. Making a plan with such a guess is not so bad compared with assuming a completely different number. We could interpret the fraction as some people spending a few months in Berlin and the rest of the months in Potsdam.

By the way, does this result depend on the initial state? In other words, are these converging numbers a property of the matrix only, or a property of both the matrix and the initial population vector? Let's see this in the next article.


Authors in a Markov matrix: Which author do people find most inspiring? (17)

Last time, we introduced the Markov matrix as an extension of the adjacency matrix. Let's assume \(M\) has the following form.
\begin{eqnarray} M &=& \left[ \begin{array}{cc} 0.8 & 0.3 \\ 0.2 & 0.7 \\ \end{array} \right] \label{eq:berlin_potsdam_mat} \end{eqnarray}
The meaning of each element of this specific example is:
\begin{eqnarray*} M &=& \left[ \begin{array}{cc} \mbox{Stay Berlin} & \mbox{P $\rightarrow$ B} \\ \mbox{B $\rightarrow$ P} & \mbox{Stay Potsdam} \\ \end{array} \right] \end{eqnarray*}
Where \(\mbox{B $\rightarrow$ P}\) is the ratio of people moving from Berlin to Potsdam and \(\mbox{P $\rightarrow$ B}\) is the ratio moving from Potsdam to Berlin. The ratio is relative to the current number of inhabitants. In the equation for \(M\), 80% of Berlin's inhabitants will stay in Berlin after one year, and 20% will move from Berlin to Potsdam. As you see, the sum of the column is 1 (0.8 + 0.2 = 1.0), since we account for all the inhabitants. The same holds for the inhabitants of Potsdam: 70% will stay in Potsdam and 30% will move to Berlin after one year. The sum of this column is also 1 (0.7 + 0.3 = 1).

Since the elements of the matrix represent a ``ratio of staying/moving'', each element is always at least 0 and at most 1. There is no negative movement of people, and no more than 100% of the inhabitants can move, so no element exceeds 1. This matrix completely determines the next year's state from the last year's state. Such a matrix is called a Markov matrix.

Assume Berlin has 900 inhabitants and Potsdam has 100, so 1000 people in total live in these cities. What is next year's distribution of inhabitants? We can compute this using octave.

octave:29> M = [0.8 0.3; 0.2 0.7];
octave:30> p = [900 100]';
octave:31> M * p

Two years later will be:

octave:32> M^2 * p
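The same two computations can be checked in Python with NumPy (a sketch, for readers without octave). Note the sanity check that each column of \(M\) sums to 1, as a Markov matrix requires:

```python
import numpy as np

M = np.array([[0.8, 0.3],
              [0.2, 0.7]])
p = np.array([900.0, 100.0])

# Every column sums to 1, so no one appears or disappears.
print(M.sum(axis=0))                     # [1. 1.]

print(M @ p)                             # next year: [750, 250]
print(np.linalg.matrix_power(M, 2) @ p)  # two years later: [675, 325]
```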

I have several quiz questions to help understand this matrix better. Let's think about the number of inhabitants many years later.

By the way, some readers might ask: why do you want to know what happens many years later? Because I am interested in what happens many years later. But this doesn't really answer the question. ``What happens after many years?'' is the same question as ``What happens in the future?''

Many sciences gather knowledge, and one purpose of gathering knowledge is to know the future. Mathematics is essential since it helps us know the future to some extent. I believe many people are interested in the future because knowing about it helps us survive. If a society didn't care about the future, that society would have no future. For example, people who didn't care about food for the winter consumed their food before the winter came, and disappeared. A society that doesn't care about children's education has less chance to develop. We are the result of a long survival, because we predicted the future, for example that we would need food in the winter, and prepared for it. If we hadn't learned that, we wouldn't exist. As a result, I think many people are interested in the future.

Next time, let's predict the future based on our hypotheses.


Authors in a Markov matrix: Which author do people find most inspiring? (16)

Markov matrix

The train station example showed that applying an adjacency matrix to a station vector tells us which stations we can reach from any starting station. Let's go one step further: we can look into the quantity of the movement instead of just the possibility of the movement.

There is a city called Potsdam, 40 minutes away from Berlin by train (S-Bahn). These two cities are close, and there is a lot of movement of people between them. Some move from Potsdam to Berlin and others the other way around. The adjacency matrix of the two cities is the following.
\begin{eqnarray*} \left[ \begin{array}{cc} 1 & 1 \\ 1 & 1 \\ \end{array} \right] \end{eqnarray*}
Let me show you what is the meaning of columns and rows of the matrix.
\begin{eqnarray*} \begin{array}{ccc} & \mbox{Berlin} & \mbox{Potsdam} \\ \begin{array}{c} \\ \mbox{Berlin} \\ \mbox{Potsdam} \\ \end{array} & \left[ \begin{array}{c} 1 \\ 1 \\ \end{array} \right. & \left. \begin{array}{c} 1\\ 1\\ \end{array} \right] \end{array} \end{eqnarray*}

This matrix only represents the connectivity: we can stay in either city, and we can also move from one to the other. Though, this is not so interesting. Interest is a feeling, and it can not be explained by mathematics, so some might think this is a strange point in a mathematical discussion. But mathematicians have some passion about whether a problem is interesting or not. In this case, two cities are connected, and all the adjacency matrix can tell us is that you can be in either city. It's trivial, isn't it? At least I am not interested in this case. But it becomes more interesting when we put into the matrix how many people are moving.

Assume the following vector represents the populations of the cities. Call this vector a population vector.
\begin{eqnarray*} \left[ \begin{array}{c} p_{b} \\ p_{p} \\ \end{array} \right] &=& \left[ \begin{array}{l} \mbox{Population of Berlin} \\ \mbox{Population of Potsdam} \\ \end{array} \right] \end{eqnarray*}
Let's think about the change of population. Assume the year 2000 is the starting point, time \(t=0\). We write this as:
\begin{eqnarray*} \left[ \begin{array}{c} p_{b} \\ p_{p} \\ \end{array} \right]_{t=0} \end{eqnarray*}
The population change is observed every year. Assume that this year's population is determined by last year's, and express the change as a matrix \(M\).
\begin{eqnarray*} \left[ \begin{array}{c} p_{b} \\ p_{p} \\ \end{array} \right]_{t=k+1} &=& M \left[ \begin{array}{c} p_{b} \\ p_{p} \\ \end{array} \right]_{t=k} \end{eqnarray*}

To keep the example simple, the change of population is defined only by the ratio of people moving. This means no baby is born and no one dies. Also, people only move between these two cities, and the ratio of people moving is fixed.

This assumption is a big limitation. It's easy to imagine that the moving ratio may change every year, that babies will be born, and that some people will die. Some people surely move to other cities. None of these assumptions is realistic. We could make the model more complex and more realistic, i.e., the number of cities can be handled by a larger matrix, and birth and death ratios can also be included. But my motivation here is not to predict the population accurately; it is to show how matrix analysis works. Therefore, I would like to stick to the simplest model. This is a common problem when we try to explain a mathematical idea. A simpler example is easier to understand, but it is usually less realistic. I can imagine that some people think classroom mathematics is not useful because the assumptions are usually unrealistic and can not be applied to real problems. But this is a learning problem: if the example is too complex, we might not be able to understand it; if the example is too simple, we might not be able to apply it to a real problem. I show a simple but unrealistic example for understanding. Later, I would like to apply the method to a more complex problem.

Today's story became a bit long, yet it is just the introduction. In the next article, I would like to talk about this population matrix.


Authors in a Markov matrix: Which author do people find most inspiring? (15)


At which station am I?

An adjacency matrix represents graph topology (how the nodes are connected). However, a matrix not only represents the connections; it can also be applied to a vector to generate a new vector. We saw that the adjacency matrix can generate a new station vector. We can continue this computation a bit further. Why do we do this? I want to show you something interesting. Let's assume we start at Weinmeisterstr. We repeatedly either visit a neighboring station or stay at the current station. Each computation result shows how many possible paths there are to reach each station. Let's see how these numbers grow.

This is the initial state; we are at Weinmeisterstr station.
\begin{eqnarray*} \begin{array}{|c|c|} \hline \mbox{Station} & \mbox{0 steps} \\ \hline \mbox{Weinmeisterstr} & 1 \\ \mbox{Alexanderplatz} & 0 \\ \mbox{Hackescher Markt} & 0 \\ \mbox{Jannowitzbruecke} & 0 \\ \hline \end{array} \end{eqnarray*} The result after two steps is:

\begin{eqnarray*} \begin{array}{|c|c|} \hline \mbox{Station} & \mbox{2 steps} \\ \hline \mbox{Weinmeisterstr} & 2 \\ \mbox{Alexanderplatz} & 2 \\ \mbox{Hackescher Markt} & 1 \\ \mbox{Jannowitzbruecke} & 1 \\ \hline \end{array} \end{eqnarray*}

Let's continue to 3 steps, 5 steps, and 10 steps.
\begin{eqnarray*} \begin{array}{|c|c|c|c|} \hline \mbox{Station} & \mbox{3 steps} & \mbox{5 steps} & \mbox{10 steps} \\ \hline \mbox{Wein.} & 4 & 26 & 3862 \\ \mbox{Alex.} & 6 & 44 & 6688 \\ \mbox{Hack.} & 3 & 25 & 3861 \\ \mbox{Jann.} & 3 & 25 & 3861 \\ \hline \end{array} \end{eqnarray*}
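The whole table can be reproduced with matrix powers. A sketch in Python with NumPy (standing in for octave):

```python
import numpy as np

# Adjacency matrix with self-loops; order: Wein., Alex., Hack., Jann.
M_bahn = np.array([[1, 1, 0, 0],
                   [1, 1, 1, 1],
                   [0, 1, 1, 0],
                   [0, 1, 0, 1]])
w = np.array([1, 0, 0, 0])  # we start at Weinmeisterstr

for n in (3, 5, 10):
    print(n, np.linalg.matrix_power(M_bahn, n) @ w)
```

For 10 steps this prints the table's last column, [3862, 6688, 3861, 3861].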
I guess the number of paths to reach Alexanderplatz settles at a fixed multiple (roughly 1.7 times) of the number for the other stations. I think this kind of guess is important in mathematics. I enjoy mathematics by finding a pattern.

Actually, it is not a coincidence that the number of possible paths to Alexanderplatz becomes a fixed multiple of the other stations when the number of steps is large (the ratio approaches \(\sqrt{3} \approx 1.73\), e.g., \(6688/3862 \approx 1.73\)). It is interesting to me that such a pattern exists here; it could have been more chaotic. In the following sections, I would like to look into more details. For now, I hope you have some feeling that we can analyze relationships using graph theory and an adjacency matrix.

I would like to come back to the first question: ``How can we analyze the relationships of authors?'' The mathematics we have seen here considers only the relationships between objects. We saw examples with numbers, authors, and train stations. But graph theory and the adjacency matrix don't care what the objects are. Figures 4, 5, and 6 are all the same graph and share the same adjacency matrix. You might think this is careless or inhuman. But on the other hand, ``don't care'' means anything can fit. We can use these tools (graph theory and the adjacency matrix) not only for train stations, but also for authors. This makes the tools powerful and versatile. We saw that we can compute how to travel between train stations, from station A to station B. We can likewise compute how to reach a person, from person A to person B. We can find a pattern in relationships.
Figure 4: Edges connect nodes.
Figure 5: The same graphs. A graph only cares about the connections between nodes.
Figure 6: Graph example 1. Each node is an English author.
In the next article, we will see a method of finding ``a feature of the adjacency matrix''.


Authors in a Markov matrix: Which author do people find most inspiring? (14)

Let's continue the story of matrix calculation from last time. It's a bit cumbersome to write the same adjacency matrix every time, so let's call this station connection matrix \(M_{bahn}\), since ``Bahn'' means train in German.
\begin{eqnarray*} M_{bahn} &= & \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array} \right] \end{eqnarray*}
We can compute the neighboring stations by applying \(M_{bahn}\) to the station vector. If we apply \(M_{bahn}\) twice to the station vector, we get the stations reachable in two hops, and the resulting numbers tell us how many possible paths there are to reach each station. Let's apply \(M_{bahn}\) twice to the station vector that says we are at Weinmeisterstr.
\begin{eqnarray*} (M_{bahn})^2 \left[ \begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \\ \end{array} \right] &=& \left[ \begin{array}{c} 2 \\ 2 \\ 1 \\ 1 \\ \end{array} \right] \end{eqnarray*}
Each element of the resulting vector is the number of paths to reach that station in 2 hops. Namely:
\begin{eqnarray*} \left[ \begin{array}{c} 2 \\ 2 \\ 1 \\ 1 \\ \end{array} \right] \begin{array}{cccl} \mbox{Wein.} & \rightarrow & \mbox{Wein.} & \mbox{2 paths} \\ \mbox{Wein.} & \rightarrow & \mbox{Alex.} & \mbox{2 paths} \\ \mbox{Wein.} & \rightarrow & \mbox{Hack.} & \mbox{1 path} \\ \mbox{Wein.} & \rightarrow & \mbox{Jann.} & \mbox{1 path.} \\ \end{array} \end{eqnarray*} For instance, the first entry is 2, which means there are two paths from Weinmeisterstr to Weinmeisterstr in two hops. The first path is
Wein. \(\rightarrow\) Alex. \(\rightarrow\) Wein.
where the first hop goes to Alexanderplatz and the second hop comes back to Weinmeisterstr. The second path is
Wein. \(\rightarrow\) Wein. \(\rightarrow\) Wein.
where the first hop stays at Weinmeisterstr and the second hop also stays at Weinmeisterstr. So there are two different paths to reach Weinmeisterstr in two hops.

I recommend using octave [6] or matlab for this kind of computation. I personally like matlab, but the license is too expensive for my personal hobby. I would definitely buy it if they offered a 5-year-old version, including toolboxes, for 100 Euro. I am looking for a matlab Home edition, like the Mathematica Home edition; however, matlab seems to be 100 times more expensive. So I usually use octave, which is also a great piece of free software. The following shows the computation result in octave. (I edited the output to fit into the column.)

octave:6> Mb = [1 1 0 0; 1 1 1 1;
                0 1 1 0; 0 1 0 1];
octave:7> w = [1 0 0 0]';
octave:8> Mb * w
ans =
   1
   1
   0
   0
octave:9> Mb * Mb * w
ans =
   2
   2
   1
   1

How about three steps?

octave:10> Mb^3 * w
ans =
   4
   6
   3
   3

The first element is 4, which means there are four paths from Weinmeisterstr to Weinmeisterstr in three hops, as follows.
\begin{eqnarray*} \begin{array}{ccccccc} \mbox{W} & \rightarrow & \mbox{W} & \rightarrow & \mbox{W} & \rightarrow & \mbox{W} \\ \mbox{W} & \rightarrow & \mbox{A} & \rightarrow & \mbox{W} & \rightarrow & \mbox{W} \\ \mbox{W} & \rightarrow & \mbox{W} & \rightarrow & \mbox{A} & \rightarrow & \mbox{W} \\ \mbox{W} & \rightarrow & \mbox{A} & \rightarrow & \mbox{A} & \rightarrow & \mbox{W} \\ \end{array} \end{eqnarray*}
Where W is Weinmeisterstr and A is Alexanderplatz. There are three arrows, which means three movements (or stays).

Here you saw how to use an adjacency matrix to find where you can be after n steps of movement.

By the way, why does an adjacency matrix tell us the number of paths? (The following explanation assumes you know the relationship between the column space and the solution space. If you are not familiar with these, you can skip this paragraph.) First of all, if the graph is directed, we need to transpose the matrix. Then each column represents the connections to the neighboring nodes. The columns span the column space, which is the space of neighborhood connections, and applying the matrix to a vector produces a linear combination of these columns, i.e., a combination of connections. Since the column space holds all the possible neighbor connections, this is the brief reason why an adjacency matrix can tell us the reachable nodes. If you know the solution, e.g., how many paths reach each station, you can also recover the previous step by solving the system. Though, for the station connection graph this is not so interesting a question for me, since my usual question is which stations I can reach in some number of steps, not which station I came from.
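If you prefer a more direct argument, you can enumerate the walks by brute force and compare the counts with the matrix power. A Python sketch (the helper `count_paths` is my own illustration, not from the articles):

```python
import numpy as np

M_bahn = np.array([[1, 1, 0, 0],
                   [1, 1, 1, 1],
                   [0, 1, 1, 0],
                   [0, 1, 0, 1]])

def count_paths(adj, start, end, hops):
    # Count the walks of length `hops` from `start` to `end` by enumeration.
    if hops == 0:
        return 1 if start == end else 0
    total = 0
    for nxt in range(len(adj)):
        if adj[start][nxt]:
            total += count_paths(adj, nxt, end, hops - 1)
    return total

# Enumeration and matrix power agree, e.g. 3 hops from Weinmeisterstr (node 0).
brute = [count_paths(M_bahn, 0, end, 3) for end in range(4)]
print(brute)                                    # [4, 6, 3, 3]
print(np.linalg.matrix_power(M_bahn, 3)[:, 0])  # the same numbers
```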

Now we know how to write a graph in matrix form and how to use the adjacency matrix. In the next article, we are going to look into the properties of the matrix through eigenanalysis.


[6]  GNU Octave, http://www.gnu.org/software/octave/

Authors in a Markov matrix: Which author do people find most inspiring? (13)

Last time I forgot to mention the diagonal elements of the matrix. In the last article, the diagonal elements of the adjacency matrix were zero. That means you can not stay at a station. However, I think each station is connected to itself. It seems unreasonable that someone who wants to stay at a station must always move to a neighboring station and come back. If you want to force such movement, then that adjacency matrix is useful. However, if I want to go from Alexanderplatz to Alexanderplatz, which means I want to stay at Alexanderplatz, I just want to stay at the station. If we allow staying at a station, the adjacency matrix becomes the following.

\begin{eqnarray*} \begin{array}{ccccc} & \mbox{Wein.} & \mbox{Alex.} & \mbox{Hack.} & \mbox{Jann.} \\ \begin{array}{c} \\ \mbox{Wein.}\\ \mbox{Alex.}\\ \mbox{Hack.}\\ \mbox{Jann.}\\ \end{array} & \left[ \begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \\ \end{array} \right. & \begin{array}{c} 1\\ 1\\ 1\\ 1\\ \end{array} & \begin{array}{c} 0\\ 1\\ 1\\ 0\\ \end{array} & \left. \begin{array}{c} 0\\ 1\\ 0\\ 1\\ \end{array} \right] \end{array} \end{eqnarray*} The diagonal elements are now 1.

An adjacency matrix not only shows the connections; we can also compute which stations we can reach. This is a nice property of the adjacency matrix. For example, if I am at Weinmeisterstr, let's compute where I can reach in one step. We can represent the current location as the following vector.

\begin{eqnarray*} \left[ \begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \\ \end{array} \right] \end{eqnarray*} This vector means someone is at Weinmeisterstr, and no one is at the other stations.

I call this vector a station vector, since it represents how many possible ways there are to visit each station. Now we apply the adjacency matrix to this station vector.

\begin{eqnarray*} \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \\ \end{array} \right] &=& \left[ \begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \\ \end{array} \right] \end{eqnarray*}

The first element of the resulting vector represents the number of possible ways to reach Weinmeisterstr, and the second element the number of possible ways to reach Alexanderplatz. Both are 1, which means that one step after starting at Weinmeisterstr, we can be at Weinmeisterstr or at Alexanderplatz. In other words, if we move one step from Weinmeisterstr, we can reach Weinmeisterstr or Alexanderplatz. When we want to visit Weinmeisterstr from Weinmeisterstr in one step, we can simply stay at the station.

You now see the power of the matrix. At first, we only knew the connections between stations; now we can compute where we can go. We will see more about the adjacency matrix.


A pattern girl

D is a bit noisy. She sometimes makes up a story and talks out loud in class. However, I respect a person who can make a new story, so I don't want to discourage her from doing that. On the other hand, she disturbs the other children. So I try to ask her to concentrate on the problem.

Last time, D practiced addition with the Zahlenhaus (number house). In the beginning, the Zahlenhaus has plus and minus examples. Figure 1 shows a Zahlenhaus example.
Zahlenhaus: 7 = 4 + 3
A practice example is the following:

\begin{eqnarray*} 4 + 3 &=& \\ 3 + 4 &=& \\ 7 - 3 &=& \\ 7 - 4 &=& \end{eqnarray*}

Another typical practice example is the following:

\begin{eqnarray*} 1 + 6 &=& \\ 6 + 1 &=& \\ 7 - 1 &=& \\ 7 - 6 &=& \end{eqnarray*}

This practice book tries to show that addition is commutative, i.e., we can add numbers in either order: \(1 + 6 = 6 + 1\). For the minus calculation, we can remove one of the rooms from the Zahlenhaus, and the other room remains. The book develops the question as follows:

\begin{eqnarray*} \begin{array}{cccc} 1 & + & 6 & = \\ 6 & + & 1 & = \\ & - & & = \\ & - & & = \\ \end{array} \end{eqnarray*}

D never made a mistake.

However, I am interested in how she understands this; I care less about whether the answer is correct or not. Therefore, I asked her, ``Wie hast du dies gerechnet? (How did you calculate this?)'' This is the most interesting moment for me. Most children can not explain well, but she explained it to me very clearly.

The question was:

\begin{eqnarray*} \begin{array}{cccc} 1 & + & 6 & = \\ 6 & + & 1 & = \\ 7 & - & & = \\ 7 & - & & = \\ \end{array} \end{eqnarray*}

She told me the first answer is in the third row, as shown in red; therefore, she just copied it. The next answer is in the 4th row (blue), so she copied it again.

The others are the same: the same color position has the same value.
I said, ``Ausgezeichnet! (Excellent!)''

One of the most important things in mathematics is finding a pattern. When human beings found a pattern about things, ``we can count things'', mathematics was born. Finding patterns in the world and formulating them: that is mathematics. But we also need to check when a pattern holds and when it does not. For a long time, I didn't understand why mathematical proofs checked such pathological cases. It is because we want to know when we can use a result: if we know its limits, we also know when we can use it.

This practice book repeats the same pattern on the following pages, and continues with Zahlenhaus 8, 9, 10. She will never make a mistake, without any understanding. (I sometimes see that kind of research: nobody knows how it works, but it seems to work. I hesitate to call it research because there is no understanding. But maybe it is still fine, since it adds some experimental knowledge to society.)

I told her, ``Everything is correct. It is good to find a pattern.'' But I thought for a while. Her method is great, but it will fail at some point. How could I persuade her that there is a case where it fails?

``Mathematics is a kind of language, so all of these lines have meanings. You can express something with mathematics.... Here there are plus and minus. What does `plus' mean?''

``Plus means more.''

``Yes, ... but why is 1 + 6 equal to 7?''


``1 + 6 is not equal to 10. But 10 is more than 1 and more than 6. Why is 1 + 6 not equal to 10? If plus just means more, would it be OK for 1 + 6 to equal 8, 9, or 10?''

I usually need to take care not to overwhelm the children; too much explanation is not good. That I learned from the other teachers here. So I slowed down.

I came back to the Zahlenhaus and asked her to count the 1 point and the 6 points in it. She can count, so she realized the total number of points is 7. I explained that `plus' means `together'. If something and something are together, then they are more. Therefore `more' is not wrong, but it is better to say `together' for the meaning of plus.

My next task was to explain when her method fails. Now she understands that 1 + 6 has a meaning; therefore, this line has an answer by itself. The 4-line Zahlenhaus set is not always presented, so she will not always have four questions. In that case, she can not copy the answers.

This actually took around 20 minutes. But I think she realized it.

This time, I was surprised that an 8-year-old child found, even partially, the following pattern.

\begin{eqnarray*} \begin{array}{cccc} x & + & y & = z \\ y & + & x & = z \\ z & - & x & = y \\ z & - & y & = x \\ \end{array} \end{eqnarray*}

A computer stores variables in memory. A memory cell has an address, and if a computer language doesn't hide the address, like C or C++, each variable has an address. An address is the name of a location. Therefore we programmers need to develop the idea that a value is associated with a location, exactly as she explained to me. I see she has that ability: associating a location and a value. I wonder whether I should teach her a computer language. Well, first she should master plus, minus, and more calculations.


Authors in a Markov matrix: Which author do people find most inspiring? (12)

Adjacency matrix and topology of graph

Last time, I explained what an adjacency matrix is. This time I would like to show the relationship between an adjacency matrix and the corresponding graph.

An adjacency matrix shows how the nodes are connected. When we care only about the existence of connections, we call such mathematics topology. Sometimes only the connection matters, even in everyday life. For example, we usually only care about the city train connections; the real distance on the map is less important. Two stations might be less than 300m apart, but if there is no train connection between them, you may think of them as totally separate stations. The Liniennetz map found on the web site of the Berlin city transportation is such an example. You can see the Ringbahn as a nearly perfect ellipse on the map, but if you check a real map, it is not that shape at all. The train map focuses on the connections between stations. It is a kind of topology map.

Let's return to Figure 7.
Figure 7: Graph example 2. Each node is a train station.
This is a small part of the Berlin train network. You can see three train stations connected to Alexanderplatz station. The adjacency matrix of this graph is the following. (We only use the first four letters of the train station names here.)

\begin{eqnarray*} \begin{array}{ccccc} & \mbox{Wein.} & \mbox{Alex.} & \mbox{Hack.} & \mbox{Jann.} \\ \begin{array}{c} \\ \mbox{Wein.}\\ \mbox{Alex.}\\ \mbox{Hack.}\\ \mbox{Jann.}\\ \end{array} & \left[ \begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \\ \end{array} \right. & \begin{array}{c} 1\\ 0\\ 1\\ 1\\ \end{array} & \begin{array}{c} 0\\ 1\\ 0\\ 0\\ \end{array} & \left. \begin{array}{c} 0\\ 1\\ 0\\ 0\\ \end{array} \right] \end{array} \end{eqnarray*}

The number of 1s in a row vector tells us how many edges are connected to the node. For example, the row vector of Weinmeisterstr is the following.

\begin{eqnarray*} \left[ \begin{array}{cccc} 0 & 1 & 0 & 0 \\ \end{array} \right] \end{eqnarray*}

The number of 1s is 1; therefore, the number of edges from the Weinmeisterstr node is 1. This number is called the ``degree'' of the node. The Alexanderplatz row vector has three 1s, as shown below; therefore, the degree of the Alexanderplatz node is 3.

\begin{eqnarray*} \left[ \begin{array}{cccc} 1 & 0 & 1 & 1 \\ \end{array} \right] \end{eqnarray*}

You can check the degree of nodes also in Figure 7.
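Summing each row reproduces the degrees. A small sketch in Python with NumPy (the matrix is the self-loop-free adjacency matrix of the station graph, with the order Wein., Alex., Hack., Jann.):

```python
import numpy as np

# Adjacency matrix without self-loops: Wein., Alex., Hack., Jann.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])

degrees = A.sum(axis=1)  # number of 1s in each row
print(degrees)  # Weinmeisterstr has degree 1, Alexanderplatz has degree 3
```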

Next time I would like to show you some operations on the matrix.