2012-12-30

Authors in a Markov matrix Part 2 (7) Experimental results: Which author do people find most inspiring?


We have seen the eigenanalysis results for the authors in Wikipedia. I find it interesting just to go through the result tables and think about why each person is there. For instance, I was surprised that Winston Churchill ranks high in English literature. But a few of my friends pointed out to me that he is the only person who was both a Nobel laureate in literature and a prime minister of Britain. In the next few articles, I would like to discuss the results.

Discussion

Matrix rank

Table 3 shows that the matrix is not full rank even after we removed the rank sink pages and the pages that have only outgoing links. This means there are some groups of pages: within each group the pages are connected by links, but there are no links between the groups. It would be interesting to analyze these groups, but this is future work.
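As a minimal sketch of how such groups could be found, the code below ignores link direction and collects pages that can reach each other. The function name `weakly_connected_groups` and the toy link list are hypothetical and are not taken from the experiment.

```python
from collections import defaultdict

def weakly_connected_groups(links):
    """Group pages so that pages in different groups share no link."""
    neighbors = defaultdict(set)
    for src, dst in links:
        neighbors[src].add(dst)
        neighbors[dst].add(src)  # ignore link direction for grouping

    seen = set()
    groups = []
    for start in list(neighbors):
        if start in seen:
            continue
        # Depth-first walk that collects every page reachable from start.
        seen.add(start)
        stack = [start]
        group = set()
        while stack:
            page = stack.pop()
            group.add(page)
            for nxt in neighbors[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups

# Hypothetical toy links: two groups with no link between them.
toy_links = [("A", "B"), ("B", "A"), ("B", "C"), ("X", "Y"), ("Y", "X")]
groups = weakly_connected_groups(toy_links)
```

On the toy links, the pages A, B, and C end up in one group and X and Y in another, which is the kind of structure described above.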

Japanese Wikipedia template bias


Our first PageRank result for the Japanese Wikipedia surprised us, because Sōseki Natume, Ryūnosuke Akutagawa, Yukio Mishima, and Ōgai Mori were all ranked below 100th. The German and English Wikipedia results have some similarity, but there seemed to be no similarity between the Japanese Wikipedia result and the other two. When we looked into the result, we first noticed that the high-ranked authors were all recent authors; specifically, they were all active after 1930. Our first guess was that articles on recent authors are more actively edited and updated by Wikipedia writers. Then we found that all the Akutagawa award winners had high ranks. The Akutagawa award is prestigious, but we did not understand why Akutagawa himself was so far behind these winners. Finally, we found that all the Akutagawa award winners are mutually linked, as shown in Figure 5: each award winner gets incoming links from all the other winners, which raises the winners' PageRank. We consider this an artificial bias, since our assumption is that a Wikipedia writer makes a link when the writer thinks there is a relationship, whereas these award links are generated by the Wikipedia editing template for Japanese authors. We removed these award mutual links; the result is shown in Table 12.

Figure 5: Award winner cross link bias problem.
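The effect of such mutual links can be reproduced on a toy graph. The following is a minimal sketch, assuming a plain power-iteration PageRank with damping factor 0.85; the page names, the link lists, and the `pagerank` function are hypothetical and are not the code or data used in the experiment.

```python
def pagerank(pages, links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank; dangling pages spread rank uniformly."""
    n = len(pages)
    out = {p: [] for p in pages}
    for src, dst in links:
        out[src].append(dst)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not out[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for dst in out[p]:
                new[dst] += damping * rank[p] / len(out[p])
        rank = new
    return rank

# Hypothetical toy data: one classic author and four award winners.
winners = ["w1", "w2", "w3", "w4"]
pages = ["classic"] + winners

# Links that a writer made on purpose (assumed for illustration).
editorial_links = [(w, "classic") for w in winners] + [("classic", "w1")]

# Links generated by an award template: every winner links to every other.
template_links = [(a, b) for a in winners for b in winners if a != b]

rank_with = pagerank(pages, editorial_links + template_links)
rank_without = pagerank(pages, editorial_links)
```

In this toy graph, the template links raise the rank of every winner that has no editorial incoming link and lower the rank of the classic author, which is the direction of the bias described above. The size of the effect on the real data depends on the full link structure.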

For readers who are interested in this Akutagawa-award mutual link effect, Table 13 shows the PageRank result that includes the Akutagawa-award mutual links (Note 1). With this bias, ranks 1 through 101 are all filled with Akutagawa-award winners, and the first non-winner, Mishima Yukio, finally shows up at rank 102.

After post-processing, only the following eight Akutagawa-award winners are in the top 40: 大江健三郎 (Ōe Kenzaburō), 松本清張 (Matumoto Seichō), 吉行淳之介 (Yoshiyuki Jyunnosuke), 開高健 (Kaikō Takeshi), 丸谷才一 (Maruya Saiichi), 古井由吉 (Furui Yoshikichi), 石原慎太郎 (Ishihara Shintarō), 安岡章太郎 (Yasuoka Shōtarō).

Figure 6 shows the adjacency matrices with post-processing (top), without post-processing (middle), and the difference between the two (bottom). The middle figure shows a regular pattern, which comes from the award mutual links. The difference shows the regularity clearly, although it is not completely regular, since several other awards also introduce mutual-link biases (e.g., Mainichi Genjyutu award).

Figure 6: Adjacency matrices of Japanese authors in ja.wikipedia.org. Top: Navbox bias removed. Middle: no post-processing. Bottom: difference (middle - top).
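The three panels can be imitated on a toy example. This is a minimal sketch, assuming rows are link sources and columns are link destinations; the page names, the links, and the helper `adjacency_matrix` are hypothetical.

```python
def adjacency_matrix(pages, links):
    """Build a 0/1 matrix; entry [i][j] is 1 if page i links to page j."""
    index = {p: i for i, p in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for src, dst in links:
        matrix[index[src]][index[dst]] = 1
    return matrix

# Hypothetical toy data: one classic author and three award winners.
pages = ["classic", "w1", "w2", "w3"]
editorial = [("w1", "classic"), ("classic", "w1")]
winners = ["w1", "w2", "w3"]
navbox = [(a, b) for a in winners for b in winners if a != b]

top = adjacency_matrix(pages, editorial)              # Navbox links removed
middle = adjacency_matrix(pages, editorial + navbox)  # no post-processing
# Bottom panel: element-wise difference (middle - top).
bottom = [[m - t for m, t in zip(mrow, trow)]
          for mrow, trow in zip(middle, top)]
```

In this toy case the difference is a filled block over the winners with an empty diagonal, which corresponds to the regular pattern that a single award Navbox would leave in the matrix.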

Table 13: Japanese author rank result with Navbox. We think this Navbox causes a bias.

(Note 1): In Table 13, 赤瀬川原平 (Akasegawa Genpei) won the prize under his pen name 尾辻克彦 (Otuji Katuhiko).

In the next article, I would like to discuss a Category problem.
