## 2012-12-26

### Authors in a Markov matrix Part 2 (1) Experimental results: Which author do people find most inspiring?

This is Part 2 of the article: the experimental results. Up to the previous post, I discussed the question, ``Which author do people find most inspiring?'' From here on, I would like to talk about an answer.

## Analyzing relationships between authors

### Author graph generation method

We apply eigenanalysis to Japanese, English, and German authors to find out which author people find most inspiring in literature, in the sense of author network topology. First, we need to generate an author graph that represents the relationships between authors. Of course, we could generate such a graph by hand, i.e., by researching a lot of documents about authors. However, the number of famous Japanese authors may be more than 1000. This is just our Sunday hobby project, so we don't have enough time to do that.

Fortunately, nowadays we can use crowd knowledge. The natural solution seems to be to use the information in Wikipedia. We can generate an adjacency matrix from Wikipedia's link structure, then apply eigenanalysis to it to analyze the relationships between authors.
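To make the idea concrete, here is a minimal sketch in Python of the two steps: building an adjacency matrix from link data, then finding its dominant eigenvector by power iteration. The author names and links are made up for illustration, and the column normalization into a Markov matrix (with a uniform column for pages without outgoing links) is an assumption of this sketch, not necessarily the exact construction used in this article.

```python
# Hypothetical link data: each author page maps to the author pages it links to.
links = {
    "Author A": ["Author B", "Author C"],
    "Author B": ["Author C"],
    "Author C": ["Author A"],
    "Author D": ["Author C"],
}

def build_adjacency(links):
    """Return (names, adj) where adj[i][j] = 1 if page j links to page i."""
    names = sorted(links)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    adj = [[0.0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            if dst in index:  # ignore links to pages outside the author set
                adj[index[dst]][index[src]] = 1.0
    return names, adj

def to_markov(adj):
    """Scale each column to sum to 1 (assumed: uniform column if no out-links)."""
    n = len(adj)
    markov = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(adj[i][j] for i in range(n))
        for i in range(n):
            markov[i][j] = adj[i][j] / col_sum if col_sum > 0 else 1.0 / n
    return markov

def power_iteration(matrix, iterations=200):
    """Approximate the dominant eigenvector by repeated multiplication."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]  # keep the entries summing to 1
    return v

names, adj = build_adjacency(links)
rank = power_iteration(to_markov(adj))
for name, score in sorted(zip(names, rank), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph, nobody links to Author D, so its score decays to zero, while authors that receive links from well-linked pages score higher. Power iteration is used here only because it is short to write; any eigenvalue solver would serve for matrices of this size.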

#### Assumption of this experiment

We assume that the link structure of author pages in Wikipedia represents the relationships between authors.
This is a debatable assumption. Recall the first question from Part 1 of this article: ``What are the relationships between authors?'' We define the relationships between authors to be those given by the link structure of Wikipedia. The intuition behind this assumption is the following: when a Wikipedia writer made a link between two authors, the writer thought there was some relationship between them. If this assumption cannot be accepted, the following experiment has no meaning. This is why we sometimes say, ``in the sense of Wikipedia link structure, ...'' in this article. So far, we believe this is a good method for finding the relationships between authors, and we don't have a better idea for tackling this problem. When a better method is found, we can revisit this assumption.

Based on this assumption, we will construct an adjacency matrix from the link structure of Wikipedia and analyze it with eigenanalysis.

Advantages:

1. Data size: We can use a relatively large amount of digital data
2. Correctness: Wikipedia pages are public and some review has been done
3. Quality: We can expect some meaning in the link structure, since these pages are made by humans

Disadvantages:

1. Error possibility: There could be errors in the link structure
2. Wikipedia writer bias: Some Wikipedia writers may introduce bias depending on their preferences
3. Wikipedia edit guideline bias: Wikipedia's editing guidelines may cause some kind of bias

The most attractive advantage for us is the availability of a large amount of data. If we tried to construct an adjacency matrix of Japanese authors by hand, we would need to read a huge amount of literature and extract the relationships. Even if we were fortunate enough to find a book describing the author relationships, we would still need to convert the data into a digitally processable form.