2013-01-02

Authors in a Markov matrix Part 2 (8) Experimental results: Which author do people find most inspiring?


Wikipedia's Category problem

The category problem is this: we expect a given category to list certain authors, but the actual Wikipedia category does not contain them, so some data are missing. We found three interesting cases, which are described in the following subsections. We did no additional processing to work around this problem. For example, ``Shakespeare does not exist as an English writer in the Japanese Wikipedia.'' Since we did nothing about this, Shakespeare is absent from the English author rank table for the Japanese Wikipedia in our results.

We tried to obtain the data as automatically as possible, since this is just our Sunday hobby research project, and we did not spend much time fine-tuning for these problems. However, the results are counterintuitive (e.g., Shakespeare is not an English author in the Japanese Wikipedia), so how to automatically fill the gap between Wikipedia's categorization and our intuition is future work.

No Shakespeare in the Japanese Wikipedia result

Shakespeare has the best rank in both the German Wikipedia and the English Wikipedia. However, the Japanese Wikipedia result does not include Shakespeare. In fact, the Japanese Wikipedia has a category called ``Shakespeare,'' and it sits at the same level as the English authors category. The following categories appear at the level of the English authors category in the Japanese Wikipedia (as of 2012-11-19), and the authors they name are not classified as English authors. Figure 7 shows this page.

Figure 7: The category of English authors page in ja.wikipedia.org as of 2012-11-19.

  • English authors (which has an item: The list of English authors)
  • H. G. Wells
  • William Shakespeare
  • George Bernard Shaw
  • Lord Byron
  • William Blake
  • Oscar Wilde
These authors' categories and the category ``English authors'' are at the same level in the category hierarchy; therefore Wells, Shakespeare, Shaw, Byron, Blake, and Wilde do not appear in the list of English authors. This is a property of the Japanese Wikipedia only; the other language Wikipedias do not have this problem. The problem arose because we assumed that the list of English authors contains Shakespeare and these other authors. We thought this assumption was reasonable when we started this research.
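The effect of this layout can be illustrated with a small sketch. This is not the processing we used (we did nothing special for this problem); it is a toy model in which the category names ``parent'', ``Author A'', and ``Author B'' are hypothetical placeholders, and only two of the sibling categories are included to keep it short.

```python
# Toy category graph. "parent", "Author A" and "Author B" are hypothetical
# placeholders; the real Japanese Wikipedia hierarchy is larger.
CATEGORIES = {
    "parent": {
        "articles": [],
        "subcategories": ["English authors", "William Shakespeare", "H. G. Wells"],
    },
    "English authors": {"articles": ["Author A", "Author B"], "subcategories": []},
    "William Shakespeare": {"articles": ["William Shakespeare"], "subcategories": []},
    "H. G. Wells": {"articles": ["H. G. Wells"], "subcategories": []},
}


def direct_articles(categories, name):
    """Articles listed directly in one category (no recursion)."""
    return set(categories[name]["articles"])


def articles_in_subtree(categories, root):
    """Articles in a category and in every category reachable below it."""
    seen = set()    # categories already visited, guards against cycles
    found = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen or name not in categories:
            continue
        seen.add(name)
        found.update(categories[name]["articles"])
        stack.extend(categories[name]["subcategories"])
    return found


# Reading only "English authors" misses the authors filed in sibling categories.
listed = direct_articles(CATEGORIES, "English authors")
widened = articles_in_subtree(CATEGORIES, "parent")
missing = widened - listed
```

Here `missing` contains exactly the authors who live in sibling categories. On real data, widening the search this way would probably pull in unrelated articles as well, since an author's own category may hold more than the author's page, so it is only a starting point for filling the gap.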

No Shiki Masaoka in the Japanese Wikipedia result

Shiki Masaoka does not appear in the Japanese Wikipedia result. Shiki is under the Japanese 歌人 俳人 (Kajin Haijin, i.e., tanka and haiku poets) category and not in the Japanese authors category. Therefore, Japanese Kajin and Haijin are not listed in this research. We found this when we compared our first results across different Wikipedias. This is a good example of how effective the comparison between different Wikipedias is.

The ``not available in other Wikipedia'' problem

Some Wikipedias categorize authors by the language they wrote in rather than by the country they lived in. For instance, the German Wikipedia has ``the list of British authors,'' but the English Wikipedia only has ``The list of English writers.'' The latter lists authors who wrote their books in English; therefore it also includes authors from America, Australia, and other English-speaking countries. As a result, the comparison between different language Wikipedias is not well defined.

There is another factor that makes the comparison difficult. The size of the list of authors depends heavily on the language of the Wikipedia. For instance, the list of German authors in the German Wikipedia has 5975 entries, whereas the list of German authors in the Japanese Wikipedia has only 136 entries.

Tables 7 and 8 compare the PageRank results between the German Wikipedia and the English Wikipedia. Both tables contain n.a. (Not Available in other Wikipedia) entries. These entries show the problem. Table 8 has 16 n.a. entries out of 40. This means that these authors are listed in the German Wikipedia as British writers, but they are not listed in the English Wikipedia as English writers.
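How such n.a. entries arise can be sketched in a few lines. This is a minimal illustration and not our actual script: the function names and the placeholder authors are hypothetical, and it assumes that author names have already been matched across the two language editions.

```python
def mark_not_available(ranked_authors, other_wikipedia_authors):
    """Return (rank, author, status) rows; status is "n.a." when the author
    is missing from the other Wikipedia's list, otherwise an empty string."""
    other = set(other_wikipedia_authors)
    return [
        (rank, author, "" if author in other else "n.a.")
        for rank, author in enumerate(ranked_authors, start=1)
    ]


def count_not_available(rows):
    """Count the rows marked "n.a."."""
    return sum(1 for _, _, status in rows if status == "n.a.")


# Hypothetical placeholder data, ordered by rank in one Wikipedia.
german_ranking = ["Author A", "Author B", "Author C", "Author D"]
# Hypothetical author list taken from the other Wikipedia.
english_list = ["Author A", "Author C", "Author E"]

rows = mark_not_available(german_ranking, english_list)
na_count = count_not_available(rows)
```

With this toy data, two of the four ranked authors are marked n.a. The same counting applied to a real table is what yields a figure such as 16 out of 40.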

We would like to continue with some other interesting issues in the next article.
