Authors in a Markov matrix Part 2 (8) Experimental results: Which author do people find most inspiring?


Wikipedia's Category problem

The category problem is this: we expect a given category to contain certain authors, but the actual Wikipedia category does not contain them. This causes missing data. We found three interesting cases, described in the following subsections. We did no additional processing for this problem. For example, ``Shakespeare does not exist as an English writer in the Japanese Wikipedia.'' Since we did nothing about this, Shakespeare does not appear in the English author rank table for Japanese Wikipedia in our results.

We tried to obtain the data as automatically as possible, since this is just our Sunday hobby research project, and we did not spend much time fine-tuning for these problems. However, the results are not intuitive (e.g., Shakespeare is not an English author in Japanese Wikipedia), so automatically filling the gap between Wikipedia's classification and our intuition is future work.

No Shakespeare in the Japanese Wikipedia result

Shakespeare has the best rank in both German Wikipedia and English Wikipedia. However, the Japanese Wikipedia result does not contain Shakespeare. In fact, Japanese Wikipedia has a category called ``Shakespeare,'' and it is at the same level as the English authors category. In Japanese Wikipedia, the following categories sit at the level of the English authors category (as of 2012-11-19), and they are not classified as English authors. Figure 7 shows this page.

Figure 7: The category of English authors page in ja.wikipedia.org as of 2012-11-19.

  • English authors (which has an item: The list of English authors)
  • H. G. Wells
  • William Shakespeare
  • George Bernard Shaw
  • Lord Byron
  • William Blake
  • Oscar Wilde
These authors and the category ``English authors'' are at the same level in the category hierarchy; therefore, Wells, Shakespeare, Shaw, Byron, Blake, and Wilde do not appear in the list of English authors. This is a property of Japanese Wikipedia only; the other language Wikipedias do not have this problem. The problem was that we assumed the list of English authors would include Shakespeare and these other authors. We thought this assumption was reasonable when we started this research.
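The effect of this layout can be illustrated with a small sketch. It is a toy model only: the dictionary `LEVEL`, the placeholder entries ``Author A'' and ``Author B,'' and both function names are assumptions made for illustration, not data or code from our experiment. The second function shows one possible workaround (treating sibling categories as authors), which we did not apply in this research.

```python
# Toy model of one level of the category tree (hypothetical data; the
# "Author A"/"Author B" entries are placeholders, not real list contents).
LEVEL = {
    "English authors": ["Author A", "Author B"],
    "H. G. Wells": [],
    "William Shakespeare": [],
    "George Bernard Shaw": [],
    "Lord Byron": [],
    "William Blake": [],
    "Oscar Wilde": [],
}


def authors_from_list(level, list_category):
    """Collect authors only from the list under `list_category`."""
    return set(level[list_category])


def authors_with_siblings(level, list_category):
    """Hypothetical workaround: also count each sibling category name as an author."""
    siblings = {name for name in level if name != list_category}
    return authors_from_list(level, list_category) | siblings


listed = authors_from_list(LEVEL, "English authors")
widened = authors_with_siblings(LEVEL, "English authors")

print("William Shakespeare" in listed)   # False
print("William Shakespeare" in widened)  # True
print(sorted(widened - listed))          # the six sibling categories
```

Reading only the list reproduces the gap described above, while the widened version recovers the six names at the cost of assuming that every sibling category is an author, which may not hold in general.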

No Shiki Masaoka in the Japanese Wikipedia result

Shiki Masaoka does not exist in the Japanese Wikipedia result. Shiki is under the Japanese 歌人 俳人 (Kajin Haijin, i.e., tanka and haiku poets) category, not the Japanese authors category. Therefore, Japanese Kajin Haijin are not listed in this research. We found this when we compared our first results across different Wikipedias. This is a good example of how effective the comparison between different language Wikipedias is.

Not available in other Wikipedia problem

Some Wikipedias categorize authors by the language they wrote in rather than the country they lived in. For instance, German Wikipedia has ``the list of British authors,'' but English Wikipedia only has ``The list of English writers.'' This list contains authors who wrote their books in English; therefore, it also includes American, Australian, and other English-speaking countries' authors. As a result, the comparison between different language Wikipedias is not well defined.

There is another factor that makes the comparison difficult. The size of the list of authors depends heavily on the language of the Wikipedia. For instance, the list of German authors in German Wikipedia has 5975 entries. On the other hand, the list of German authors in Japanese Wikipedia has only 136 entries, a difference of more than a factor of 40.

Tables 7 and 8 compare the PageRank results between German Wikipedia and English Wikipedia. Both tables contain n.a. (Not Available in other Wikipedia) entries. These entries show the problem. Table 8 has 16 n.a. entries out of 40 (40%). This means these authors are listed in German Wikipedia as British writers, but they are not listed in English Wikipedia as English writers.
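How n.a. entries arise in such a comparison can be sketched as follows. This is a minimal illustration, assuming each rank table is a mapping from author name to rank; the function names and the placeholder authors are hypothetical and are not taken from Tables 7 and 8.

```python
def compare_ranks(ranks_here, ranks_other):
    """Pair each author's rank in one Wikipedia with the rank in the other.

    Authors missing from `ranks_other` get the marker "n.a.".
    Rows are ordered by the rank in `ranks_here`.
    """
    rows = []
    for author, rank in sorted(ranks_here.items(), key=lambda item: item[1]):
        rows.append((author, rank, ranks_other.get(author, "n.a.")))
    return rows


def count_na(rows):
    """Count the rows whose author is not available in the other Wikipedia."""
    return sum(1 for _, _, other in rows if other == "n.a.")


# Placeholder rank tables (hypothetical; not our experimental results).
german_wiki = {"Author A": 1, "Author B": 2, "Author C": 3, "Author D": 4}
english_wiki = {"Author A": 2, "Author C": 1}

rows = compare_ranks(german_wiki, english_wiki)
for row in rows:
    print(row)
print(count_na(rows), "n.a. out of", len(rows))
```

With a lookup like this, an author who is categorized under a different list in the other Wikipedia shows up as n.a. even though the article itself may exist there.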

We would like to continue with some other interesting issues in the next article.
