
Authors in a Markov matrix: Which author do people find most inspiring? (24)


Google PageRank

PageRank [2] is a method for ranking web pages. Sergey Brin and Lawrence Page modeled the link structure of the web as an adjacency relationship and analyzed the importance of web pages with eigenanalysis, proposing the resulting eigenvector as a ranking criterion for web search. Soon after, they founded a company called "Google" that used this method in its search engine. As their paper notes, the basic idea had already been proposed by Jon Kleinberg, who observed that the web's link structure can be analyzed by eigenanalysis. Eigenanalysis itself has been studied since the 18th century and is fundamental in linear algebra. So is PageRank nothing new, and were these two simply lucky? I don't think so. Perhaps many people at the time thought of applying eigenanalysis to web search, but these two actually implemented this simple, solid idea. Moreover, the web is huge, and computing the eigenvector of the web matrix is far from trivial. There is a great difference between merely thinking about a method and actually implementing it and proving that it works.

In this article, my notation follows Strang's book [9]; note that the PageRank paper [2] uses a different notation.
On page 4, second paragraph, the paper writes the eigenvalue relation as \(R = cAR\), so its \(c\) plays the role of an eigenvalue. I write \(Mx = \lambda x\) instead; therefore, \(\lambda = \frac{1}{c}\).
In practice, the web graph is far from ideal and quite incomplete, which makes the eigenanalysis hard to compute. For instance, many links point to pages that do not exist, and there are loops among the links. The paper describes these problems and explains how to solve them. The model is based on the following user behavior: a user clicks links on a web page at random, but sometimes also jumps to a totally unrelated page. In outline, the PageRank method scans the web, generates the adjacency matrix, handles dangling links and link loops, adds a random-jump term to the adjacency matrix to obtain a Markov matrix, and computes its eigenvector. Computing the eigenvector is easy to say, but since the web is huge, the paper also proposes an algorithm to compute it efficiently.
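The outline above can be sketched in a few lines of NumPy. This is only a toy illustration, not the paper's large-scale algorithm: the four-page link graph, the convergence tolerance, and the iteration count are my own illustrative choices (the damping factor 0.85 is the value commonly associated with the paper).

```python
import numpy as np

# Hypothetical link graph of 4 pages: L[i, j] = 1 means page j links to page i.
L = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
], dtype=float)
# Column 3 is all zeros: page 3 is a dangling node (it links to nothing).

n = L.shape[0]
col_sums = L.sum(axis=0)

# Normalize each column so it sums to 1; treat a dangling page
# as if it linked to every page uniformly.
M = np.where(col_sums > 0,
             L / np.where(col_sums == 0, 1.0, col_sums),
             1.0 / n)

# Add the random-jump term with damping factor d.
d = 0.85
G = d * M + (1 - d) / n   # G is a Markov matrix: every column sums to 1

# Power iteration: repeatedly apply G until the rank vector converges.
x = np.full(n, 1.0 / n)
for _ in range(100):
    x_next = G @ x
    if np.abs(x_next - x).sum() < 1e-10:
        break
    x = x_next

print(x / x.sum())   # the PageRank vector: the eigenvector for eigenvalue 1
```

Because G is a Markov matrix, its largest eigenvalue is 1 and the iteration converges to the corresponding eigenvector, which is exactly the steady-state ranking described above.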

When I first read this paper a few years ago, I was fascinated. It is elegant, simple, solid, beautiful linear algebra. Moreover, the theory is not only beautiful but also practically useful. I recommend reading the paper; I think you will enjoy it. That impression must have stayed somewhere in my mind, which is why, when my friend asked me how to analyze the relationships among writers, I immediately recalled this paper. I hear that today's search engine algorithms are more complex and sophisticated, adapted to spam and other factors, but I believe this idea still lies at their base.

Conclusion of Part 1

In this article, we showed the following:

  • We can formulate the relationships among authors as a graph.
  • We can use mathematical tools, namely graph theory and linear algebra, to analyze the graph.
  • One powerful tool of linear algebra is eigenanalysis. We can apply it to analyze the author relationships. This is the same method Google uses for web search: it finds the most influential pages in order to show the best search results.
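To make the author version of the idea concrete, here is a toy example with a hypothetical three-author "inspiration" Markov matrix (the authors and numbers are invented for illustration). The eigenvector for eigenvalue 1 gives the ranking:

```python
import numpy as np

# Hypothetical inspiration graph among three authors (invented data):
# column j describes how author j's influence flows to the others.
# Every column sums to 1, so M is a Markov matrix.
M = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])

# The eigenvector for eigenvalue 1 is the steady-state importance.
vals, vecs = np.linalg.eig(M)
k = np.argmin(np.abs(vals - 1.0))        # pick the eigenvalue closest to 1
rank = np.real(vecs[:, k])
rank = rank / rank.sum()                 # normalize so the ranks sum to 1

print(rank)   # → approximately [0.444, 0.222, 0.333]: author 0 ranks highest
```

For a graph this small we can call `np.linalg.eig` directly; for a web-sized matrix, that is exactly what power iteration replaces.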

However, this is not the complete story. Part 2 of this article will answer the opening question: "Which author do people find most inspiring?"

See you in Part 2.
