2012-12-26

Authors in a Markov matrix: Which author do people find most inspiring? (24)


Google PageRank

PageRank [2] is a method to compute a ranking of web pages. Sergey Brin and Lawrence Page modeled the adjacency relationship of web pages via link and analyzed the importance of web pages using eigenanalysis. They proposed to use eigenvector as a criterion of the web search. Soon, they established a company called ``Google'' that used this method for their search engine. In their paper, the basic idea had already been proposed by Jon Kleinberg, who considered the web environment can be analyzed by eigenanalysis. Eigenanalysis itself has been studied from 18 century and it is fundamental in linear algebra. Then, the PageRank is not new and these two are just lucky? I don't think so. Maybe many people thought eigenanalysis to use web search at that time, but, these two really implemented this simple and solid idea. Also the size of web is huge and it is not trivial to compute the eigenvector of the web matrix. I think there is a huge difference between just think about and really implemented the method and proved it works.

In this article, my notation is based on Strang's book [9]. However, the PageRank paper [2]'s notation is different.
In PageRank paper [2], page 4, 2nd paragraph explains eigenvalue \(c\) is \(R=cAR\). I use \(\lambda\) of \(Mx = \lambda x\). Therefore, \(\lambda = \frac{1}{c}\).
In the paper, web graph is not ideal or quite incomplete, therefore, it is hard to compute the eigenanalysis. For instance, many link doesn't have the page, or there are loops of the links. They describe these problems and explain how to solve them. Their model is based on the following user: a user clicks links of a web page randomly, but the user also jumps to a totally irrelevant web page sometimes. In principle, PageRank method scans the web, generates the adjacency matrix, processes dangling links and link loops, adds random jumping term to the adjacency matrix, generates Markov matrix, compute the eigenvector. It is easy to say compute the eigenvector, but since the size of web is huge, they proposed the algorithm to efficiently compute it.

When I read this paper first time a few years ago, I was fascinated. It is elegant, simple, solid beauty linear algebra. Moreover, the theory is not only beautiful, but also practically useful. I recommend to read the paper and I think you will enjoy that. I think that impression kept somewhere in my mind, that is the reason when my friend asked me how to analyze the writer's relationship, I was immediately recall this paper. I heard nowadays search engine algorithm is more complex and sophisticated to adapt the spam and other factors, though, I believe this idea is still in the base.

Conclusion of Part1

In this article, we show the following:

  • We can formulate the authors relationship as a graph.
  • We can use mathematical tools, graph theory and linear algebra, to analyze the graph.
  • One of the powerful tool of linear algebra is called eigenanalysis. We can apply this tool to analyze the author relationship. This is the same method that Google does the web search. Google uses it to find the most influential page to show the best search result.

However, this is not the complete story. The Part 2 of this article will answer the first question, ``Which author inspire the people most?''

See you in the Part 2.

No comments: