Experiment

The Wikipedia data we used are listed in Table 1. Fortunately, every Wikipedia edition has a list of authors for each language, and we used these lists as root pages. We downloaded the author pages linked from each root page. To avoid overloading the server, we limited the download rate to one page every 15 seconds.
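The download procedure described above can be sketched as follows. This is a minimal illustration, not the tool used in the experiment: the names `author_links`, `fetch_url`, and `download_authors`, the User-Agent string, and the rule that an author link starts with `/wiki/` and contains no namespace colon are all assumptions made for this sketch.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect candidate article links from the HTML of a root page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: article links start with /wiki/, and links containing
        # ':' point to namespaced pages (Help:, Category:, ...) to be skipped.
        if href and href.startswith("/wiki/") and ":" not in href:
            if href not in self.links:
                self.links.append(href)


def author_links(root_html):
    """Return the unique candidate links in document order."""
    parser = LinkCollector()
    parser.feed(root_html)
    return parser.links


def fetch_url(url):
    """Download one page as raw bytes (hypothetical User-Agent string)."""
    request = Request(url, headers={"User-Agent": "author-list-sketch/0.1"})
    with urlopen(request) as response:
        return response.read()


def download_authors(base_url, root_path, delay_sec=15.0,
                     fetch_fn=fetch_url, sleep_fn=time.sleep):
    """Fetch the root page, then each linked page, pausing before each one."""
    root_bytes = fetch_fn(urljoin(base_url, root_path))
    root_html = root_bytes.decode("utf-8", errors="replace")
    pages = {}
    for link in author_links(root_html):
        sleep_fn(delay_sec)  # throttle: one page per delay_sec seconds
        pages[link] = fetch_fn(urljoin(base_url, link))
    return pages
```

Passing `fetch_fn` and `sleep_fn` as parameters keeps the throttling logic testable without network access or real waiting.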
The page ``石原慎太郎'' in the Japanese Wikipedia was the only page that was compressed; we expanded it when running our analysis tool. In some cases more than one root page was available. For instance, for the English author list in the German Wikipedia we used Liste_britischer_Schriftsteller instead of Liste_englischsprachiger_Schriftsteller. The root pages we chose are listed in Table 1. There is no particular reason for preferring these lists; they are simply our choice for this experiment. All files were downloaded on 2012-05-30.
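Expanding a compressed page before analysis, as mentioned above, might look like the sketch below. The text does not state the compression format, so gzip is an assumption here, and `expand_if_compressed` is a hypothetical helper name.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def expand_if_compressed(data: bytes) -> bytes:
    """Return decompressed bytes if data looks like gzip, else data unchanged.

    Assumption: the compressed page was gzip-encoded; the experiment's
    actual format is not specified.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

Applying this check to every downloaded file lets the analysis treat compressed and uncompressed pages uniformly.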
Table 1: Experimental data set