2010-06-16

World Cup 2010 Monte-Carlo Simulator (1)

A friend suggested that we predict the World Cup points (the goals each team scores in each match). I have no idea how the matches will go, so I decided to write a Monte-Carlo simulator to make the predictions for me. As a Sunday researcher, I am interested in Monte-Carlo simulation and often discuss the method with my colleagues. However, I have never implemented a simulator that samples from a specific probability distribution. This is a good opportunity to implement one.

My company produces programs that use a variant of Monte-Carlo simulation. The method works well in some specific areas, such as physical simulation, but can I use it for World Cup prediction? I doubt it. I also don't want to spend more than an hour implementing it. If you are a World Cup fan, I presume that simulating the matches makes no sense and is no fun. This method also cannot predict each individual result. It is like Hari Seldon's psychohistory in Asimov's Foundation: based on past World Cups, we could predict the average or the distribution of the points in World Cup 2010, but we can hardly predict each result. Despite this limitation, I know nothing at all about the teams, so I think the simulator is still better than my own guess. I have heard that this kind of method is also used to predict the stock market.

In mathematics, we can solve a problem only within the range of its assumptions, so the assumptions are important. The question is how well my assumptions hold. I set up the World Cup prediction problem with the following assumptions.

  • Assumption 1. In each match, each team's points are independent of the other team's.
  • Assumption 2. The point distribution of the 2010 World Cup is the same as that of the 2006 World Cup.

First, Assumption 1 is an outrageous assumption. It means that the predicted points do not depend on the opposing team. For example, when Japan plays against any team, the predicted points come from the same distribution. Unreasonable. However, to be honest, I don't know anything about the World Cup (yesterday I happened to learn that Japan plays in the World Cup this year). So I imagine that maybe most of the teams have similar skills. This is Assumption 1. If this instinct is wrong, the results will tell me. If the assumption is wrong, any mathematics gives us garbage.

Second, I made Assumption 2 because my friend's web page has only the results of the last World Cup. It might be better to use data from as many past World Cups as possible, provided the rules have not changed. Well, I am just lazy. This assumption might also be wrong.

I implemented a simulator based on these assumptions (wc2010.rb). The program uses Ruby's pseudo-random number generator to generate points that follow a distribution similar to the 2006 point distribution. One problem is how to initialize the pseudo-random number generator. This is just a matter of luck. I need one number, called the seed. I could use my birthday, or the current time in seconds from this January 1st, ... I just picked my friend's suggestion, 42.
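The idea can be sketched in a few lines of Ruby. This is not the original wc2010.rb, only a minimal sketch under the two assumptions above. The names `COUNTS`, `sample_points`, and `simulate_match` are made up for illustration, and the counts are placeholder numbers, not the real 2006 data.

```ruby
# Placeholder counts (NOT the real 2006 data): points => how often they occurred.
# Note that 5 is missing on purpose, so it can never be drawn.
COUNTS = { 0 => 40, 1 => 35, 2 => 25, 3 => 10, 4 => 5, 6 => 1 }.freeze

# Draw one value from the empirical distribution described by `counts`.
# Each value is chosen with probability count / total.
def sample_points(counts, rng)
  target = rng.rand(counts.values.sum) # integer in 0...total
  counts.each do |points, count|
    return points if target < count
    target -= count
  end
end

# Assumption 1: the two teams' points are two independent draws.
def simulate_match(counts, rng)
  [sample_points(counts, rng), sample_points(counts, rng)]
end

rng = Random.new(42) # fixed seed, so every run gives the same predictions
predictions = Array.new(64) { simulate_match(COUNTS, rng) }

predictions.first(3).each_with_index do |(a, b), i|
  puts format("[%d]: %d %d", i + 1, a, b)
end
```

With a fixed seed the 64 predictions are reproducible. A different seed gives a different but equally plausible set, which is why picking the seed is just luck.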

Last world cup points distribution is as follows.

WC2006 result distribution

Points
  0    :************************************************
  1    :************************************
  2    :****************************
  3    :***********
  4    :****
  5    :
  6    :*

My simulator's distribution

Points
  0    :******************************************
  1    :******************************************
  2    :*******************************
  3    :*******
  4    :*****
  5    :
  6    :*

They are kind of similar. In World Cup 2006, no team scored 5 points in a match, so the simulator never predicts 5 points either. If a team scores 5 points this time, the simulator has no chance of predicting it. It also never predicts more than 6 points.
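A star chart like the two above can be produced by tallying the points and printing one star per occurrence. The following is only a sketch. The helper name `histogram_lines` and the sample values are made up for illustration and are not taken from wc2010.rb.

```ruby
# Build one text line per point value from 0 to max_points,
# with one star for each time that value occurs in `values`.
def histogram_lines(values, max_points: 6)
  tally = Hash.new(0)
  values.each { |v| tally[v] += 1 }
  (0..max_points).map { |pts| format("  %d    :%s", pts, "*" * tally[pts]) }
end

sample = [0, 0, 1, 2, 0, 1, 6] # made-up points, for illustration only
puts "Points"
puts histogram_lines(sample)
```

A value that never occurs in the input, such as 5 here, gets an empty row, just as in the charts above.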

The following are the simulator's predictions. So far, only one result has been predicted exactly. I think Assumption 1 is not so good.

 Estimate       Result
 [1]: 2 1       1 1
 [2]: 2 0       0 0
 [3]: 2 1       2 0
 [4]: 1 0       1 0     -> match ARG:NGA
 [5]: 2 3       1 1
 [6]: 1 2       0 1
 [7]: 1 1       0 1
 [8]: 2 3       4 0
 [9]: 2 2       2 0
[10]: 0 0       1 0
[11]: 0 1       1 1
[12]: 0 2       1 1
[13]: 2 0       0 0
[14]: 0 0
[15]: 1 1
[16]: 0 0
[17]: 1 1
[18]: 0 4
[19]: 2 2
[20]: 1 2
[21]: 1 4
[22]: 0 6
[23]: 2 1
[24]: 1 0
[25]: 1 1
[26]: 0 1
[27]: 1 2
[28]: 1 3
[29]: 1 3
[30]: 0 2
[31]: 1 0
[32]: 0 1
[33]: 0 0
[34]: 0 2
[35]: 1 0
[36]: 3 3
[37]: 0 2
[38]: 1 0
[39]: 1 2
[40]: 2 1
[41]: 1 0
[42]: 4 0
[43]: 0 0
[44]: 1 1
[45]: 0 1
[46]: 2 0
[47]: 0 4
[48]: 0 1
[49]: 2 0
[50]: 1 2
[51]: 1 0
[52]: 2 0
[53]: 0 1
[54]: 1 2
[55]: 0 2
[56]: 0 0
[57]: 1 1
[58]: 2 4
[59]: 3 1
[60]: 0 2
[61]: 1 2
[62]: 1 2
[63]: 1 0
[64]: 0 2

3 comments:

Rebecca said...

You are crazy! I don't understand a word of what you say, but please let me know, if math and reality match.
Do you actually watch soccer?
After Japan has lost today, I hope you will support the German team on Saturday?!
Best, Rebecca

Shitohichi said...

Yes, it is kind of a crazy idea. Math is a method of finding similarities and patterns. If something happened in the past and it could happen again, we can predict it somehow. But this method has a huge limitation, and I don't want to bet on these numbers. In practice, this is rolling a die to decide the points, but it is a little bit better than that, since this die respects the last WC result. For example, this die has the faces 0, 1, 2, 3, 4, 6 (no 5), and 0 shows up with 40.1 percent probability based on the last WC. This time there was a 7-0 match, which could not be predicted, since no team had 7 points in the last WC. So, it's just for fun. Currently I have 51 points (2 points for the correct winner, 3 points for the correct winner + correct difference, 4 points for an exact prediction).
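The scoring rule in the comment above can be written as a small Ruby function. This is a sketch, not the actual rules of the prediction game. The name `score_prediction` is made up, and treating a correctly predicted draw with a different score as "correct winner + correct difference" (3 points) is my assumption.

```ruby
# Score one prediction against the actual result.
# Both arguments are [points_of_team_a, points_of_team_b].
def score_prediction(predicted, actual)
  return 4 if predicted == actual # exact prediction

  # <=> gives 1, 0 or -1: team A wins, draw, or team B wins.
  same_outcome = (predicted[0] <=> predicted[1]) == (actual[0] <=> actual[1])
  return 0 unless same_outcome

  # Assumption: a predicted draw that ends in a different draw lands here
  # with difference 0 on both sides, so it scores 3.
  same_difference = (predicted[0] - predicted[1]) == (actual[0] - actual[1])
  same_difference ? 3 : 2
end

puts score_prediction([1, 0], [1, 0]) # match [4]: exact
puts score_prediction([1, 2], [0, 1]) # match [6]: winner and difference
puts score_prediction([2, 1], [2, 0]) # match [3]: winner only
puts score_prediction([2, 1], [1, 1]) # match [1]: wrong outcome
```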