2010-07-24

A personal annotations of Veach's thesis (10) P.60

Star discrepancy D_N^*



I might say, the goal of Quasi-Monte Carlo method seems to have a sort of contradictory. It is:
  1. To minimize the irregularity of distribution
  2. Not using regular uniform samples to avoid aliasing 
To minimize the irregularity of distribution, a straightforward answer is using regular distributed sampling. But, this causes aliasing that is the point to use the Monte-Carlo method (Note: not Quasi-Monte-Carlo).

A discrepancy is a quantitative measure of irregularity of distribution. This measure tells us how much integration error occurs.

The star discrepancy D_N^* is a discrepancy. When we sample many boxes that always includes the origin, it shows the maximum error of estimation of the area of these boxes.

In the case of one dimension, boxes are lines. Therefore we estimate the length of line that includes the origin by sampling. If we sample N points, if we sample regular uniformly, we could not sample less than 1/(2N) length correctly. This is the same as the sampling theory. But, if we go to the high dimensional area (s-dimensional area), the error bound becomes O((log(N)^s)/N).

I don't understand why it is O((log(N)^s)/N). It is s-dimensional generalization, but, it even doesn't work one dimensional case. Some told me there is a lengthy complicated proof. Figure 1 shows the plot. When number of samples are small and dimension is high, then the error becomes larger. This is the growth ratio of log(N)^s and N, someone says it is obvious, but I would like to see that anyway.

This is a program to show this plot. This is matlab/octave language.

%
% visualize star discrepancy
% Octave code: 2010 (C) Shitohichi Umaya
%
x = [10000:100:1000000];
s = 2
y1 = power(log(x), s)./x;
s = 3
y2 = power(log(x), s)./x;
plot(x,y1,"1;(log N)^s/N, s=2;", x,y2,"3;(log N)^s/N, s=3;")

Acknowledgements:
Thanks to Alexander K. to answer me what is the discrepancy.

No comments: