Skip to main content

Floating point reproducibility in parallel processing

The floating point reproducibility problem

On a typical computer, the results of floating point operations can differ from the mathematical result. In general, floating point operations like addition may not satisfy all mathematical properties for that operation. For example, the associativity property of addition may not hold for all numerical values:

$(x + y) + z \stackrel{?}{=} x + (y + z)$

A decimal representation of a number cannot always be represented by a finite binary representation in computer hardware. For example, the computed result of $0.1 + 0.01 - 0.1 - 0.01$ is not equal to $0$, but mathematically these values should be equivalent.

For example, you can demonstrate this using the Python interpreter (Python 3.8.5):

>>> 0.1 + 0.01 - 0.1 - 0.01
-5.204170427930421e-18


But mathematical properties do hold in other cases. For example, the result of $0.1 - 0.1 + 0.01 - 0.01$ is equal to $0$.

>>> 0.1 - 0.1 + 0.01 - 0.01
0.0

Another problem results from rounding errors during computation. When one operand is very small compared to the other operand, the result of adding these values can not be exact. This is an order dependent rounding problem. When we add very small numbers to a huge number, all the small numbers can be ignored by a rounding error. The computed result is less than the mathematical value, called underflow. But when we add all the small numbers first and then add that sum to the huge number, the rounding error may differ.

There are several reasons why this problem may occur though the main reason is that the floating point representation is finite in hardware.

Many algorithms depend on the equality of computation results.  Thus, floating point reproducibility in those algorithms is the key to their correctness. A solution can be implemented by using a range of values instead of a single value. Because the computational error is usually very small, we can know if the exact value is within a given range. This is called an interval algorithm. In this article, however, I focus on a method that does not assume that associativity of operations holds during computation. This method is particularly useful in parallel execution of those calculations.

Example problems

We pick the following geometric computation examples:

  •  Ray-object intersection calculation by space subdivision
  •  Plane-triangle intersection calculation on a triangle mesh

Ray-object intersection calculation by space subdivision

Figure 1 shows a space subdivided by two cubes (two sub-spaces) and a polygon (a triangle) located at the boundary. The intersection point between a ray and the polygon is exactly on the boundary. We assume that the intersection point calculation is performed by two processing units, each assigned to the sub-space s1 and sub-space s2. (Processing unit 1 is assigned to s1 and unit 2 to s2.) Each unit computes the intersection in parallel and independently. The following equation represents the ray, where $\boldsymbol{P}_0$ is the view point, $\boldsymbol{d}$ is the direction vector, and $t$ is the ray parameter.

\[\boldsymbol{r}(t) &=& \boldsymbol{P}_0 + t \boldsymbol{d}\]


Figure 1: Space subdivision ray-object intersection

If we use the 3D-Digital Differential Analyzer (3DDDA) algorithm to traverse the spaces, the value of $t$ used in s1 and the value of $t$ used in s2 differ because each $t$ is computed when the ray enters the sub-space. To extract the parallelism, we subdivide the space, then compute the ray-polygon intersection calculation in parallel. However, the intersection point should be the same. To guarantee the numerical reproducibility for this intersection, we must ensure all the input values are exactly the same in each sub-space.

Using exactly the same inputs means that all the bits of the input number's representation are the same.  If an input polygon is subdivided and distributed to the sub-spaces, the order of the edges should be maintained during computation. In most cases, we copy the input. However, it is important to note that the edge direction may differ at a result of the copy operation.

Here the problem is quite visible when we deal with transparent polygons. If ray parameter $t$ differs depending on the sub-space, the hit point may be slightly off. When calculations in both sub-spaces considers a hit in the sub-space, the shading value is computed twice, causing an artifact. Conversely, when both calculations in sub-spaces consider a non-hit in the sub-space, a hole is visible in the polygon.

In my own work, I replaced the 3DDDA algorithm to avoid this numerical reproducibility problem. Rays are always computed from the view point and use the same inputs for each sub-space. It is hard to avoid the numerical reproducibility problem in 3DDDA; it is an incremental algorithm and addition is the basis for its calculations.

Plane-triangle intersection calculation on a triangle mesh

The strategy for solving this problem is essentially the same as that used in the ray-object case. Each independent computation receives exactly the same inputs. For instance, let's assume our goal is to find an intersection line between a triangle mesh and a plane. Figure 2 shows a plane $P$ and a triangle mesh composed of two triangles $v_0 v_1 v_2$ and $v_0 v_3 v_1$.  The intersection line is shown as $i_0 i_1$ and $i_1 i_2$.  Here we assume that the plane and triangles are not co-planer and that the intersection lines are clearly defined. (Handling the co-planar case is important, but I will not mention it further since it is less relevant to the issue of numerical reproducibility.)

Figure 2: Plane-triangles intersection

  

Figure 3: Edge orientation

Let's find the position of intersection point $i_1$. In this case, there are two representations of the edge $\boldsymbol{e}$: $\boldsymbol{e}_{v_0 \rightarrow v_2}$ and $\boldsymbol{e}_{v_2 \rightarrow v_0}$.

\[\boldsymbol{e}_{v_0 \rightarrow v_2} &=& v_0 + t (v_2 - v_0) (1)\\

 \boldsymbol{e}_{v_2 \rightarrow v_0} &=& v_2 + t (v_0 - v_2) (2)\]

Figure 3 shows the edges of the adjacent triangles: (a) the opposite direction, (b) the same direction. Strictly speaking, there are four combinations of directions for the two edges. But for numerical reproducibility, it is only necessary to determine if the adjacent edge directions are the same or not, so it is sufficient to consider the cases (a) and (b) only.

If the associativity of computation holds, both Eq. (1) and Eq.(2) produce the same result for the $i_1$ position calculation. However, the floating point computation results are slightly different because of these different edge representations. This causes an issue if an algorithm assumes the $i_1$ position calculation results in exactly the same binary representation of the number.

To solve this problem, we can define a unique order of vertices $v_0$ and $v_1$ for all edges. The following is an example comparison functor to define such order in C++ (for example, for the std::map container).


// Simple example of a 3D point representation

struct Point3D {
    float x;
    float y;
    float z;
};

// Point3D comparison functor for the total ordering
struct comparePointOrder
{
    bool operator()(const Point3D& v0, const Point3D& v1) const
    {
        return (v0.x < v1.x)
            || ((v0.x == v1.x) && (v0.y < v1.y))
            || ((v0.y == v1.y) && (v0.z < v1.z));
    }
};

This comparison function is based on the concept of number comparison. For example, consider a comparison of two two-digit numbers, say 23 and 24. We first compare the tens place, if one of the numbers has a smaller digit, that number is smaller. If both numbers have the same digit of tens place, we move to one's place and compare the digit of the one's place. For any pair of numbers, we can repeat the comparison until we determine that one of the numbers is smaller, or that the two values are equal when all the digits are the same. In the preceding code, note that equality checks are essential. Without equality tests in the code, the comparison results differ. (I once had introduced just such a bug.)

When computing an intersection position on an edge, the usual input is a triangle and a z value. But it is important to ensure that the direction of the edge is the same for each intersection computation. For example, when computing the position of $i_1$ and the input triangle is $v_0 v_1 v_2$, then the edge direction is $v_0 \rightarrow v_1$. The same edge direction should be kept as $v_0 \rightarrow v_1$ for the triangle $v_0 v_3 v_1$. To compute the intersection points, the correctly oriented edge should be generated dynamically per edge.

A typical data structure implementation of a triangle mesh maintains the consistency of the normal vectors of the triangles. All triangles have the same vertex order, clockwise or counter-clockwise. Thus, Figure 2 (a) shows typical edge directions for two triangles. However, to maintain numerical reproducibility, I suggest generating the edges as shown in Figure 2 (b) on the fly. In this way, you can achieve the numerical reproducibility for computing the position of $i_1$.

This technique enables a lookup of the $i_1$ point from both adjacent triangles using \texttt{std::map} with the suggested comparison functor. Correct results are produced not only for serial computation but for parallel processing as well.

Conclusions

I have described methods to maintain numerical reproducibility.  These methods also work in a parallel processing context. 3DDDA is a simple and powerful algorithm, but it is weak on numerical reproducibility. Mathematically, line segments $AB$ and $BA$ are the same. However, as described in this article, this is not necessarily true for computational calculations. If an algorithm is sensitive to numerical reproducibility, I suggest distinguishing line segments $AB$ and $BA$. To avoid such numerical reproducibility issues, one technique is to keep a total order of input segment vertices.

Acknowledgment

I am thankful for the help provided by Andy Kopra for valuable suggestions and comments.

References

For numerical reproducibility, see [1,2], for example. For 3DDDA, see [3] or Wikipedia [4].

  1. Nathalie Revol and Philippe Th{\'{e}}veny, "Numerical Reproducibility and Parallel Computations: Issues for Interval Algorithms", http://arxiv.org/abs/1312.3300, 2013
  2. James Demmel and Hong Diep Nguyen, "Fast Reproducible Floating-Point Summation", In 21st IEEE Symposium on Computer Arithmetic, 2013
  3. Fujimoto Akira and Takayuki Tanaka and Kansei Iwata, "ARTS: Accelerated Ray-Tracing System", IEEE CG & Applications, 148--157, 1986
  4. Wikipedia, The Free Encyclopedia, "Digital differential analyzer: graphics algorithm", https://en.wikipedia.org/wiki/Digital_differential_analyzer_(graphics_algorithm), [Online; accessed 26-April-2021]

Comments

Popular posts from this blog

Why A^{T}A is invertible? (2) Linear Algebra

Why A^{T}A has the inverse Let me explain why A^{T}A has the inverse, if the columns of A are independent. First, if a matrix is n by n, and all the columns are independent, then this is a square full rank matrix. Therefore, there is the inverse. So, the problem is when A is a m by n, rectangle matrix.  Strang's explanation is based on null space. Null space and column space are the fundamental of the linear algebra. This explanation is simple and clear. However, when I was a University student, I did not recall the explanation of the null space in my linear algebra class. Maybe I was careless. I regret that... Explanation based on null space This explanation is based on Strang's book. Column space and null space are the main characters. Let's start with this explanation. Assume  x  where x is in the null space of A .  The matrices ( A^{T} A ) and A share the null space as the following: This means, if x is in the null space of A , x is also in the n...

Gauss's quote for positive, negative, and imaginary number

Recently I watched the following great videos about imaginary numbers by Welch Labs. https://youtu.be/T647CGsuOVU?list=PLiaHhY2iBX9g6KIvZ_703G3KJXapKkNaF I like this article about naming of math by Kalid Azad. https://betterexplained.com/articles/learning-tip-idea-name/ Both articles mentioned about Gauss, who suggested to use other names of positive, negative, and imaginary numbers. Gauss wrote these names are wrong and that is one of the reason people didn't get why negative times negative is positive, or, pure positive imaginary times pure positive imaginary is negative real number. I made a few videos about explaining why -1 * -1 = +1, too. Explanation: why -1 * -1 = +1 by pattern https://youtu.be/uD7JRdAzKP8 Explanation: why -1 * -1 = +1 by climbing a mountain https://youtu.be/uD7JRdAzKP8 But actually Gauss's insight is much powerful. The original is in the Gauß, Werke, Bd. 2, S. 178 . Hätte man +1, -1, √-1) nicht positiv, negative, imaginäre (oder gar um...

No virtual machine on Oracle virtual box and Avira

December 2015, I suddenly cannot run Oracle VM Virtual Box (5.0.10) on Windows 7, my desktop machine. It failed to create a virtual machine, the error message is the following. VirtualBox - Error In supR3HardNtChildWaitFor --------------------------- Timed out after 60001 ms waiting for child request #1 (CloseEvents). (rc=258) where: supR3HardNtChildWaitFor what: 5 Unknown Status 258 (0x102) (258) - Unknown Status 258 (0x102) I relatively less use the virtual machine on this desktop machine. But when I would like to use Linux, then I need to reboot the machine. This is inconvenient. I have another windows 7 notebook, but I don't have this problem. Today I found the solution. https://avira.ideascale.com/a/dtd/Avira-sollte-das-Ausf%C3%BChren-von-VMs-in-Virtualbox-nicht-blocken/160234-26744#idea-tab-comments The combination of Avira's process protection and Virtual Box cause this problem. Avira announced the real solution will be provided at the release of 9th of Feb...