2011-01-09

What matrix does gluLookAt generate? (2) Why does gimbal lock happen?

gluLookAt matrix (why does gimbal lock happen?)

The camera orientation (posture) is represented by a rotation matrix, and the camera position by a translation matrix. This is because OpenGL defines a default camera position and orientation, and the program moves this camera around. The default camera is at the origin, looks in the minus Z direction, and its up direction is plus Y. The rotation matrix and the translation matrix look like the following.
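Written out under one layout that is consistent with the rest of this post (an assumption about the layout: R maps the standard basis to x, y, z, so they are its columns, and T holds the translation by -e in its bottom row, matching Mesa's translation by (-eyex, -eyey, -eyez)), the matrices and their product are:

$$
R = \begin{pmatrix}
x_0 & y_0 & z_0 & 0 \\
x_1 & y_1 & z_1 & 0 \\
x_2 & y_2 & z_2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}, \qquad
T = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-e_0 & -e_1 & -e_2 & 1
\end{pmatrix}, \qquad
TR = \begin{pmatrix}
x_0 & y_0 & z_0 & 0 \\
x_1 & y_1 & z_1 & 0 \\
x_2 & y_2 & z_2 & 0 \\
-x \cdot e & -y \cdot e & -z \cdot e & 1
\end{pmatrix}
$$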

where,

• x: camera basis X axis, ``right'' in the figure, ``side'' in the Mesa program in the last blog entry
• y: camera basis Y axis, ``up'' in the figure, ``up'' in the Mesa program in the last blog entry
• z: camera basis Z axis, ``Z'' in the figure, ``-forward'' in the Mesa program in the last blog entry
• e: camera position, ``eye'' in the figure, ``eyex, eyey, eyez'' in the Mesa program in the last blog entry

In the Mesa program, eye, lookat, and up are passed to the gluLookAt function, and the basis vectors are computed as follows:

• z = normalize(eye - lookat)
• x = normalize(cross(up,z))
• y = normalize(cross(z, x))
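The three steps above translate almost line by line into code. This is a minimal pure-Python sketch (no NumPy); sub, dot, cross, normalize, and camera_basis are hypothetical helpers introduced here for illustration, not Mesa's API. The example deliberately passes an up vector that is not perpendicular to the view direction.

```python
import math

def sub(a, b):
    """Component-wise a - b for 3D vectors."""
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    """Scale v to length 1 (v must not be the zero vector)."""
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def camera_basis(eye, lookat, up):
    """Camera basis computed in the order z, x, y, as in gluLookAt."""
    z = normalize(sub(eye, lookat))  # minus the view direction ("-forward")
    x = normalize(cross(up, z))      # "side" / "right"; needs up not parallel to z
    y = normalize(cross(z, x))       # the actual camera up, perpendicular to z and x
    return x, y, z

# The given up [0, 1, 1] is tilted, yet the resulting y is perpendicular to z.
x, y, z = camera_basis([0, 0, 5], [0, 0, 0], [0, 1, 1])
print(x, y, z)  # [1.0, 0.0, 0.0] [0.0, 1.0, 0.0] [0.0, 0.0, 1.0]
```

Because y is recomputed from z and x, the up argument only has to point roughly upward; it never ends up in the matrix directly.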

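Here, ``cross'' is the cross product and ``normalize'' scales a vector to length 1. Please note the computation order: it is z, x, y, not x, y, z. The given up vector does not have to be perpendicular to the view direction; it is only used to compute x, and the camera's actual up y is then recomputed from z and x.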

By the way, you might notice that z is the minus view direction. In OpenGL, the camera looks along minus Z, so plus Z is not the camera's viewing direction but the direction behind the camera. Moreover, the depth of an object is a larger positive value the farther it is from the camera, but in world coordinates that object is at a negative Z by default. A coordinate system is just a representation and you can choose any, so this is not wrong. Similarly, plus Y is up in the viewport, but in screen coordinates plus Y points down. These differences between world coordinates and camera coordinates usually confuse me.
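A small check of this sign convention, assuming a camera at (0, 0, 5) looking at the origin with up = plus Y: a point in front of the camera gets a negative camera-space z, and a point behind it a positive one. Camera coordinates are computed here as dot products with the basis vectors after subtracting the eye position; to_camera and the other helpers are hypothetical names for this sketch.

```python
import math

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def to_camera(p, eye, lookat, up):
    """Express the world point p in the camera basis built as in gluLookAt."""
    z = normalize([eye[i] - lookat[i] for i in range(3)])  # minus view direction
    x = normalize(cross(up, z))
    y = cross(z, x)  # already unit length: z and x are perpendicular unit vectors
    d = [p[i] - eye[i] for i in range(3)]  # vector from the eye to the point
    return [dot(x, d), dot(y, d), dot(z, d)]

eye, lookat, up = [0, 0, 5], [0, 0, 0], [0, 1, 0]
print(to_camera([0, 0, 0], eye, lookat, up))   # in front of the camera: z is -5.0
print(to_camera([0, 0, 10], eye, lookat, up))  # behind the camera: z is 5.0
```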

Let me explain a bit about the matrix R. Why is this a rotation matrix? Looking at how x, y, z are constructed, they are mutually perpendicular and normalized, that is, all are unit vectors. A transformation that changes no lengths, only the direction of an object, is a rotation. (Strictly, a reflection also preserves lengths; R is not one because x, y, z form a right-handed system, which follows from y = cross(z, x).) For me, it is more intuitive to think of this matrix as a coordinate transformation, because when someone says ``rotation,'' I imagine a rotation axis and a rotation angle, and I hardly see either of them in the construction of the matrix. I can interpret R as a coordinate transformation by applying it to the standard basis as follows:
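With R laid out so that its columns are x, y, z (the layout assumed in this sketch, where each basis vector is extended with a fourth component 0 because it is a direction), applying R to the standard basis gives:

$$
R\begin{pmatrix}1\\0\\0\\0\end{pmatrix} = \begin{pmatrix}x_0\\x_1\\x_2\\0\end{pmatrix} = x, \qquad
R\begin{pmatrix}0\\1\\0\\0\end{pmatrix} = y, \qquad
R\begin{pmatrix}0\\0\\1\\0\end{pmatrix} = z, \qquad
R\begin{pmatrix}2\\0\\0\\0\end{pmatrix} = 2x
$$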

As you see, [1 0 0 0]^T becomes x, [0 1 0 0]^T becomes y, and [0 0 1 0]^T becomes z. Moreover, [2 0 0 0]^T becomes 2x; since x is a unit vector, the length 2 is preserved. Therefore, I can easily see the coordinate transformation in the matrix R: it never changes lengths, only directions. This is the same as the rotation of a rigid body.
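These properties can be checked numerically. This is a minimal sketch, assuming the upper-left 3x3 block of R has x, y, z as its columns; it checks that the first standard basis vector maps to x, that the length of an arbitrary vector is preserved, and that the determinant x · (y × z) is +1, so R is a rotation rather than a reflection. The helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def length(v):
    return math.sqrt(dot(v, v))

def mat_vec(m, v):
    """3x3 matrix (list of rows) times a column vector."""
    return [dot(row, v) for row in m]

# Camera basis for an eye at (1, 2, 3) looking at the origin, up = plus Y.
eye, lookat, up = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]
z = normalize([eye[i] - lookat[i] for i in range(3)])
x = normalize(cross(up, z))
y = normalize(cross(z, x))

# Upper-left 3x3 block of R: the columns are x, y, z.
R = [[x[i], y[i], z[i]] for i in range(3)]

print(mat_vec(R, [1.0, 0.0, 0.0]), x)   # the first standard basis vector maps to x
v = [2.0, -1.0, 0.5]
print(length(v), length(mat_vec(R, v)))  # equal up to rounding: length is preserved
print(dot(x, cross(y, z)))               # determinant, about 1: a rotation, not a reflection
```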

The remaining operation T is a translation, which moves the camera position. You may notice that the matrix TR has dot products in its bottom row. This is because transforming a coordinate from one axis to another is a projection, as shown in Figure 2. A projection onto a unit axis needs the cosine of the angle between the vector and the axis, and for unit vectors that cosine is exactly the dot product.
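To see where the dot products come from, this sketch builds R and T as 4x4 lists and multiplies them. It assumes a layout in which x, y, z are the columns of R and T stores the translation by -eye in its bottom row (a row-vector convention; written this way, the matrix reads like OpenGL's column-major array printed row by row). The function names look_at_tr and mat_mul are hypothetical.

```python
import math

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def look_at_tr(eye, lookat, up):
    """Build R and T in the layout assumed here; return TR and the basis."""
    z = normalize([eye[i] - lookat[i] for i in range(3)])
    x = normalize(cross(up, z))
    y = normalize(cross(z, x))
    R = [[x[0], y[0], z[0], 0.0],   # columns of the upper-left block are x, y, z
         [x[1], y[1], z[1], 0.0],
         [x[2], y[2], z[2], 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    T = [[1.0, 0.0, 0.0, 0.0],      # translation by -eye in the bottom row
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [-eye[0], -eye[1], -eye[2], 1.0]]
    return mat_mul(T, R), (x, y, z)

eye = [1.0, 2.0, 3.0]
TR, (x, y, z) = look_at_tr(eye, [0.0, 0.0, 0.0], [0.0, 1.0, 0.0])

expected = [-dot(x, eye), -dot(y, eye), -dot(z, eye), 1.0]
print(TR[3])     # bottom row of the product TR
print(expected)  # the same numbers written as dot products

# Row vector [eye, 1] times TR: the camera position maps to the origin.
p = eye + [1.0]
eye_in_camera = [sum(p[k] * TR[k][j] for k in range(4)) for j in range(4)]
print(eye_in_camera)  # approximately [0, 0, 0, 1]
```

Each bottom-row entry is minus the projection of the eye position onto one camera axis, which is the cosine-times-length picture of Figure 2.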

Figure 2: Basis transformation and the relationship with dot product

In this article, I interpreted what the matrix in Mesa's gluLookAt code means. When you write a program to move a camera, you need to avoid the gimbal lock effect. Gimbal lock usually happens when the view direction and the up vector become nearly parallel (or opposite): cross(up, z) then approaches the zero vector, so x can no longer be normalized reliably. To avoid this, move the camera as a rigid body, as this matrix does. Figure 3 (b) shows an implementation that changes only the view direction but not the camera's orientation. With a real camera, you would have to bend the lens or break the camera to get this gimbal lock effect. To avoid it, you can rotate the camera itself, as in Figure 3 (c).

Figure 3: Gimbal lock and the camera. To look slightly upward: (b) the ``bend the lens'' implementation, which causes gimbal lock; (c) rotating the camera itself, which causes no gimbal lock.
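A minimal sketch of the two strategies in Figure 3, assuming the camera starts at the default orientation. For (b), only the view direction is tilted while up stays at world plus Y, and the length of cross(up, z), which normalize must divide by, shrinks like cos of the tilt angle. For (c), y and z are rotated together about x; the rotation uses Rodrigues' rotation formula, a choice made for this sketch rather than anything taken from Mesa. The helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def length(v):
    return math.sqrt(dot(v, v))

def rotate(v, axis, angle):
    """Rotate v about the unit vector axis by angle (radians), Rodrigues' formula."""
    c, s = math.cos(angle), math.sin(angle)
    k_x_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return [v[i] * c + k_x_v[i] * s + axis[i] * k_dot_v * (1.0 - c) for i in range(3)]

world_up = [0.0, 1.0, 0.0]

# (b) "Bend the lens": tilt only the view direction and keep up = world_up.
for deg in (0.0, 45.0, 80.0, 89.0, 89.99):
    a = math.radians(deg)
    z = [0.0, -math.sin(a), math.cos(a)]  # minus the view direction, tilted up by deg
    print(deg, length(cross(world_up, z)))  # equals cos(a); reaches 0 at 90 degrees

# (c) Rotate the camera as a rigid body: pitch y and z together about x.
x, y, z = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
a = math.radians(90.0)                   # look straight up
y, z = rotate(y, x, a), rotate(z, x, a)
# Using the rotated y as the new up, cross(up, z) still has length 1.
print(length(cross(y, z)), dot(y, z))
```

In (c), passing the rotated y as the up argument keeps gluLookAt's cross(up, z) well away from zero even when the camera looks straight up.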

Next time, I will show you my gluLookAt implementation in Python. As I mentioned in the motivation, this is (only) useful if you want to write a renderer that does not depend on OpenGL but is used together with OpenGL.