Black Art of 3D Game Programming, Chapter 10: 3D Fundamentals
From Dpfileswiki
This chapter has lots of mathematic notation that this wiki has difficulty displaying. Where the usual tags could not be applied, a ^ denotes an exponent and _ denotes subscript. Thank you for your patience.
We now have the knowledge and power to cast many spells. However, all of our magick is still trapped in the two-dimensional space called the Cartesian plane. It’s time to put away our toys and become real Cybersorcerers! This chapter is an introduction to basic 3D concepts, jargon, and mathematics needed to cast the most wicked of 3D spells. Then during the remaining chapters of the book, we’ll focus more on each of these topics and generate the proper computer code for a complete 3D graphics engine, much like the 2D engine used to make Starblazer.
This chapter is full of trigonometry and linear algebra. However, we’re going to move through the material very slowly and make sure that we understand the origin and derivation of all of it. Keep in mind that we’re going to cover all of these concepts in detail many more times, so if something doesn’t click, don’t worry too much–it will become clearer down the line. The main point of this chapter is to acquire a new vocabulary along with getting a grip on 3D fundamentals and the mathematics used throughout the remainder of the book.
Starting with the Flatlands
Before we begin studying 3D systems, let’s review the simpler 2D plane and the basic concepts and operations that relate to it. Figure 10-1 shows the standard 2D plane. There are two axes, labeled X and Y. The X axis runs horizontally and the Y axis runs vertically. Furthermore, the X and Y axes intersect at a 90 degree angle–in other words, they are perpendicular.
Up to this point, we have been performing graphics on the 2D plane, except that the Y axis on the computer screen is inverted, and the origin of the computer screen is located at the upper-left corner of the screen. Thus, to transform any coordinate from a standard 2D Cartesian plane to screen coordinates, we would use the following formulas:
x_screen = x + SCREEN_WIDTH/2; y_screen = -y + SCREEN_HEIGHT/2;
Basically, the above fragment translates the origin to the center of the screen and inverts the Y component. Anyway, let’s see what the 2D plane has to offer as far as geometrical primitives.
Points, Lines, and Polygons in 2D
Since the 2D plane is flat, not many primitives can be defined in this rather limited space. However, there are a few primitives, each based on the previous one, that we can discuss. The first geometric primitive is the point. A point is a single location on the 2D plane located by its coordinates (x,y). For example, Figure 10-2 shows a collection of points labeled on a 2D plane. A point has a very interesting attribute–it has no dimension! That’s right. Even though a point locates a particular location in the 2D plane, the point itself has no dimension. That’s because a point has no size. It is infinitely small.
The next geometrical entity is the line. A line is a collection of points that span from some starting location to some ending location. Figure 10-3 shows a couple of lines drawn between points. Referring to the figure, we see that to define a line in a 2D plane, we simply need two points: the starting point and the ending point.
The final geometrical primitive is the polygon, which means “many-sided object.” For our purposes, a polygon is a collection of lines that make up a closed shape. Figure 10-4 depicts a few familiar polygons, including a triangle (three sides), a square (four sides), and a multisided polygon that is irregular. This brings us to the concept of regular and irregular polygons.
A regular polygon is a polygon that has interior angles that are similar. An irregular polygon has interior angles that have no such restraint. Figure 10-5 illustrates the difference between regular and irregular polygons. Finally, a polygon, by nature of its composition (a group of lines), can be defined by either a set of line segments or by a set of points. We will use the latter definition since it will make many of our future calculations simpler. Therefore, a polygon will be thought of as a collection of points that are connected by line segments that make up the closed polygon.
Transformations in the 2D Plane
Now that we have a little ammo to work with, let’s learn to manipulate 2D objects. Since all the 2D primitives we have spoken of are based on a single point, it makes sense to learn how to manipulate a single point and use the results as the foundation to manipulate more complex objects, such as lines and polygons. With that in mind, the three basic transformations that can be performed in the 2D plane are translation, scaling, and rotation. Translation and rotation make sense on single points, but scaling really only works on lines and polygons. We’ll get to that in a minute; for now let’s see how translation works.
Translation
Given that an object (which in our case is a single point) is located at the position (x,y), the object is translated or moved by an amount tx in the X direction and ty in the Y direction by the following transformation:
x = x+tx; // x translation y = y+ty; // y translation
Figure 10-6 shows this process in action. If an object were made up of more than one point, then each point of the object would be transformed in the same way. For example, if an object is defined by a set of points like this,
int x[NUM_POINTS],y[NUM_POINTS];
and the arrays x[ ] and y[ ] are initialized with a set of points that make up the object, then the object can be translated with the following variation of the translation transformation:
for (index=0; index<NUM_POINTS; index++)
{
// translate the ith point
x[index] = x[index] + tx;
y[index] = y[index] + ty;
} // end for index</pre>
Therefore, if an object is defined by a set of points, in order to translate the object, each point is translated separately.
Scaling
Scaling simply means “to change the size of.” However, since a single point has no dimension, the simplest object that we can scale is a line segment, which can be thought of as an object constructed of two points. But let’s take this idea a little further and talk about scaling a real object that is made of three or more points (three points is the minimum number of points that can define a polygon).
Take a look at Figure 10-7, which shows two triangles. The leftmost triangle is defined as the points (0,2), (–2,–2), and (2,–2). If the triangle is scaled by a factor of 2, then the result is shown on the right of Figure 10-7. Analyzing the new points that make up the triangle, we see that they are (0,4), (–4,–4), and (4,–4). Amazingly enough, they are all exactly two times each of the original triangle’s points. Well, it’s not that amazing…they’d better be!
Mathematically, the X and Y components of each of the points that make up the first triangle are multiplied by the scaling factor, which in this case is 2. Thus, to scale an object that is composed of a set of points, the following algorithm can be used:
for (index=0; index<NUM_POINTS; index++)
{
// multiply each component by the scaling factor
x[index] = x[index] * scaling_factor;
y[index] = y[index] * scaling_factor;
} // end for index
There is one caveat about scaling that we must take into consideration: Any object that we scale must be defined by a set of points such that its center is located at the origin, that is, (0,0). Otherwise, scaling will also produce translation. Take a look at Figure 10-8 to see this happening. The problem is that the object has no “anchor.” When the object is defined such that its origin is (0,0), the object scales about the origin. When the object isn’t defined relative to the origin, the object will scale and translate about its local origin, and the result will be a properly scaled object but at the wrong location.
One final detail about scaling is, we can scale an object up or down. In other words, the scaling factor can be less than 1.0. For instance, if we wanted to make an object half its current size, we would multiply by a scaling factor of 0.5. Negative scaling factors will work also, but the results are called reflections and aren’t that useful, at least in our work.
Rotation
This is the big one. Rotation has always been a complicated transformation to nail down mathematically. Therefore, we are going to derive the rotation equations ourselves, and if there is one thing that you get out of this chapter, it will be how to rotate something! We are going to begin our rotation derivation with the following setup. We want to rotate a single point an angle θ (theta) in the counterclockwise direction (see Figure 10-9).
There are many ways to derive the rotation equations, but I think the following method is the easiest to grasp. All right, here’s how it works. Given any point (x,y), if we want to rotate it by an angle theta, then the result will be a new point (x′,y′) that is a linear combination of the old point (x,y) and some functions of theta, such as
- x′ = x*f1(θ) + y*f2(θ)
- y′ = y*f3(θ) + y*f4(θ)
We have (x,y) and θ, but we need to know the functions f1(θ), f2(θ), f3(θ), and f4(θ). Well, most people know that they have something to do with SIN and COSINE, but the question is, what is the relationship? The answer can be found by rotating the points (0,1) and (1,0) and then using the results to solve for the unknown functions f1(θ), f2(θ), f3(θ), and f4(θ). First, let’s take the point (1,0) and rotate an angle θ in the counterclockwise direction. This is shown in Figure 10-10. The result of this rotation is a point (x,y), but what are x and y in terms of θ? The answer can be found by breaking down the resulting triangle that is formed by the X axis and the point (x,y). We’ll use the Pythagorean theorem, which states that the sum of the squares of the sides of a right triangle is equal to the square of the hypotenuse. Thus, the hypotenuse in Figure 10-10 is (x2+ y2)1/2. Now, recalling that SIN and COSINE of any angle are defined as
opposite side
SIN θ = ___________
hypotenuse
and
adjacent side
COS θ = ___________
hypotenuse
then SIN θ and COS θ in Figure 10-10 must be equal to
y
SIN θ = _______
(x2+y2)1/2
and
x
COS θ = ________
(x2+y2)1/2
which for a line that has length 1.0, the denominators also become 1.0, and the results are
- COS θ = x and SIN θ = y
Therefore, we can deduce that if the point (1,0) is rotated by an angle θ in the counterclockwise direction, then (1,0) becomes (COS θ, SIN θ), or in other words, the functions f1(θ) = COS θ and f3(θ) = SIN θ since COS θ = x and SIN θ = y. We’re halfway there!
To find the remaining two unknown functions, we must rotate another test point that will give us the other two desired functions. And of course, the logical selection is the point (0,1). Take a look at Figure 10-11. Here, we see the point (0,1) being rotated an angle θ in the counterclockwise direction. The resulting point we again label (x,y). And using the same reasoning as before, we deduce that COS θ and SIN θ have the following formulas
- COS θ = y and SIN θ = –x
or
- COS θ = y and –SIN θ = x
Therefore, the point (0,1) when rotated an angle θ in the counterclockwise direction, becomes (–SIN θ, COS θ). Hence, the final functions are f2(θ) = –SIN θ and f4(θ) = COS θ. If we take our results and plug them back into our original linear transformations, the resulting rotation equations are
- x′ = x*(COS θ) – y*(SIN θ)
- y′ = x*(SIN θ) + y*(COS θ)
Although the above derivation is one of the simplest (there are much harder ways to derive the equations, right Dr. Mitchem?), it would be a lot simpler if we did it with matrices–but don’t worry, that’s on our hit list!
So how do we use the rotation equations now that we have them? We use the rotation equations similarly to how we use the translation or scaling equations. However, there is one detail that we need to keep in mind when performing the calculation, and that’s the interdependence of the equations. Basically, both x′ and y′ are functions of the old point (x,y). This means that during the rotation calculations, the old (x,y) can’t change. For example, we might be tempted to do something like this:
for (index=0; index<NUM_POINTS; index++)
{
x[index] = x[index] * cos(theta) - y[index] * sin(theta);
y[index] = x[index] * sin(theta) + y[index] * cos(theta);
} // end for index
The problem with the above fragment is that after x[index] has been computed, it’s used in the computation of y[index]! This will result in an error. The solution is to use temporary variables during the rotation process, as shown in the next fragment:
for (index=0; index<NUM_POINTS; index++)
{
// rotate the point and save in temporary variables
x_rot = x[index] * cos(theta) - y[index] * sin(theta);
y_rot = x[index] * sin(theta) + y[index] * cos(theta);
// restore results back into array
x[index] = x_rot;
y[index] = y_rot;
} // end for index
The above fragment would result in the proper rotation of the points defined in the arrays x[ ] and y[ ] by an angle theta in the counterclockwise direction. Of course, all calculations are done in floating point and theta is in radians, since C/C++ math libraries like radians instead of degrees. If you have forgotten the relationship between radians and degrees, here it is:
- 360 degrees = 2Π radians
The beauty about all the transformations we have just discussed is that they can all be performed using matrices. Moreover, by using matrices, multiple operations can be concatenated into a single transformation matrix, allowing an object to be scaled, translated, and rotated in a single pass. But we will get to this a bit later. At this point, let’s take a look at 3D space and see what we’re dealing with.
Introduction to 3D Space
Three-dimensional space should be very familiar to you since we live in one! The question is, how are points and objects located in 3D space? The answer is, the same way they are located in 2D space–with coordinates. Any point in 3D space can be located with three coordinates: x, y, and z. The new z coordinate relates to the new axis that must be added to describe a 3D space. There is a single catch with this z coordinate, and that is the relationship between the positive Z axis and the X and Y axes we are used to. The catch is that given a standard X-Y plane or Cartesian coordinate system, the new Z axis can either point into or out of the paper. For this reason there are two popular systems in use for locating points and objects. They are called the left-handed system and the right-handed system.
The Left-Handed System
Figure 10-12 shows the left-handed system for locating points and objects. As you can see, the X and Y axes are where we would expect, but the new Z axis is pointing into the paper. It’s called the left-handed system because if you roll your left hand from X-Y, your thumb points in the positive Z-axis direction. This system is commonly used in computer graphics, while the right-handed system is more popular in mathematics. Let’s see why.
The Right-Handed System
As shown in Figure 10-13, the right-handed system is almost the same as the lefthanded system except that the Z axis is mirrored. This really doesn’t mean much as far as calculations are concerned, but it does make diagrams a bit easier to draw and analyze. This is why many math and physics books prefer the right-handed system to the left-handed system. As an example, imagine that we wanted to plot the point (1,1,1) on a righthanded coordinate system. The result would look like Figure 10-14. The point is “in front” of the paper, if you will. However, if we plot the same point on a lefthanded system, as shown in Figure 10-15, the point is behind the X-Y plane, or behind the paper. It’s just a bit weird.
Anyway, the moral of the story is that either system works fine, it’s really a matter of taste. We use both systems in this book. Our final graphics engine will use a left-handed system, but when we are diagramming and so forth, we will tend to use the right-handed system since it makes figures easier to visualize.
Now that we have that down, it’s time for a crash course in vector math.
Vectors and Vector Operations
Vectors are mathematical entities that describe both magnitude and direction. For example, the number 100 is a scalar value. It describes a magnitude, but no direction. A vector, on the other hand, has both a magnitude and direction that are in a compact form. For example, take a look at Figure 10-16. Here we see a 2D plane with a vector drawn in it. The name of the vector is u (we will always use lowercase bold to denote vectors), and it points in the northeast direction. Moreover, u has a length associated with it, which relates to its magnitude. But let’s see if we can be a little more precise about these quantities.
The vector u is said to have components. In the case of Figure 10-16, the vector u is a 2D vector and thus has two components. The components are the x and y values associated with the tip of the vector, which are <3,5>. This means that the vector u has an X component of 3 and a Y component of 5. Figure 10-17 shows a few more vectors and their components. As you see, some vectors can have the same direction, but have different lengths or magnitudes.
So, how do we compute the magnitude of a vector? The answer is that we use a triangle along with a bit of geometry. All vectors by definition have their initial points at the origin, which is (0,0) in a 2D space or (0,0,0) in a 3D space. Therefore, the magnitude of a vector can be found by dropping down a perpendicular and then solving for the hypotenuse. This process is shown in Figure 10-18. The vector v, which has components <3,4>, is really the hypotenuse of the triangle. Therefore, the length of v, denoted |v|, can be computed with this equation:
- length of v = |v |=(x2 + y2)1/2 = (32+42)1/2 = 5.
Amazingly enough, the above calculation is simply the Pythagorean theorem again.
Vectors are very useful in computer graphics, as they help simplify much of the complex calculations. Since computer graphics deals with 2D or 3D points, and a point itself is a vector, it makes sense to make up a set of rules to manipulate vectors. But before we do that, let’s reiterate what a vector is. A vector is a mathematical entity that contains both magnitude and direction. Vectors are especially useful in 2D and 3D graphics, where they are represented by arrows with a certain direction and length. The tip of a vector is called the terminal point and the start of a vector is called the initial point. Also, each vector is really composed of a set of components. These components are made up of the x, y, and z (if the vector is in 3D) contributions of the vector as drawn from the origin.
Now let’s talk about the operations that can be performed on vectors and what they mean mathematically and geometrically.
Addition of Vectors
Vectors can be added just as scalars can, except that we must be careful adding vectors since they are made up of one or more components. For example, suppose we have two vectors in a 2D space, as shown in Figure 10-19, and we want to add them. This can be done by adding the components of each of the vectors. If we refer to Figure 10-19, we see that vector u=<4,2> and vector v=<1,2>. To add the vectors, we must separately add the X and Y components of the vectors, like this:
- u + v = <4,2> + <1,2> = <4+1,2+2> = <5,4>
The geometrical addition of u and v is shown in Figure 10-20. Basically, to add two vectors together, we can take one vector and attach it to the terminal point of the other vector. Then we draw a third vector, which starts from the origin, to the resulting geometrically added vectors. You might be wondering how this works? Well, if you think about it in another way, it makes sense. The first vector u can be thought of as a locator. Then vector v can be thought of as another locator that is relative to u. Thus, we walk to the end of u, and then we walk in the direction of v for the length of v. The result will be a position that is the sum of the two vectors.
Subtraction of Vectors
Subtraction of vectors is done in the same way as addition. For example, Figure 10-21 shows two vectors u and v. If we want to subtract them, we simply subtract components, just as we added components to add vectors. Thus, to compute u–v, we use the following math:
- u – v = <4,1> – <1.5,4> = <4–1.5,1– 4> = <2.5,–3>.
Geometrically, subtraction isn’t as obvious as addition, but take a look at Figure 10-22 to see if it makes sense. The best way to think of geometric subtraction is addition of the opposite. In other words, in the case of our previous example, u–v is equivalent to u+(–v). But—how do we compute (–v)? Well, we have to learn to perform multiplication.
Scalar Multiplication
Just as scalar numbers can be multiplied by one another, vectors can also be multiplied by a scalar. However, the scalar multiplier must be multiplied by each component of the vector. Figure 10-23 shows the vector u being multiplied by 3. You will notice that the direction of u does not change, but its length does. This operation can be illustrated by the following math:
- 3*u = 3*<2,4> = <3*2,3*4> = <6,12>.
Or, in general, a vector u times a scalar S is equal to
- S*u = S*<u1,u2,…un> = <S*u1, S*u2, ...S*un>.
which in the case of a 3D vector would look like
- S*u = S*<ux,uy,uz> = <S*ux,S*uy,S*uz>.
where u1, u2, and u3 are the x, y, and z components of the vector, respectively.
Scalar multiplication has many uses. We can shrink or stretch vectors by multiplying scaling factors smaller or larger than 1.0. Also, we can flip or invert a vector by multiplying by –1, as shown in Figure 10-24. An important fact to remember is that vectors can have from one to a number of components. Of course, in computer graphics most of the vectors are 2D or 3D and thus have two or three components. But if we wish, we can have vectors of any dimension. Nevertheless, the same operations can be performed on all vectors. We must simply make sure to process each element or component of each vector.
The Dot Product
We’ve seen how to multiply a scalar and a vector. But can we multiply two vectors, and if we can, what does the result mean? The truth is, in general, multiplying the components of two vectors doesn’t do much. However, there is a special vector product that is useful, and it is called the dot product or Euclidean inner product. The dot product is used to extract the angular relationship between two vectors. Take a look at Figure 10-25. Here we see two vectors u and v. What if we wanted to know the angle between them? This is a trivial question, but it is an important one. Luckily for us, we aren’t going to derive this one, we are simply going to look at the results. The dot product is denoted by the “.” symbol and is computed as follows:
- u . v = |u |* |v|* cos θ
where |u| and |v| are the magnitudes of u and v, and θ is the angle between them.
If we use a bit of algebraic manipulation, we see that θ can be found with the following formula,
(u . v)
θ = COS^(–1) _________
|u | * |v |
which states that θ is the inverse COSINE of u dot v divided by the magnitude of u and the magnitude of v. However, there is a bit of a problem here. How do we compute (u . v)? The answer is that (u . v) is the sum of the products of the components of the vectors u and v. In the case of vectors that have two components (u . v) looks like this:
- Given u=<ux,uy>, v=<vx,vy>
- u . v = ux*vx + uy*vy
In the case of 3D vectors then, (u . v) is computed with:
- Given u=<ux,uy,uz>, v=<vx,vy,vz>
- u . v = ux*vx + uy*vy + uz*vz
Now, (u . v) has some interesting properties. If we don’t want to go through the pain of finding the actual angle θ between the vectors u and v, but instead are only interested in the gross relationship between them, we can simply compute (u . v) and refer to the following:
- If (u . v) is
- > 0, then the angle between u and v is acute, as shown in Figure 10-26 (A).
- < 0, then the angle between u and v is obtuse, as shown in Figure 10-26 (B).
- = 0, then u and v are perpendicular, as in Figure 10-26 (C).
The dot product is very important in 3D graphics during hidden surface removal and lighting calculations, so make sure your tractor beam has a lock on it before moving on!
The Cross Product and Normal Vectors
Many times in 3D graphics, a pair of vectors exist that we would like the normal to—in other words, we would like a vector that is perpendicular to both of the vectors in question. Figure 10-27 shows this situation. The vectors u and v form a base for the vector n that is perpendicular to both u and v. But how can we compute such a vector? We compute the normal vector using a computation called the cross product.
Again, we aren’t going to derive the cross product since it is rather complex and tedious. We simply want the final results. The cross product operation is denoted by x and is computed as follows:
- Given u = <ux,uy,uz>, v = <vx,vy,vz>
- uxv =<ux,uy,uz>x<vx,vy,vz>=<uy*vz – uz*vy, ux*vz – uz*vx,ux*vy – uy*vx>
- = <nx, ny, nz>
In most cases, when a normal vector is computed, the resulting vector is normalized. Normalization is the process of resizing a vector so that its length is equal to 1.0. This operation can be performed on any vector, but it’s almost always performed on normal vectors. Let’s see how to do it.
Unit Vectors
The concept of normalization is a common one. In general, normalization is the scaling of a data set such that all the elements in the set have values relative to some specific baseline value. In most cases, this baseline value is selected to be 1.0. This is also the case in vector mathematics. Unit vectors are vectors that have been scaled down such that their resulting magnitude is 1.0. The direction of the vector isn’t changed, only its length. Figure 10-28 shows the results of normalizing a vector to length 1.0. Obviously, we must multiply the components of the vector by some magic number such that the resulting components, when used in a magnitude calculation, will result in a magnitude of 1.0. The magic number we’re looking for is the magnitude of the vector! For example, if we want to create a unit vector from the vector u, then we simply divide each component of u by its own magnitude, denoted by |u|. If we again refer to Figure 10-28, we see that u=<3,4>; therefore, to compute a unit vector version of u, we would perform the following calculations:
First compute the length of u, that is, |u|:
- |u | = (32+42)1/2 = 5
Then divide each component of the original u by 5:
- u* = 1/5*<3,4> = <3/5,4/5>
Now, if we were to calculate the magnitude of u*, we would find it to be 1.0. Give it a try and see what you get. In general, to compute a unit vector, the following formula can be used:
* u
u = ______
|u |
where the * character denotes unit vector.
Generating Vectors from Points
We have been speaking of vectors in a rather cavalier way thus far. Let’s stop a moment and look at a real-world example of computing and using vectors productively. First, a vector can always be generated from any two points P1 and P2 in any space, 2D or 3D. Take a look at Figure 10-29. It shows the two points P1 and P2 in a 3D space. If we wanted to compute a vector from P1 to P2, we would use the following formula:
- u = P2 – P1 = (3,4,5) – (1,1,1) = <2,3,4>
We see that a vector that starts from a point and ends at another point is computed by subtracting the tail or terminal end from the start or initial point. Figure 10-29 shows the vector u. Now, if we wanted the vector from P2 to P1, we would reverse the operation, resulting in the following calculation:
- v = P1 – P2 = (1,1,1) – (3,4,5) = <–2,–3,–4>
Don’t be alarmed by the negative signs, they are perfectly legal and make sense when we look at the resulting vector v in Figure 10-30.
Now, as a hard-core example of using vectors, let’s make up a quick problem to get the hang of it. Take a look at Figure 10-31, which depicts a triangle in 3D made up of the points P1, P2, P3. Let’s see if we can compute a normal to this triangle. Let’s begin by computing the two vectors u and v, as shown in the figure, with the following calculations:
- u = P2 – P1 = (0,0,5) – (0,2,0) = <0,–2,5>
- v = P3 – P1 = (3,0,0) – (0,2,0) = <3,–2,0>
Now if we compute the cross product between u and v—that is, u x v—then the resulting normal vector will be perpendicular to the triangle. The dot product is computed as follows:
- u x v = <–2*0 – (–10), –(0*0 – 3*5), 0*(–2) – (–2)*3> = <10,15,6> = n
The result is shown in Figure 10-32; but n should be normalized, so let’s make it a unit vector. Here’s the final calculation:
* n <10,15,6> <10,15,6>
n = ___ = ________________ = ________ = <10/19, 15/19, 6/19>
|n | (102 + 152 + 62)1/2 19 = <.526, .789, .315>
The original triangle with n* as its normal is also shown in Figure 10-32.
Luckily, that’s all we need to know about vectors to get started on the 3D graphics engine. However, we will pick up little bits here and there as we progress. The next mathematical topic we must discuss is matrices. Maybe it’s time for a colorful metaphor?
Matrices and Matrix Operations
Matrices are mathematical tools that were invented mainly to manipulate systems of linear equations, such as:
- 3x + 2y + 5z = 10
- 10x – 4y + 7z = 20
- 30x + 10y – z = 30
If you recall, to solve the system, we need to find an x, y, and z such that all the equations are simultaneously satisfied. This can be accomplished in myriad ways, including substitutions, adding multiples of one row to another, and so forth. However, when a system of equations has 50 or 100 variables, writing them out can be quite unnerving, and solving them is next to impossible for creatures as frail and slow as humans. Consequently, mathematicians came up with a shorthand form to write systems of linear equations, called matrices. Then an entire area of mathematical study was born, called linear algebra. In any case, we’re only interested in using matrices to simplify our operations rather than to solve systems of linear equations. However, just to be complete, let’s see what the previous system of linear equations looks like in matrix form:
AX=B
That’s pretty simple! But what are A, X, and B? They are matrices formed by taking specific components of the original system of equations. For example, A is called the coefficient matrix and is generated by taking the coefficients of x, y, and z row-by-row and placing them in a matrix like this:
|3 2 5|
A = |10 -4 7|
|30 10 -1|
The variable X is called the variable matrix and is simply the variable involved in the linear system of equations. In our case, X looks like this:
|x|
X = |y|
|z|
Note: It is just a coincidence that one of the variables in the problem is called X.
Finally, the last matrix in the matrix equation is B. This is called the constant matrix and is another single-column matrix, which contains the right side or constants of the problem, such as:
|10|
B = |20|
|30|
Thus, the entire linear system of equations can be written as:
AX=B, which is equivalent to:
|3 2 5 | |x| |10| T=|10 -4 7 | * |y| = |20| |30 10 -1| |z| |30|
Although it looks a little nicer, it doesn’t seem to help the situation at hand, which is the solution of the original set of linear equations. Actually, it does, because if we forget about the matrices for a minute and manipulate the original matrix equation AX=B, then we can solve for X like this:
X = A–1B
This states that the variable matrix X is equal to the product of the inverse of the coefficient matrix and the constant matrix. Exciting, huh! Well, if you’re not that impressed, don’t worry about it. We aren’t interested in finding the solutions to linear systems of equations, but we are interested in using some of the concepts and techniques from matrix mathematics and linear algebra to make our 3D manipulations easier to perform and more efficient. Let’s begin with some definitions, shall we?
A matrix is defined as a collection of numbers, mxn (the x doesn’t mean cross product, it means “by”), where m is the number of rows and n is the number of columns. For example, here are a few matrices of different sizes:
|3 4 5|
A = |26 -1|
|6 0 9|
- 3 rows by 3 columns, 3x3
|4 6|
B = |3 4|
|1 3|
- 3 rows by 2 columns, 3x2
C = |1 4 5|
- 1 row by 3 columns, 1x 3
D = |1|
- row by 1 column, 1x1
As you can see, matrices can be of any size. For a moment let’s not worry about the significance of the numbers in the matrices and learn to manipulate matrices as algebraic entities.
Matrix Addition and Subtraction
You can add and subtract matrices in much the same way you add and subtract numbers and vectors. However, to add or subtract a pair of matrices, they must be the same size, or in other words, they must be the same dimension. For example, none of the preceding matrices A, B, C, nor D can be added together, since all of their dimensions are different. However, if we have two matrices with the same dimensions mxn, where m and n can be any whole number greater than 1, the two matrices can be added.
So, how do we add matrices? Simple—we just add the elements and then place the sums in the proper positions in the result matrix. Take a look at the following example.
Given that,
|2 3| |3 -1|
A = |1 4| B = |5 6|
2x2 2x2
A+B =|2 3| + |3 -1| = |2+3 3+(-1)| = |5 2 |
|1 4| |5 6| |1+5 4+6 | |6 10|
Notice that the result matrix has the same dimensions as both A and B. In C/C++, matrices are nothing more than 2D arrays. For example, we might define a matrix type that is 3x3 with the following code structure:
typedef struct matrix_typ
{
int matrix_1[3][3]; // an integer 3x3 matrix
} matrix, *matrix_ptr;<pre>
Then we can define a couple of matrices like this:
<pre>matrix mat_1 = {3,4,5,
2,6,9,
3,0,-6};
matrix mat_2 = {1,2,0,
2,45,10,
-9,3,4};
matrix sum;
Then if we wanted to write a function to add matrices, something like this would be at the core of it:
for (row=0; row<3; row++)
for (column=0; column<3; column++)
sum[row][column] = mat_1[row][column] + mat_2[row][column];
And that’s all there is to it! Of course, subtraction is performed exactly as addition is except the elements of the second matrix are subtracted from the first rather than added. Now that we can add and subtract, let’s see if we can multiply matrices in some way that makes a bit of sense.
Matrix Multiplying
Unfortunately, matrix multiplying isn’t as simple as multiplying each element of the two matrices together. The algorithm used to perform matrix multiplying has a mathematical basis, which we aren’t going to delve into. We are simply going to learn how to do it since we’ll need it to perform 3D graphics (as we will learn later).
For the matrix multiplication of A*B to make sense, the inner dimension of the two matrices must be the same, or in other words, if A is mxn, then B must be nxr. This means that the number of rows of A can be anything and the number of columns in B can also be anything, but the number of columns in A must be the same as the number of rows in B. This is shown graphically in Figure 10-33. The result of multiplying the matrices A and B will be a matrix that is mxr, which are the outer dimensions of A and B.
So if we wanted to multiply matrix A, which is 3x2, by matrix B, which is 2x4, we could do it. However, if A was 3x2 and B was 3x3, this wouldn’t work since A and B don’t have the same inner dimension.
Now that we know what we can and can’t multiply, let’s see how the actual multiplication is performed. Refer to Figure 10-34 for the following discussion. In general, to multiply a matrix A by B, we must multiply rows of A by columns of B, and the resulting sum of products are used as the elements of the resulting product matrix. Or, to put it more precisely, the (i,j)th element of the result matrix is computed by taking ith row of A and multiplying each element of it with the jth column of B (which is actually nothing more than a dot product).
The result of this multiplication is placed in the (i,j)th position of the result matrix. Therefore, with a bit of thought, one can deduce that multiplying a matrix that is mxn by a matrix that is nxr takes (m*r*n) multiplications and (m*r*(n–1)) additions. Most of the matrices we are going to deal with will be 3x3 and 4x4, so this isn’t too much of a problem. As a quick exercise, figure out how many multiplications and additions are needed to compute the product shown below:
|2 3| |5 6| |(2*5 + 3*3) (2*6 + 3*0)| = |19 12| |1 2| * |3 0| = |(1*5 + 2*3) (1*6 + 2*0)| |11 6|
Answer: Eight multiplications and four additions, which is correct since (m*r*n)=(2*2*2) and (m*r*(n–1))=(2*2*(2–1))=4.
We need to learn matrix multiplication because it allows us to transform 3D points very easily by multiplying them by matrices. Moreover, we will shortly learn that matrices can be concatenated, resulting in a single matrix that can perform scaling, rotation, and translation (among other things) in a single shot! But before we get to all that fun stuff, let’s talk about the identity matrix, since it’s nice to know what the equivalent of 1.0 is in any mathematical space.
The Identity Matrix
The identity matrix is the multiplicative identity of matrices. This means that any matrix multiplied by the identity matrix (if the dimensions are correct) will result in the original matrix. To put it explicitly in matrix notation:
- A*I = A = I*A = A
This states that any matrix A that is multiplied by the identity matrix I is equal to A; moreover, the order of multiplication doesn’t matter.
This brings us to an important point in matrix multiplying: The order of multiplication of two matrices in general is not commutative; in other words, order does matter. Take a look below:
- A*B is not generally equal to B*A
The only reason that A*I=I*A is that the identity matrix is a special matrix, and because of its properties, the order of multiplication is irrelevant. But what does an identity matrix look like? Well, an identity matrix must be nxn or square. Thus, the only matrices that can be multiplied by an identity matrix are square matrices that have the same number of rows and columns.
The second property of an identity matrix is that the main diagonal consists of all 1’s and the remaining elements are 0’s. For example, if we wanted a 3x3 identity matrix, we would construct the following matrix:
|1 0 0|
I_3x3 = |0 1 0|
|0 0 1|
And a 2x2 identity matrix would look like this:
|1 0| I_2x2 = |0 1|
Now let’s see if the identity matrix does what it’s supposed to do when we multiply it by another matrix. Here is the multiplication between a 2x2 matrix and a 2x2 identity matrix:
|a b| |1 0| |a*1+b*0 a*0+b*1| |a b| |c d| * |0 1| = |c*1+d*0 c*0+d*1| = |c d|
As you can see, the result is the original matrix no matter what order we perform the matrix multiplication.
The identity matrix is useful for many operations in matrix math, but in 3D graphics it is commonly used as a blank or starting matrix that can be multiplied with other matrices to arrive at a final transformation matrix. But remember, all identity matrices are square; therefore, it is impossible to compute an identity matrix that is non-square.
Computing the Inverse
In the beginning of the section on matrices we saw a bit of black magick performed with the matrix equation:
- AX=B
The results were a series of manipulations to compute the variable matrix X. Here are the manipulations in more detail:
- A–1AX=A–1B
- IX=A–1B
- X=A–1B
The final result states that the variable matrix, which is the x, y, and z of the linear system of equations, can be found by multiplying the inverse of A by the constant matrix B. However, what exactly is an inverse and what is the importance of inverses? An inverse is a matrix such that when multiplied by the original matrix, results in the identity matrix. For example, in standard real numbers, the inverse of any number can be found by taking one over the number. Or, more precisely, the inverse of i is:
- i–1 = 1/i
Therefore:
- i * i–1 = i * 1/i = 1 = i–1 * i = 1
And similarly to ordinary numbers, matrices also have inverses. However, computing these inverses is beyond the scope of this book for the general case. We will need to compute the inverses of some specific matrices that have to do with graphical transformations such as translation, scaling, and rotation. However, we will cross that bridge when we come to it.
For now, let’s take a look at a simple example that illustrates the multiplication of a matrix A by its own inverse, resulting in the identity matrix. Of course, I am going to pull the inverse out of a hat (since I know how to compute them in general), but don’t tweak too much–just watch how the result of the multiplication is the identity matrix.
Given the matrix,
|2 3| A = |-1 4|
its inverse is equal to:
|4/12 -3/12| A^(-1) = |1/12 2/12|
And if we multiply A*A–1, then the result is:
|2 3| |4/12 –3/12| |(8/12+3/12) (-6/12+6/12)| |1 0| A*A^(-1) =|-1 4| * |1/12 2/12 | = |(-4/12+4/12 (3/12+8/12)| = |0 1| = 1
Of course, we just as well could have multiplied A–1*A, and the result also would have been the identity matrix I. Concatenation The final topic we are going to touch upon is one of the coolest things about matrices, and that’s the property of concatenation. Concatenation means that a single matrix can be generated by multiplying a set of matrices together:
- C = A1*A2*A3*...*An
The interesting thing is that now the single matrix C holds all of the transformation information that the other matrices contained. This is a very useful tool when performing multiple transformations to all the points that make up the objects of the game. We can use the single concatenated matrix, which is the product of all of the desired transformations, instead of multiplying each point by two or three different matrices that each perform a single operation such as translation, scaling, and rotation.
Using this technique speeds up the game calculations by about 300 percent! So this is a good thing to know. However, as we learned earlier, the order of multiplication is very important. When we concatenate matrices, we must multiply them in the order in which we wish them to operate, so make sure that translation is performed first or rotation is done last (or whatever).
Fixed Point Mathematics
The last purely mathematical topic we must discuss is fixed point mathematics. If you know anything about 3D graphics, you’re sure to know something about fixed point math. Fixed point math is a form of mathematics that is based on the use of integers to approximate decimals and fractions. Normally, we do calculations using floating point numbers. However, there is a problem with floating point numbers and microprocessors–they don’t get along. Floating point math is very complex and slow, mainly because of the format of a floating point number as it is represented in the computer.
Floating point numbers use a format called IEEE, which is based on representing any number by its mantissa and exponent. For example, the number 432324.23123 in floating point is represented like this,
- 4.3232423123E+5
which might be more familiar as:
- 4.3232423123 x 105
In either case, the IEEE format stores the mantissa and the exponent in a way that is a computational hazard to say the least. The results of this format are that multiplication and division are slow, but addition and subtraction are really slow! Doesn’t that seem a bit backward? Fortunately, as time has marched on, most CPUs, including the 486DX and 586 Pentium, have on-chip math coprocessors. However, only the latest Pentium processor has floating point performance that exceeds integer performance (sometimes). Thus, we have to make a decision–whether or not to assume that everybody has a Pentium. That would be a mistake, since it will be two to three years before the majority of people have converted. Hence, we must assume that most users have a 486DX (even that is pushing it).
The floating point performance of a 486DX is good, but it’s still not as good as integer performance (especially for addition and subtraction). Moreover, since graphics are performed on an integral pixel matrix, at some point, floating point numbers must be converted to integers, and this is a real performance hit. The moral to the story is that we are going to avoid math as much as possible, but we are going to use floating point numbers for the majority of our calculations during the prototyping phase of our graphics engine. Then when we are ready to make the engine really rock, we will convert all the floating point math to fixed point math to get the best possible speed.
Another reason to use floating point math during the prototype phase of our 3D graphics engine is that the graphics are hard enough to understand without a ton of confusing fixed point calculations getting in the way of the purity of everything. And many people might just want to keep the floating point library and forget the final fixed point version, so this gives everyone the best options.
So what exactly is fixed point math? Fixed point math is a form of computer math that uses an integer to represent both the whole part and fractional part of a decimal number by means of scaling. The main premise is that all numbers are represented in binary; therefore, if we use a large integer representation such as a LONG, which is 32 bits, we can play a little game. The game has the rules that the whole part of a number will be placed in the upper 16 bits and the decimal part will be placed in the lower 16 bits. Figure 10-35 shows this breakdown. Although we can place the “fixed point” anywhere, most implementations place it smack in the middle for simplicity.
The lower 16 bits are called the decimal part and the upper 16 bits are called the whole part. All we need is to come up with a set of rules so that we can assign numbers to fixed point numbers. Well, it turns out to be quite easy. First, let’s define a fixed point type:
typedef long fixedpoint;
We know that the whole part is supposed to be in the upper 16 bits and the decimal part in the lower 16 bits; therefore, if we want to convert the simple integer 10 into a fixed point number, we can do something like this:
fixedpoint f1; f1 = (fixedpoint)(10 << 16);
What’s happening is, we are scaling or shifting the 10 by 16 bits (which is equivalent to multiplying by 65536) to place it into the upper half of the fixed point number where it belongs. But how can we place a decimal number into a fixed point number? That’s easy–instead of shifting the number, we must multiply it by 65536, like this:
f1 = (fixedpoint)(34.23432 * 65536);
The reason for this is that the compiler represents all decimal numbers as floating point numbers and shifting doesn’t make sense, but multiplying does. Now that we can assign a fixed point number with a standard integer or a decimal number, how do we perform addition, subtraction, multiplication, and division? Addition and subtraction are simple. Given two fixed point numbers f1 and f2, to add them we use the following code:
f3 =f1+f2;
And to subtract them we use:
f3 = f1-f2;
Isn’t that easy! The problem occurs when we try to multiply or divide fixed point numbers. Let’s begin with multiplication. When two fixed point numbers are multiplied, there is an overscaling because each fixed point number is really multiplied internally by 65536 (in our format); therefore, when we try to multiply two fixed point numbers, like this,
f3 = f1*f2;
the result is overscaled. To solve the problem, we can do one of two things. We can shift the result back to the right by 16 bits after the multiplication, or shift one of the multiplicands to the right by 16 bits before the multiplication. This will result in an answer that is scaled correctly. So, we could do this
f3 = (f1*f2)>>16;
or this:
f3 = ((f1>>16) * f2);
However, now we have a problem. Take a look at Figure 10-36. If we shift the results after the multiplication, it’s possible that we may get an overflow since, in the worst case, two 32-bit numbers can result in a 64-bit product (actually, in this particular 16.16 representation, whenever numbers greater than 1.0 are multiplied there will be an overflow). On the other hand, if we shift one of the numbers to the right before the multiplication, we diminish our accuracy since we are in essence throwing away the decimal portion of one of the numbers. This can be remedied a bit by shifting each number by 8 bits and symmetrically decreasing their accuracy. However, the best solution to the problem is to use 64-bit math.
Using 64-bit math is possible, but it isn’t easy to do with a 16-bit C/C++ compiler. Even with the in-line assembler that most C/C++ compilers come with, performing 64-bit math may not be possible if the compiler doesn’t support 32-bit instructions. But we will deal with all these problems later. If worse comes to worst, we can always link in external assembly language functions that perform full 64-bit integer fixed point mathematics. Don’t miss the little hidden lesson here, which is this: fixed point math performed with 32-bit numbers on a 16-bit compiler is slower than floating point on 486DXs and Pentiums! This is because the compiler turns simple operations like
long a,b; a=a+b;
into multiple 16-bit operations, killing the speed gained from the whole fixed point format.
Finally, division is performed much the same way as multiplication, but the numbers after division are underscaled! This occurs since there is a factor of 65536 in each dividend and divisor, and this factor will be divided out during the division. Thus, before the division is performed, the dividend must be scaled by another 16 bits–that is shifted to the left, for example
f3 = ((f1 <<16) / f2);
Once again this poses a problem, since during the shift to the left of the dividend, the 32-bit fixed point number may overflow, as shown in Figure 10-37. We can come up with a “best of the worst” solution by scaling each number up 8 bits and then hoping for the best, but the best way to solve the problem is to use 64-bit math again. Therefore, our final library, to be totally accurate, must use 64-bit fixed point math to hold the overflows and underflows. However, if all the math we perform can’t possibly overflow or underflow in 32 bits, we can get away with 32-bit math. These are decisions that we will make as we design the 3D graphics engine.
Finally, to convert a fixed point number back to an integer, all we need to do is shift it back to the right 16 bits, as in:
int number; number = (f1>>16);
Making the Quantum Leap to 3D
Now that we have laid the mathematical foundation necessary to discuss 3D graphics, let’s begin by realizing that 3D graphics is not hard, not beyond comprehension. However, there are many details that we have to consider. As long as we take it slow and understand all the material and concepts, we’ll have no trouble at all. The remaining material in this chapter is going to be an overall coverage of 3D graphics. Rather than describing any single topic in detail, we are going to see how it all fits together.
Understanding the interrelationships in a 3D graphics system is probably the most important thing. Once we have a top-down view of all the workings of a 3D graphics engine, we can break each system down and implement it. Learning each system one by one without knowing how they all fit together can be frustrating. Therefore, think of the rest of this material as an outline for the following chapters, which will focus on each separate aspect.
Points, Lines, and Polygons in 3D
As we know, all objects in 3D space can be defined by a collection of points. Points are in fact the basic building block of 3D objects. Some 3D graphics systems like to think of objects as sets of lines and some like to think of objects as sets of polygons. We are going to take the latter point of view. Thus, a 3D object in our graphics engine is going to be nothing more than a set of polygons defined by a set of points. Take a look at Figure 10-38. The figure depicts a cube made of a set of points, and each face of the cube is made up of a specific set of points.
So a polygon is a set of points in 3D space. This is true, but we need to add one more premise, and that is that the points are coplanar. This means that a plane exists containing all the points. Otherwise, the polygon isn’t flat. Take a look at Figure 10-39 to see this. The left side of the figure shows a polygon that is coplanar, and the right side of the figure shows a polygon that isn’t coplanar. It’s also interesting to notice that all triangles are coplanar. Or in other words, all polygons with exactly three vertices, by definition, lie in a single plane, since three points is the absolute minimum number of points needed to describe a plane in 3D.
We know that our graphics engine is going to use polygons. But are we going to allow polygons of any shape? The answer is no! We are only going to support triangles and quadrilaterals (four-sided polygons–we’ll call them quads). The reason for this is that triangles are the easiest and fastest objects to draw and are very easy to model with. Quads, on the other hand, aren’t as easy to draw, but are nice for modeling. Luckily, we can always convert a quad into a pair of triangles by a method called triangulation. Then we can render each of the triangles separately (see Figure 10-40). In fact, any polygon can be triangulated, but let’s keep things simple!
Data Structures
Let’s talk a bit about data structures now. How should we represent the objects in our game world? Well, this is actually a very important detail, and we’re going to evolve it as we go; but for now, let’s rough out a possible data structure graphically. Figure 10-41 shows the objects in a game universe. Each object consists of a set of polygons, which themselves consist of a set of points.
Since many polygons in an object will have points in common, it would be inefficient to have multiple copies of vertices, so each polygon definition will be based on a common set of vertices. Thus, Figure 10-42 is a more realistic picture of what we are going to end up with. The end result is still going to have the single point as the basic building block. This means that all of our functions are at some point going to transform to each point that makes up an object. Hence, we are going to concentrate on learning how to transform single points and then use that technology to transform entire objects, and in the end, the whole universe. With all that in mind, let’s make up some data structures so we have something to reference in the remaining examples of the chapter. These data structures aren’t going to be the final ones we use, but just something that we can plug into any code fragments that we might want to write. First is the single point or vertex:
typedef struct point_typ
{
float x,y,z; // notice they are floating point values
} point, *point_ptr, vertex, *vertex_ptr;
Now let’s define a polygon as a list of vertices:
typedef struct polygon_typ
{
int num_vertices; // number of vertices in this polygon (3
// or 4 in our case)
int vertex_list[4]; // the list of vertices, 4 or less
} polygon, *polygon_ptr;
Finally, let’s define an object as a collection of polygon faces made of vertices:
typedef struct object_typ
{
int num_vertices; // total number of vertices
// that make up object
vertex vertices[MAX_VERTICES]; // the vertices of the object
int num_polygons; // the total number of poly-
// gons that make up the object
polygon faces[MAX_FACES]; // the faces of the object
float world_x,world_y,world_z; // the position of the object
// in the universe
} object, *object_ptr;
You will notice that I have used static arrays to hold the vertices and polygons. A more elegant and memory-efficient solution would be to use linked lists, but for illustrative purposes, arrays are easier to manipulate. However, I doubt that any object will have more than 16 or 32 vertices and a dozen or so faces–otherwise, our graphics engine is going to have a hard time running in real-time!
Coordinate Systems
When we worked with 2D graphics and sprites in the previous chapter, we had two different coordinate systems. Each sprite had its own (x,y) location, and the sprite image was defined relative to this (x,y) location as a rectangle that extended some width and some height from the (x,y) position of the sprite. This can be thought of as the sprite’s local coordinate system. Then when we drew the sprite on the screen, we used its (x,y) location as a position relative to the upper corner of the screen (0,0) to position and draw the sprite. This is the coordinate system we refer to as screen coordinates.
Three-dimensional computer graphics also has multiple coordinate systems. Many coordinate systems are necessary because each object must be defined relative to its own local origin, as shown in Figure 10-43. Then each object needs to be placed somewhere in the world relative to a universal origin (see Figure 10-44), and finally, the objects have to be drawn or projected onto the screen. Let’s take a look at all these systems in detail.
Local Coordinates
All the objects in our 3D graphics engine are going to be defined as a set of vertices or points that are relative to a local origin of (0,0,0). This means that when we model our objects, we are always going to assume that they are centered about the origin. However, during game play, they are of course going to move around, but the vertices of the object won’t be altered such that they are adjusted to move the object to another location. The vertices will be translated and projected only during the rendering process. Take a look at the cube in Figure 10-45 to see this graphically.
The figure shows the cube moving 20 units in the positive X direction. However, this is not what happens in the data structures. Within the data structures, all we will update is the position of the cube or its world_x, world_y, and world_z. Just as we translated sprites by updating their (x,y) position in the structure, we are going to manipulate 3D objects by moving their position instead of manipulating their vertices. One of the reasons for this is that once an object has been moved, its vertices are changed and its center is no longer (0,0,0). This creates a problem with both scaling and rotation; since the new center of the object isn’t (0,0,0), the object will scale and rotate incorrectly.
Summing up, each object in our 3D database will be defined relative to the point (0,0,0). Then when we want to move an object, we are only going to update its world position, not its vertices. This keeps the object centered at (0,0,0) mathematically as far as the vertices are concerned and allows us to scale and rotate without problems.
World Coordinates
The world coordinate system is like the universe. It has a center (0,0,0) and extends out to infinity in all the directions X, Y, and Z, as shown in Figure 10-46. In a 3D game, we are going to move around and place game objects in this world coordinate system. Now in a real game, the world coordinate system may have finite limitations due to the underlying implementation. For example, our worlds may only be 32,000x32,000x32,000, as shown in Figure 10-47. But whatever size the world coordinate system is, all the objects are contained within it and manipulated within it.
For example, imagine that we have defined an object named jet_fighter. The object might look like Figure 10-48. We could place the object in the world coordinate system by setting the world_x, world_y, and world_z components of the object structure like this:
world_x = 1000; world_y = 50; world_z = -10000;
The jet would then be placed in the world as shown in Figure 10-49. As you can see, because the scale of jet is so small to the size of the world coordinate system, it is nothing more than a dot!
Let’s recapitulate the relationship between local coordinates and world coordinates. Local coordinates are used to define an object with a set of vertices that are relative to (0,0,0). This allows rotation and scaling to be performed properly. The position of the object in world coordinates is held in a set of variables called world_x, world_y, and world_z. These coordinates define where the center of the object should be placed in the game universe. For example, if an object was a cube defined by 8 vertices, the local vertex coordinates are added to the world_x, world_y, and world_z position coordinates to position the object for rendering. However, that is the extent of the contact between local coordinates and world coordinates. The vertices that make up an object are only temporarily modified by the position of the object during rendering, otherwise, they stay the same unless the object is scaled or rotated.
Camera Coordinates
The final coordinate system that we need to consider is called the camera coordinate system, which is really the view that the user sees. Take a look at Figure 10-50. Here we see a game universe with the user looking in a certain direction. Based on the user’s field of view, he should see a specific amount of imagery within the 3D universe. Therefore, the camera coordinate system needs to know two things: the position of the viewpoint and the view direction in which the user is looking.
The viewpoint is easy enough to understand. It is simply a point in world coordinates at which the user or player is positioned. The view direction is a bit harder to grasp. Let’s give it a try. Given that the user is positioned at a view position, imagine that position has a local coordinate system mapped on top of it, as shown in Figure 10-51. Thus, relative to the world coordinate system, the view direction is made up of the angles relative to the world axes that the user is pointing toward. Or to put it another way, imagine that the user is standing at the origin of the world coordinate system, as shown in Figure 10-52, so that the viewpoint is (0,0,0). Further imagine that the user is looking straight down the positive Z axis. We might call this “the neutral position.” Then when the user turns parallel to either the X, Y, or Z axis, we record these changes as a set of angles: angle_x, angle_y, and angle_z.
These angles are in essence the pitch, yaw, and roll of the player. They are used along with the view position to compute the final image viewed by the user in Cyberspace. For example, imagine that a universe has the objects in it as shown in Figure 10-53 (which is an X-Z top-down view of a battleground). You will note that the player has angle_y of 180 degrees, that is, he is looking down the negative Z axis. Thus, all the objects shouldn’t be visible since the player is looking the other way. This is the whole idea of the camera coordinate system. Think of the player’s position and viewing angle literally as a camera that is the window on the scene. The trick is to render the scene that the camera is looking at on the screen. This is called the viewing transformation and projection. One of the first steps in projection is to translate all the objects relative to the viewpoint and then rotate them relative to the view direction. What’s visible is what’s drawn. This may seem a bit confusing, but hang in there. We’ll talk more about projection and the viewing transformation a bit later, but now let’s see how to transform 3D objects.
Transformations in 3D
Earlier we saw how to manipulate 2D points with simple equations. We learned how to translate, rotate, and even scale objects in two dimensions. All these techniquescan be accomplished in 3D in much the same way. We simply need to add one more coordinate and a few more equations. However, instead of using explicit equations to perform the manipulations, we are going to use matrices. That is the whole reason we studied them in the previous pages. Three-dimensional transformations can be quite long, and it’s nice to be able to write them in a short form and perform them efficiently–hence, the use of matrices.
This brings us to the question, how are we going to do this? Well, we’re going to pretend that all the points in our 3D world are really little matrices that each consist of a single row, such as:
- [x y x]
Then we are going to multiply the points by transformation matrices that are of the correct size and contain the proper elements to perform the required transformation. However, we are going to have to add a little bit to our definition of a point. All points must be defined not by three components but by four. This fourth component is called the normalizing factor. The normalizing factor is more of a mathematical need than an implementational need. By adding a fourth coordinate, the 3D coordinates become homogeneous, which simply means they are scaled relative to the normalizing factor. The only reason we need to use homogeneous coordinates is that they allow us to perform translation with a matrix. You see, to translate a 3D point, a 4x4 matrix is needed. Try to perform translation with a 3x3 matrix and you’ll see what I mean. Anyway, homogeneous coordinates look like this:
- [xw yw zw w]
The fourth component is the normalizing factor. Thus, if we want to recover the original point (x,y,z), we perform the following transformation:
- x=x/w;
- y=y/w;
- z=z/w;
Doing three divisions to extract a point that we already have isn’t too smart, so we are going to let w=1; therefore, our new homogeneous coordinates look like this:
- [x y z 1]
We need not perform any division to extract the original point. The whole idea of all this is to be able to multiply our original homogeneous point by a transformation matrix that results in a new transformed point. Therefore, we want to multiply a 1x4 matrix by some nxm matrix that results in another 1x4 matrix. Therefore, all the transformation matrices must be 4x4 because 1x4*4x4=4x4. So let’s start by learning to translate a single point.
Translation
To translate a single point (x,y,z), we need three translation factors tx, ty, and tz. The operation we wish to perform is shown in Figure 10-54. The point (x,y,z) is moved to a location (x+tx,y+ty,z+tz). Therefore, translation is simply,
- x=x+tx;
- y=y+ty;
- z=z+tz;
and we must convert this into matrix form using homogeneous coordinates. OK, here’s how we do it. First, let’s represent our input point by the matrix:
- [x y z 1]
Then we need to come up with a 4x4 transformation matrix that results in
- [(x+tx) (y+ty) (z+tz) 1]
when multiplied by the input point. With a little thought and a lot of experience in matrix multiplication, you could deduce that the matrix we need looks like this:
| 1 0 0 0| T= | 0 1 0 0| | 0 0 1 0| |tx ty tz 1|
Therefore, to translate a point, the following multiplication is used, and the end result is what we desired.
|1 0 0 0|
[x y z 1] * |0 1 0 0|
|0 0 1 0|
|tx ty tz 1|
Scaling
Scaling is the next operation that we may want to perform on a 3D object. Given that an object is defined with a local origin of (0,0,0), we want to derive a scaling matrix such that the object will scale relative to (0,0,0) without translation. Such an operation is shown in Figure 10-55. The object (a cube) is scaled in the X, Y, and Z directions by the scaling factors sx, sy, and sz. Note that objects need not be uniformly scaled in all directions. Anyway, we want to perform the following operation:
- x=x*sx;
- y=y*sy;
- z=z*sz;
So we need a 4x4 transformation matrix that will do this without adding or subtracting any other factors to the original point. See if you can come up with such a matrix without looking below.
We need to find a 4x4 matrix such that the homogeneous point,
[x y z 1]
when multiplied by some 4x4 matrix, results in the new transformed point of:
[x*sx y*sy z*sz 1]
Do you give up? Here’s the answer:
|sx 0 0 0|
S = | 0 sy 0 0|
| 0 0 sz 0|
| 0 0 0 1|
Using the above matrix and the input point, the following multiplication will perform the scale, which gives the correct result.
|sx 0 0 0|
[x y z 1]* | 0 sy 0 0|= |x*sx y*sy z*sz 1|
| 0 0 sz 0|
| 0 0 0 1|
Rotation
Unlike rotation in 2D, rotation in 3D has a bit more variety since we are no longer trapped in a single plane. On the contrary, we can rotate in three different planes. Figure 10-56 shows the possible directions of rotation for a left-handed system. As you can see, rotation can be parallel to the Z axis (similar to the 2D rotation) or it can be parallel to either the X or Y axis. Actually, it can be around all of them simultaneously if we really want to get crazy. But for now, we’ll stick to one axis at a time.
The first detail that we must note is that the rotation equations (unlike the translation and scaling equations) work differently on right-handed and left-handed systems. This is because the positive Z axis is in the opposite direction between the right- and left-handed systems. A rotation in one direction in the right-handed system will result in the same rotation but in the other direction in the left-handed system. Take a look at Figure 10-57. This isn’t a big deal, but it’s nice to know when you are trying to rotate something and it keeps turning the wrong way.
With that in mind, we are going to derive the rotation matrices for the left-handed system. The right-handed derivation is exactly the same except for a sign change here and there. Let’s begin with rotation parallel to the Z axis similar to the flat 2D rotation. The rotation equations we derived for 2D can be used. They state that given a point (x,y), to rotate the point by an angle theta, the following formulas can be used:
- x′ = x*(COS θ) – y*(SIN θ)
- y′ = x*(SIN θ) + y*(COS θ)
Using the above information and noting that rotation parallel to the Z axis doesn’t affect the Z axis, we can add the following equation to the transformation:
- z′ = z
Therefore, we need to derive a matrix that will perform the overall transformation:
- x′ = x*(COS θ) – y*(SIN θ)
- y′ = x*(SIN θ) + y*(COS θ)
- z′ = z
The desired matrix is shown here:
| COS θ SIN θ 0 0|
Rz = | -SIN θ COS θ 0 0|
| 0 0 1 0|
| 0 0 0 1|
Let’s try it out with the standard input point and an angle theta and see if it works:
| COS θ SIN θ 0 0|
[x y z 1] * | -SIN θ COS θ 0 0|
| 0 0 1 0|
| 0 0 0 1|
= [(x*COS θ – y*SIN θ) (x*SIN θ +y*COS θ) z 1]
If you stare at it a minute, you will see this is the result we wanted. Rotations parallel to the X and Y axes are similarly derived by swapping variables. Here are the rotation matrices:
|1 0 0 0|
|0 COS θ SIN θ 0|
Rx = |0 -SIN θ COS θ 0|
|0 0 0 1|
And the Y-axis rotation matrix is:
|COS θ 0 -SIN θ 0|
Ry = | 0 0 0 0|
|SIN θ 0 COS θ 0|
|0 0 0 1|
I hope you have noticed that all these matrices are rather sparse—that is, they are mostly 0’s. We can take advantage of this since there is no reason to multiply a number by 0 if we know the result will be 0! This is actually one of the optimizations we’ll implement in the final 3D engine.
Projecting the Image
Once we have defined a set of objects and the user is positioned at a given viewpoint with a specific view direction, the image must be rendered on the screen. But before we can render (draw) the objects, we must project them from 3D to 2D. This is called projection. But before the projection can be accomplished, the objects must be transformed by the world to camera transformation, which is done by translating each object to the viewpoint and rotating each object by the inverse of the view direction. This is done to normalize the projection so that it is always in relation to the X-Y plane, which is parallel to the video screen with the positive Z axis pointing into the screen.
We’ll talk more about the viewing transformation that brings all the objects into the proper position, but for now, let’s assume that the objects are ready to be displayed and we want to draw them on the screen. The question is, how do we do this? As an example, let’s take the simple case shown in Figure 10-58. Here we see the user looking at a simple cube, and we would like to project this image on the screen. Since the screen is only 2D, we must somehow take into consideration all the Z components of the object before we map the object onto the 2D video screen. This “consideration” is called projection and is used to display a reasonable image of a 3D object onto a 2D plane. Let’s see what techniques we can propose to do this.
Parallel Projections
The simplest technique used to project a 3D image on a 2D plane is to throw away the z component of each vertex. This is called a parallel projection. The following transformation will accomplish this.
Given the point (x,y,z) in 3D:
x_par = x; y_par = y;
Of course, we might need to center the image with translation factors and scale one of the axes to take into consideration aspect ratio, so the actual transformation will end up looking like this:
x_screen = x_par + SCREEN_WIDTH/2; y_screen = y_par*aspect_ratio + SCREEN_HEIGHT/2;
In either case, all lines are projecting in parallel. This means that no matter how far away an object is from the view screen, it will always be projected as the same size. Although parallel projections are very useful in mechanical imaging and so forth, for a video game, it’s not going to cut it. The whole idea of a 3D video game is the 3D part! Thus we need a projection that takes into consideration the z component and scales the x and y in some manner to make the object look bigger as it gets closer and smaller as it moves farther away.
Perspective Projections
Perspective projection is the ultimate in realism that can be achieved with a 2D video screen. Perspective projection is based on the premise that light rays converge to the user’s viewpoint. This is shown in Figure 10-59. Basically, we use the user’s viewpoint as a projection reference and then compute where a straight line (a ray) would intersect the view plane (video screen) if it were projected to the user’s viewpoint. This type of projection has the effect of scaling objects down as they move away and scaling them up as they get closer to the user’s viewpoint. But what is the transformation that will accomplish this? With a bit of geometry, it can be deduced that the perspective screen coordinates (x_per,y_per) are simply:
x_per = x*viewing_distance/z; y_per = y*viewing_distance/z;
where (x,y,z) is the point being projected. Again, (x_per,y_per) should be taken with a grain of salt since they need to be slightly modified to be used as physical screen coordinates. But once we have them, we use the same technique as we did in the parallel projection. Thus:
x_screen = x_per + SCREEN_WIDTH/2; y_screen = y_per*aspect_ratio + SCREEN_HEIGHT/2;
Hence, we see that the final physical screen transformation is invariant of the type of projection, which is as we would expect since the physical screen coordinate system doesn’t care which projection was used to arrive at the screen coordinates.
Since we are using matrices to do everything, let’s create a perspective transformation matrix with which we can multiply a point (x,y,z) with which to arrive at the point (x_per, y_per). The transformation matrix needs only one value in it and that is the position of the view plane, which is really the viewing_distance. Here is the perspective transformation matrix,
|1 0 0 0|
T_per = |0 1 0 0|
|0 0 1 1d|
|0 0 0 0|
where d is the viewing_distance or the positive z position of the viewing plane. As an example, let’s project the point (x,y,z) using the perspective transformation matrix:
|1 0 0 0|
[x y z 1] * |0 1 0 0|
|0 0 1 1d|
|0 0 0 0|
= [x y z z/d]</pre>
We must divide by the normalizing factor since it is not 1; thus, the final result is
- = [x*d/z y*d/z d 1]
Then we throw away the third and fourth components, leaving us with the projected point (x*d/z,y*d/z), which is what we wanted. Yuk! Using a matrix to do this is a bit of overkill since it would be a lot easier just to divide x and y by z and then multiply by the viewing distance. This is what we are actually going to do in the engine, but it’s important to know how to do it with matrices since they are more general.
Clipping the Image
The next part of the viewing transformation is called clipping. Clipping can be done before or after the perspective transformation depending on what kind of clipping is performed. But in general, we don’t want to project any objects that lie outside the viewing volume of the player. This viewing volume is shown in Figure 10-60. It consists of six sides. The top and bottom of the viewing volume are called the hither and yon or near and far clipping planes. The remaining sides are just called sides! The idea of clipping is to remove components of the final 3D image that can’t possibly be seen before they are projected.
As an example, take a look at Figure 10-61, which shows the viewing volume along with an object that is totally outside of it. This object can’t possibly be visible; therefore, there is no reason to project and render it. In fact, trying to project it might result in a floating point error! The second (more common) clipping case is also shown in Figure 10-61. Here we have an object that is partially clipped. This is the hard case. There are a number of ways to approach this. First, we might try to partition the polygons into sets that are totally invisible, partially clipped, and totally visible. Then the polygons that are totally invisible are thrown away and only the partially visible and totally visible polygons are rendered.
Normally, the partially clipped polygons would be clipped to the viewing volume and a new polygon might be generated. However, with the system we are going to use, this isn’t going to be efficient since the only polygons we are going to support are triangles and quads. Thus during clipping, we may end up with a fivesided polygon, which won’t work. Hence, once we have clipped objects that aren’t visible at all, we will project all the polygons that are partially or totally visible, and while we are drawing them on the screen, we will clip them in 2D against the screen boundaries. This is called image space clipping, and the clipping done on the object level is called object space clipping.
The final plan of attack is listed below in steps:
- Determine which objects are totally invisible and throw them away.
- Break up objects that are partially visible into polygons.
- Project polygons that are partially visible or totally visible on the screen and throw away the invisible polygons.
- As the polygons are being scan-converted (drawn on the screen line-by-line), clip them using a pixel-based image space clipper that operates in 2D.
Now that we have the gist of the overall process, let’s see exactly how object space and image space clipping are performed.
Object Space Clipping
Object space clipping is a bit of an unfortunate name because it really has nothing to do with objects. Instead, object space clipping means to perform clipping in a pure mathematical space without concern for physical rendering. For example, imagine that we have translated and rotated all the objects in the universe so that from the user’s viewpoint it is as if he is located at the projection point or viewpoint looking down the positive Z axis into the view plane, as shown in Figure 10-62.
Object space clipping is basically a method that filters out polygons and/or objects that lie outside the viewing volume. This is achieved mathematically by comparing the points that make up objects against the six clipping planes defined by the viewing volume of Figure 10-62. The easiest of the calculations are the tests for the yon and hither clipping planes. These tests are performed by examining the z coordinates of each object, polygon, or point depending on the level at which the clipping is performed.
For example, Figure 10-63 shows a top view of the clipping volume—that is, a view looking down the Y axis onto the X-Z plane. Anyway, the figure depicts a pair of objects. One of the objects is within the Z extents of the clipping volume and the other is not, and thus would summarily be discarded. However, the object that is within the Z extents of the clipping volume might possibly be outside the X and Y extents of the clipping volume. Determining this is the main difficulty with object space clipping.
The solution to the problem can be derived by looking at the geometry of the viewing volume one plane at a time and then using the results to solve both the X-Y and X-Z planar cases (since the Z clipping is trivial). The main reason for the difficulty is that unlike the hither and yon Z clipping planes, the other four sides of the clipping volume are not parallel to the view plane. In fact, they are at angles and are general planes. To solve this, we need to come up with a test function that for any given z distance from the view plane, the function will determine whether or not the point (x,y,z) lies within the clipping volume. Referring to Figure 10-63 and using similar triangles, we can deduce that:
x SCREEN_WIDTH ____ = __________________ z 2 * viewing_distance
Therefore:
SCREEN_WIDTH*z
x = __________________
2 * viewing_distance
And if we think of x and z as the x and z components of a general point (x,y,z), then the point is within the X-Z clipping region if and only if
SCREEN_WIDTH*z
x <= _________________
2 * viewing_distance
and
– SCREEN_WIDTH*z
x >= ___________________
2 * viewing_distance
That takes care of the two clipping planes that bound the clipping region on the right and left. The top and bottom plane testing equations are computed in the same way with similar triangles, and the results are
SCREEN_WIDTH*z
y <= ____________________
2 * viewing_distance
and
– SCREEN_WIDTH*z
y >= ____________________
2 * viewing_distance
Notice the only differences between the x and y equations is that the constant in the numerator has to do with the height of the viewing plane instead of its width. Using the above four equations along with the trivial z equations,
- z >hither_plane
and,
- z <yon_plane
we can come up with a simple algorithm that clips objects, polygons, or points in 3D. The algorithm can use either positive logic or negative logic. This means it can either test if the point is within the clipping region, or it can test if the point is outside the clipping region. At this point, it doesn’t really matter which way we do it, but later when we optimize the engine, it might. Basically in computer programs, it’s sometimes more efficient to test if something is false rather than if it’s true. Anyway, the entire object space clipping algorithm is just a long conditional statement based on the formulas we have derived. Figure 10-64 shows this process as a pipeline.
Image Space Clipping
Although object space clipping is elegant mathematically, sometimes (especially in the realm of video games) it’s more efficient to use image space clipping. Image space clipping is a form of clipping that disregards the 3D geometrical aspects of an object and simply looks at the 2D rasterization of the object as it appears on the display device (which in our case is the video screen).
For example, assume that object space clipping is not performed at all. Then polygons that are really rows of pixels (see Figure 10-65) might be passed to the rendering engine that can’t possibly be drawn on the finite extents of mode 13h. So an image space clipper would test each pixel to see if it is within the screen bounds. For example, Listing 10-1 shows our function Write_Pixel( ), which we wrote many chapters ago.
LISTING 10-1 The old Write_Pixel() function
void Write_Pixel(int x,int y,int color)
{
// plots the pixel in the desired color a little quicker using binary shifting
// to accomplish the multiplications
// use the fact that 320*y = 256*y + 64*y = y<<8 + y<<6
video_buffer[((y<<8) + (y<<6)) + x] = (unsigned char )color;
} // end Write_Pixel
Notice that the function blindly computes the offset into memory and plots the pixel. But what if the sent parameters don’t make sense? What if x and y are beyond the mode 13h screen bounds of 320x200? This is what image space clipping takes care of. Image space clipping is accomplished by putting the clipping code into the actual rendering function (or maybe in a filter in front of it that only passes valid coordinates to the final pixel plotting code). For example, if we want to keep the Write_Pixel( ) function and add an image space filter, we can do something like what’s shown in Listing 10-2.
LISTING 10-2 An image space pixel clipping filter
void Write_Pixel_Clip(int x,int y, int color)
{
// test if pixel coordinates are within range of mode 13h
if (x>=0 && x<319 && y>=0 && y<=199)
Write_Pixel(x,y,color);
} // end Write_Pixel_Clip
So instead of an application calling Write_Pixel( ), all the calls are changed to Write_Pixel_Clip( ), and the filter function only sends valid pixel coordinates to the Write_Pixel( ) function. On the other hand, we might want to make a slightly faster version by getting rid of the second function call, as shown in Listing 10-3.
LISTING 10-3 A second version of the image space clipping pixel plotter with the plot in-line
void Write_Pixel_Clip(int x,int y, int color)
{
// test if pixel coordinates are within range of mode 13h and if so plot the pixel
if (x>=0 && x<319 && y>=0 && y<=199)
video_buffer[((y<<8) + (y<<6)) + x] = (unsigned char )color;
} // end Write_Pixel_Clip
This version of the pixel plotter doesn’t make a separate call to the Write_Pixel( ) function since the code for it is in-line. Using image space clipping for everything is usually too slow for software implementations, although many hardware graphics engines use this method. We are going to use a hybrid of both object space and image space clipping for our systems. We are going to use object space clipping for objects and polygons. Then, once we have surmised that a polygon is visible or partially visible, we are going to render it line-by-line using an image space clipper to draw only the visible portions of it.
Hidden Surface Removal and Rendering
Hidden surface removal is probably the most complex aspect of 3D graphics and is the number one bottleneck that slows down 3D games. The problem is, when the player is positioned at some viewpoint with a given view direction, many of the objects will be clipped, but the ones that aren’t must be drawn in such a way that the order of polygons is correct. Moreover, since most objects in a 3D game are solid, many of the back-facing polygons shouldn’t be visible to the player and hence need not be rendered. This brings us to the concept of hidden surface removal.
Hidden surface removal is a bit of a fuzzy area and is sometimes mixed up with rendering. However, since we are Cybersorcerers, we aren’t too interested in a bunch of critical definitions, we are simply interested in the facts! And the fact is that hidden surface removal ultimately means drawing 3D objects in the right order. And to do this, there are two distinct phases. The first phase is called backface culling, which basically removes polygons that are facing away from the player’s viewpoint and need not be processed.
This might be a bit confusing, so let’s use Figure 10-66 as a reference to clear it up. Figure 10-66 shows a solid cube with the player’s viewpoint situated on one side of the cube looking at it. From the player’s viewpoint, the far side of the cube is totally invisible; thus, we shouldn’t process those faces since they can’t possibly be seen. Therefore, the back-face culling process should throw these faces away before further useless computations are performed on them.
But how is back-face removal done? We do it with a bit of vector math. Again referring to Figure 10-66, we see a line-of-sight vector v along with a set of normal vectors to some of the faces of the cube. The trick is to take the dot product between the normal vectors and the line-of-sight vector. Then the sign of the result is used to determine if the face is visible. If you recall, the sign of the dot product will be positive if the angle between the two vectors being dotted is less than or equal to 90 degrees. Therefore, if the results of the dot product are positive, the face must be visible and hence should be rendered. On the other hand, if the sign of the dot product is negative, the angle between the player’s view direction and the face is greater than 90 degrees, and the face can’t possibly be visible. To sum up:
- if v . n >=0 then the face is visible
- else the face is invisible and should be culled
This calculation is performed on all polygon faces that make up the objects in the universe. Then, once a list of visible polygons has been determined, the rendering process begins, which is the second phase of hidden surface removal (in a roundabout, cavalier way). The whole reason we perform the back-face culling is that it usually cuts scene complexity down by 50 percent, since in any given scene, half the faces that make up the objects are usually invisible (facing the other way). This speeds up the rendering by a factor of 2!
Now that we have pruned away the back faces, the next step is to render the polygons. This step can be harder than you may think since the order that the polygons are drawn in is very important.
Depth Sorting and the Painter’s Algorithm
Given a set of polygons that are not parallel to the view plane, as shown in Figure 10-67, it isn’t obvious what order they should be drawn in so that the final scene looks 3D. This is the function of hidden surface determination, or ordering, algorithms such as the Painter’s algorithm and the Z-buffer algorithm. These algorithms are responsible for rendering a 3D set of polygons that are drawn in such a way that they occlude each other properly.
Let’s say we wish to render the two polygons that are shown in Figure 10-68, which is a top-down X-Z plane view. We see that polygon A looks like it should be drawn first if the polygons are viewed straight on from the Z axis. However, we came to this conclusion using our own vision systems and about 100 billion brain cells. The computer doesn’t have that luxury. Instead, we supply it with a set of rules to produce the correct image. One possible set of rules would be to take the Z extents of the polygons and draw the polygons with farthest Z extents first, then the polygons with closer Z extents last, so they will overlap and obscure the farther polygons.
This technique actually works, but not in all cases. If all the polygons are parallel to the view plane, this algorithm works perfectly since the farther polygons are rendered first and then occluded by the nearer polygons. The name Painter’s algorithm originated from the fact that painters use a technique much like this when painting on a canvas. The painter paints the distant objects first followed by the nearer ones; hence, the nearer objects obscure the more distant ones.
The problem with the Painter’s algorithm is that it falls apart when the polygons are not all parallel to the view plane. For example, if we take a look at Figure 10-69, we see another example of two polygons. This time polygon A extends farther than polygon B does; so using our same line of thought as before, we would draw polygon A first, but this would result in an error! As we can see by looking at the figure, we should really draw polygon B first. Therefore, we need to add more rules to our algorithm to solve this problem. Well, it turns out that there are about five rules that need to be implemented along with a lot of special case code. The end result is an algorithm that is crude, slow, and a mess to implement.
So what do we do? Well, if we are willing to be imperfect, we can partially solve the problem by using a subset of the Painter’s algorithm that works reasonably well. For example, if we sort all the polygons in a scene by their average Z values and then draw the polygons with farther Z values first and smaller Z values last, the final image will be reasonably correct most of the time. The algorithm can further be improved by making the polygons very small and using only convex objects in the universe. Convex objects are objects that don’t have dents in them. Figure 10-70 shows a convex object in comparison to a non-convex or concave object.
In reality, many 3D games use only a depth sort to compute the order to draw polygons. Although this will result in incorrect displays from time to time, it is acceptable. The question is, are there other rendering algorithms that create a perfect image all the time and are easier to implement? The answer, of course, is yes!
Z-Buffer Algorithms
Since all polygons in a 3D scene have vertices that are coplanar, all polygons lie in some plane. Therefore, using the vertices of the polygon, it is possible to compute the plane equation of the plane that the polygon lies in, as shown in Figure 10-71. With the plane equation in hand, we can compute the z component of every point on the polygon. Using this information, we can then compare the z components of all the polygons pixel-by-pixel and draw the pixels in the proper order. This is called the Z-buffer algorithm and is by far the simplest and most reliable hidden surface rendering technique available.
Let’s try and pin down exactly how a Z-buffer system works. First, there is the Zbuffer, which is simply a 2D matrix that has the same resolution as the display screen except that instead of holding colors, it holds the z coordinate of every pixel of the final image. Since a Z-buffer might contain wildly varying values, most highperformance Z-buffers are 32 bits deep; however, we don’t have that luxury, so our hypothetical Z-buffer is going to be only 16 bits deep (see Figure 10-72). Therefore, our Z-buffer can record z values in the range of 0 to 65535 or –32768 to 32767, which will suffice for our purposes.
Once a Z-buffer has been allocated, it is initialized with the maximum value that can be recorded, which in our case is 65535. Then the Z-buffer is ready to process all the polygons. But to do this, we must compute the plane equation of each polygon, and then for each (x,y) screen location that makes up a polygon, the z component must be computed and tested against the Z-buffer. So how do we compute the plane equation of a general polygon? First, all the polygons in our engine are going to be either triangles or quads, so that makes things easier. In general, all we need to do is compute the normal vector to the polygon and select a single point from the polygon. Using that information, we can construct the point normal form of the plane equation shown below,
- nx*(x–xo) + ny*(y–yo) + nz*(z–zo) = 0
where the point (xo,yo,zo) is any point on the polygon and the vector <nx,ny,nz> is the normal to the vector.
As an example, let’s compute the plane equation of the triangle in Figure 10-71. The vectors are
- v1 = P2–P1 = (7,4,1) – (3,3,3) = <4,1,–2>
- v2 = P3–P1 = (8,1,3) – (3,3,3) = <5,–2,0>
The normal vector n is found by taking the cross product between v1 and v2:
- n = v2 x v1 = <–11,13,25>
Therefore, using point P1 on the triangle, we can write the point normal form of the equation as:
- nx*(x–xo) + ny*(y–yo) + nz*(z–zo) = –11*(x–3) + 13*(y–13) + 25*(z–3) = 0
And collecting terms, we get,
- –11*x + 13*y + 25*z = 81
which is the general form of a plane. Then we can take this equation and solve for z, resulting in:
81+11*x – 13*y
z = ______________
25
Then as we scan-convert the polygon values, we can solve for z. Using the above formulation or a variation, you can see that it is possible to compute the z values for a polygon as it is being scan-converted. Therefore, the final Z-buffer algorithm is shown as follows:
- Initialize the Z-buffer.
- For each polygon in the scene, compute the plane equation of the polygon.
- {
- Scan-convert the polygon, and for each point (x,y) compute the z
- value. If the z value is nearer than the current z value in the Zbuffer,
- then place the color of the polygon into the frame buffer,
- overwriting the old color.
- }
- For each polygon in the scene, compute the plane equation of the polygon.
- Display the frame buffer.
Of course, we have left out a few details, but this captures the essence of the Zbuffer algorithm. The question is, how fast is it? Well, to tell you the truth, as it stands, it would be very difficult to implement in software with reasonable performance. Moreover, we can’t waste 128K of memory for a Z-buffer in a standard DOS application. Therefore, most 3D engines use a modified version of the Z-buffer called the Scanline Z-buffer, which is similar to the standard Z-buffer shown above except that it operates on a scanline at a time. That, along with some clever optimizations, will result in a very slick hidden surface rendering algorithm, but we’ll get to that in Chapter 15. Now, let’s talk a little about light and color.
Illumination
Now that we have an idea of how we’re going to create a 3D universe and display it on the video screen, it’s time to discuss lighting and illumination. In a standard 2D game, the objects are simply colored with static colors. These colors stay the same regardless of the position of the game objects relative to the synthetic lights that may exist in the game.
Granted, some 2D games may have simulated lighting and color palette rotation effects; however, these effects have no mathematical or physical basis. They simply look good. Within a real 3D game, there usually exist real light sources. These light sources are used to shade and render each polygon face in an appropriate way based on a mathematical model that is usually an approximation. We aren’t going to delve too deeply into lighting in this chapter, but let’s at least take a look at the different light models and their general implementation.
Ambient Lighting
Ambient lighting is the lighting that exists due to the average light level in a room or area. For example, the room you’re in right now may have a lamp or two in it. However, these lamps(s) usually have shades on them and point up toward the ceiling. The result of this is that the light (photons) is diffused and reflected in all directions. This creates an average light level in the room. This average is called the ambient light level.
Three-dimensional graphics engines usually model ambient light by simply coloring the polygon faces with an intensity relative to the ambient light level. For instance, if we allowed the ambient light level to range from 0 to 15, then based on the color of a polygon, we would select a shade to color the polygon based on the ambient light level. In a mathematical way, we would use the following rule,
- Ia = k*(ambient_light_level)
where k is a constant that scales the ambient level to make it fit with our color palette.
Using the above equation, how would we implement an ambient lighting system? One way would be to divide the color palette into 16 shades of 16 colors. Then when defining an object, we only select the color of the object’s faces. However, when the objects are drawn, the ambient light level is used to compute the final color register value to use. For example, if the color palette was of the format
- color 0
- shade 1, shade 2,…shade 15
- color 1
- shade 1, shade 2,…shade 15
- .
- .
- .
- color 15
- shade 1, shade 2,…shade 15
and we defined the colors as being in the range 0 to 15, then given that a polygon face was given by the variable color, we might find the final color register to use by the following formula:
- Given that Ia ranges from 0 to 15,
- final_color = color*16 + Ia
Nothing more than a simple calculation and a proper color palette are needed to implement ambient lighting. But ambient lighting can be a bit boring. More 3D realism can be achieved by using point light sources.
Point Light Sources
Point light sources are like lightbulbs (see Figure 10-73). The light from them radiates out in all directions like a spherical energy wave. Point light sources are more of what we are used to in everyday life. They have the property that objects illuminated by them reflect more or less light off their faces based on the angles the faces make with the point light source. Furthermore, the farther an object is from the point light source, the weaker the overall light intensity becomes (this is an effect of the inverse square law).
Although both of the aforementioned effects occur with point light sources, most implementations only consider the first or the angular variance. The distance falloff is so infinitesimal that it really doesn’t help the realism very much and isn’t worth the trouble of implementing. With that in mind, let’s briefly take a look at how point light sources are used to compute light intensity.
Figure 10-74 shows the relationship between a point light source and a polygon face. Similarly to the back-face visibility computations, the normal vector of the polygon is computed along with the light source vector l. Then the dot product is computed between them solving for the COSINE of the angle theta:
u . v
COS θ = _________
|u || v |
Since the COS θ varies from –1 to 1, this information is used to compute the intensity of incident light on the polygon face. For example, when COS θ = 0, the incident light is grazing and its intensity should be 0. However, when the COS θ becomes |1|, the light source is perpendicular to the polygon face and the incident light will be the highest. As an example, the incident light on a polygon face, as a function of θ, is shown in Figure 10-75. Basically, the curve is nothing more than a COSINE curve.
To add point light sources is very simple. After the ambient light computation is complete, the point light source computation is performed and added to the ambient light to arrive at an overall intensity level. Again, this intensity is used to index into a color table and select the appropriate shade of color.
Directional Light Sources
The final type of light source we are going to discuss is directional lights. Figure 10- 76 shows the physics of a directional light source. As you can see, all the light rays are parallel and reflect off the polygon face at an angle opposite to the angle at which the light rays were incident. This type of reflection model creates specular effects or highlights. However, implementing specular effects and highlights is rather complex and time-consuming. This is true since not only the direction of the light source is needed along with the polygon normals, but the view direction of the player is also needed to compute the final angle that the reflected light makes with the viewing direction.
This final angle is then used to compute the intensity of the light as seen by the player. However, in reality the relationship isn’t linear, and a more complex exponential model is employed many times. Hence, specular effects aren’t going to be part of our final graphics engine, but it’s nice to know how they are computed.
The Graphics Pipeline
We have covered all the main aspects of 3D computer graphics. Admittedly, we have glazed over a few topics, but they will all be covered in depth in the following chapters, along with the software implementations. The main goal of the preceding material is to give you an overall idea of the order of operations that must be performed to create a 3D scene. Having even a loose grasp on this will help immensely during the following chapters. As I said before, one of the most important concepts to grasp about 3D graphics is not the individual algorithms or mathematics so much, but how everything fits together. With that in mind, let’s take a final look at a possible 3D graphics pipeline. Figure 10-77 shows one such pipeline.
As you can see from the figure, the system does both object space clipping and image space clipping. Moreover, the clipping is done after the viewing transformation. If we wanted to, we could place the clipping before the viewing transformation, but the math is a bit harder since we would have to use general plane equations. In any case, the idea of all of this is to minimize the amount of computations and the number of polygons processed per frame.
It’s possible that one pipeline may be very efficient for a flight simulator, but terrible for an architectural walk-through. As Cybersorcerers, we must find the right combinations, tricks, and methods to extract as much 3D performance as possible out of our engine, even if it means some things aren’t done correctly all the time. For example, we might use a primitive depth sort to compute the rendering order of polygons instead of a more precise Z-buffer.
Summary
Without a doubt, this chapter has been the most complex so far. We began by learning the language of 3D graphics, which is vectors, matrices, and fixed point math. Then we learned about 3D transformations, coordinate systems, clipping, and hidden surface removal and rendering techniques. Finally, we touched upon lighting models used to make the 3D images more realistic, and ended with a final look at the graphics pipeline.
If you have a good understanding of the general topics in this chapter, the following chapters that focus on each subject should make everything crystal clear.
Continue
- Black Art of 3D Game Programming: Writing Your Own High-Speed 3D Polygon Video Games in C Table of Contents
- Black Art of 3D Game Programming, Chapter 11: Building a 3D Graphics Engine
| Copyright 2006 Andre LaMothe |
|
