Mathematics for 3D Game Programming and Computer Graphics – The render pipeline

This chapter provides a preliminary review of the rendering pipeline. It covers general functions, such as vertex transformation and primitive rasterization, which are performed by modern 3D graphics hardware. Readers who are familiar with these concepts may safely skip ahead. We intentionally avoid mathematical discussions in this chapter and instead provide pointers to other parts of the book where each particular portion of the rendering pipeline is examined in greater detail.

Graphics Processors

A typical scene that is to be rendered as 3D graphics is composed of many separate objects. The geometrical forms of these objects are each represented by a set of vertices and a particular type of graphics primitive that indicates how the vertices are connected to produce a shape. Figure 1.1 illustrates the ten types of graphics primitive defined by the OpenGL library. Graphics hardware is capable of rendering a set of individual points, a series of line segments, or a group of filled polygons. Most of the time, the surface of a 3D model is represented by a list of triangles, each of which references three points in a list of vertices.


A typical modern 3D graphics board possesses a dedicated Graphics Processing Unit (GPU) that executes instructions independently of the Central Processing Unit (CPU). The CPU sends rendering commands to the GPU, which then performs the rendering operations while the CPU continues with other tasks. This is called asynchronous operation. When geometrical information is submitted to a rendering library such as OpenGL, the function calls used to request the rendering operations typically return a significant amount of time before the GPU has finished rendering the graphics.

An application communicates with the GPU by sending commands to a rendering library, such as OpenGL, which in turn sends commands to a driver that knows how to speak to the GPU in its native language. The interface to OpenGL is called a Hardware Abstraction Layer (HAL) because it exposes a common set of functions that can be used to render a scene on any graphics hardware that supports the OpenGL architecture. The driver translates the OpenGL function calls into code that the GPU can understand. A 3D graphics driver usually implements OpenGL functions directly to minimize the overhead of issuing rendering commands. The block diagram shown in Figure 1.2 illustrates the communications that take place between the CPU and GPU.


A 3D graphics board has its own memory core, which is commonly called VRAM (Video Random Access Memory). The GPU may store any information in VRAM, but there are several types of data that can almost always be found in the graphics board’s memory when a 3D graphics application is running.

  • Most importantly, VRAM contains the front and back image buffers.
  • The front image buffer contains the exact pixel data that is visible in the viewport.
    • The viewport is the area of the display containing the rendered image and may be a subregion of a window, the entire contents of a window, or the full area of the display.
  • The back image buffer is the location to which the GPU actually renders a scene.
    • The back buffer is not visible and exists so that a scene can be rendered in its entirety before being shown to the user.

Once an image has been completely rendered, the front and back image buffers are exchanged. This operation is called a buffer swap and can be performed either by changing the memory address that represents the base of the visible image buffer or by copying the contents of the back image buffer to the front image buffer.
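
The two swap strategies can be sketched in a few lines. This is only an illustration under simplifying assumptions: the Framebuffers class and its method names are hypothetical, and plain Python lists stand in for blocks of VRAM.

```python
class Framebuffers:
    """Toy model of front and back image buffers (hypothetical names)."""

    def __init__(self, width, height):
        size = width * height
        self.buffers = [[0] * size, [0] * size]  # two blocks of "VRAM"
        self.front_index = 0                     # which block is displayed

    @property
    def front(self):
        return self.buffers[self.front_index]

    @property
    def back(self):
        return self.buffers[1 - self.front_index]

    def swap_by_address(self):
        # Exchange the roles of the two buffers; no pixel data moves.
        self.front_index = 1 - self.front_index

    def swap_by_copy(self):
        # Copy the finished back buffer into the visible front buffer.
        self.front[:] = self.back


fb = Framebuffers(4, 2)
fb.back[0] = 255          # "render" one pixel into the hidden back buffer
fb.swap_by_address()      # the rendered image becomes visible
```

Swapping by address costs the same regardless of image size, while swapping by copy touches every pixel but leaves the back buffer's contents in place.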

The buffer swap is often synchronized with the refresh frequency of the display to avoid an artifact known as tearing. Tearing occurs when a buffer swap is performed during the display refresh interval, causing the upper and lower parts of a viewport to show data from different image buffers.

Also stored in VRAM is a block of data called the depth buffer or z-buffer. The depth buffer stores, for every pixel in the image buffer, a value that represents how far away the pixel is or how deep the pixel lies in the image.

  • The depth buffer is used to perform hidden surface elimination by only allowing a pixel to be drawn if its depth is less than the depth of the pixel already in the image buffer.
  • Depth is measured as the distance from the virtual camera through which we observe the scene being rendered.
  • The name z-buffer comes from the convention that the z axis points directly out of the display screen in the camera’s local coordinate system.
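
Hidden surface elimination with a depth buffer might look like the following sketch. The function name draw_pixel and the string "colors" are made up for illustration, and depth is assumed to run from 0.0 (nearest) to 1.0 (farthest).

```python
def draw_pixel(color_buffer, depth_buffer, index, color, depth):
    """Write a pixel only if it is nearer than what is already stored."""
    if depth < depth_buffer[index]:
        color_buffer[index] = color
        depth_buffer[index] = depth
        return True
    return False


# One-pixel image; the depth buffer starts at the farthest value.
colors = ["background"]
depths = [1.0]

draw_pixel(colors, depths, 0, "far wall", 0.8)    # drawn: 0.8 < 1.0
draw_pixel(colors, depths, 0, "near crate", 0.3)  # drawn: 0.3 < 0.8
draw_pixel(colors, depths, 0, "mid tree", 0.5)    # rejected: 0.5 > 0.3
```

The nearest surface wins no matter what order the three objects are drawn in, which is the point of keeping a depth value per pixel.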

An application may request that a stencil buffer be created along with the image buffers and the depth buffer. The stencil buffer contains an integer mask for each pixel in the image buffer that can be used to enable or disable drawing on a per-pixel basis.

For the vast majority of 3D rendering applications, the usage of VRAM is dominated by texture maps. Texture maps are images that are applied to the surface of an object to give it greater visual detail. In advanced rendering applications, texture maps may contain information other than a simple pixel image.

  • For instance, a bump map contains vectors that represent varying slopes at different locations on an object’s surface.

Vertex Transform

Geometrical data is passed to the graphics hardware in the context of a three-dimensional space. One of the jobs performed by the graphics hardware is to transform this data into geometry that can be drawn into a two-dimensional viewport. There are several different coordinate systems associated with the rendering pipeline; their relationships are shown in Figure 1.3.

  • The vertices of a model are typically stored in object space, a coordinate system that is local to the particular model and used only by that model.
  • The position and orientation of each model are often stored in world space, a global coordinate system that ties all of the object spaces together.
  • Before an object can be rendered, its vertices must be transformed into camera space (also called eye space), the space in which the x and y axes are aligned to the display and the z axis is parallel to the viewing direction.
    • It is possible to transform vertices from object space directly into camera space by concatenating the matrices representing the transformations from object space to world space and from world space to camera space.
    • The product of these transformations is called the model-view transformation.
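
Concatenating the two transformations can be sketched with plain 4×4 matrices. The helper names and the example placement of the model and camera are assumptions made for illustration; the matrices use the column-vector convention, so the transformation applied first appears on the right of the product.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


# Assumed example: the model sits at x = 5 in world space, and the camera
# sits at z = 10, so world-to-camera shifts everything by -10 in z.
object_to_world = translation(5, 0, 0)
world_to_camera = translation(0, 0, -10)

# With column vectors, the transformation applied first goes on the right.
model_view = mat_mul(world_to_camera, object_to_world)

vertex = [1, 2, 3, 1]                      # object-space position, w = 1
two_steps = transform(world_to_camera, transform(object_to_world, vertex))
one_step = transform(model_view, vertex)   # same result: [6, 2, -7, 1]
```

Building model_view once per object and applying it to every vertex halves the per-vertex matrix work compared with transforming through world space each time.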


Once a model’s vertices have been transformed into camera space, they undergo a projection transformation that has the effect of applying perspective so that geometry becomes smaller as the distance from the camera increases.

  • The projection is performed in four-dimensional homogeneous coordinates, and the space in which the vertices exist after projection is called homogeneous clip space.
    • Homogeneous clip space is so named because it is in this space that graphics primitives are clipped to the boundaries of the visible region of the scene, ensuring that no attempt is made to render any part of a primitive that falls outside the viewport.
  • In homogeneous clip space, vertices have normalized device coordinates once their x, y, and z coordinates have been divided by the w coordinate. The term normalized pertains to the fact that the x, y, and z coordinates of each vertex fall in the range [-1, 1], but reflect the final positions in which they will appear in the viewport.
  • The vertices must undergo one more transformation, called the viewport transformation, that maps the normalized coordinates to the actual range of pixel coordinates covered by the viewport.
    • The z coordinate is usually mapped to the floating-point range [0, 1], but this is subsequently scaled to the integer range corresponding to the number of bits per pixel utilized by the depth buffer.
    • After the viewport transformation, vertex positions are said to lie in window space.
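
The path from camera space to window space can be traced numerically. The sketch below assumes an OpenGL-style projection matrix for a symmetric view frustum, a camera looking down the negative z axis, and a 16-bit depth buffer; the function names are hypothetical, and the exact integer depth encoding varies between implementations.

```python
def perspective(near, far, right, top):
    """OpenGL-style projection matrix for a symmetric view frustum."""
    return [[near / right, 0.0, 0.0, 0.0],
            [0.0, near / top, 0.0, 0.0],
            [0.0, 0.0, -(far + near) / (far - near),
             -2.0 * far * near / (far - near)],
            [0.0, 0.0, -1.0, 0.0]]


def to_clip_space(projection, v):
    return [sum(projection[i][k] * v[k] for k in range(4)) for i in range(4)]


def to_ndc(clip):
    """Perspective divide: x, y, and z are each divided by w."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


def to_window_space(ndc, x0, y0, width, height, depth_bits):
    x = x0 + (ndc[0] + 1.0) * 0.5 * width
    y = y0 + (ndc[1] + 1.0) * 0.5 * height
    depth = (ndc[2] + 1.0) * 0.5                      # floating-point [0, 1]
    depth_int = round(depth * (2 ** depth_bits - 1))  # integer depth value
    return x, y, depth, depth_int


projection = perspective(near=1.0, far=3.0, right=1.0, top=1.0)
camera_space = [1.0, 0.5, -2.0, 1.0]            # in front of the camera
clip = to_clip_space(projection, camera_space)  # [1.0, 0.5, 1.0, 2.0]
ndc = to_ndc(clip)                              # [0.5, 0.25, 0.5]
window = to_window_space(ndc, 0, 0, 640, 480, depth_bits=16)
```

For this vertex the result is the window-space position (480.0, 300.0) with a floating-point depth of 0.75, stored as 49151 in the assumed 16-bit depth buffer.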

A graphics processor usually performs several per-vertex calculations in addition to the transformation from object space to window space.

  • For instance, the OpenGL lighting model determines the color and intensity of light reaching each vertex and then calculates how much of that is reflected toward the camera.
    • This process is called per-vertex lighting.
    • More-advanced graphics applications may perform per-pixel lighting to achieve highly detailed lighting interactions at every pixel covered by a graphics primitive.
  • Each vertex may also carry with it one or more sets of texture coordinates. Texture coordinates may be explicitly specified by an application or automatically generated by the GPU.
    • When a graphics primitive is rendered, the texture coordinates are interpolated over the area of the primitive and used to look up colors in a texture map.
    • These colors are then combined with other interpolated data at each pixel to determine the final color that appears in the viewport.
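
As a rough illustration of per-vertex lighting, the sketch below computes only a diffuse (Lambertian) term for a single light. This is a deliberate simplification and not the full OpenGL lighting model, which also includes ambient, specular, and attenuation terms; all names are hypothetical.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def diffuse_vertex_color(normal, to_light, light_color, material_color):
    """Diffuse reflection at one vertex: max(0, N . L) * light * material."""
    n = normalize(normal)
    light_dir = normalize(to_light)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, light_dir)))
    return [n_dot_l * lc * mc for lc, mc in zip(light_color, material_color)]


white_light = [1.0, 1.0, 1.0]
red_material = [1.0, 0.25, 0.25]

# Surface facing the light directly receives the full diffuse contribution.
facing = diffuse_vertex_color([0, 0, 1], [0, 0, 2], white_light, red_material)

# Surface facing away from the light receives none.
away = diffuse_vertex_color([0, 0, -1], [0, 0, 2], white_light, red_material)
```

The colors computed at the vertices are then interpolated across the primitive, which is why per-vertex lighting can miss detail that falls between vertices.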

Rasterization and Fragment Operations

Once a model’s vertices have been clipped and transformed into window space, the GPU must determine what pixels in the viewport are covered by each graphics primitive.

  • The process of filling in the horizontal spans of pixels belonging to a primitive is called rasterization.
  • The GPU calculates the depth, interpolated vertex colors, and interpolated texture coordinates for each pixel. This information, combined with the location of the pixel itself, is called a fragment.
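
The conversion of a triangle into fragments can be sketched as follows. Note the swapped-in technique: instead of filling horizontal spans, this version tests every pixel center against the triangle's three edges and uses the resulting barycentric weights for interpolation. It interpolates linearly in window space, ignores perspective correction, and does not implement a fill rule for edges shared between triangles. The names are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p), using x and y only."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Yield fragments for a triangle given in window space.

    Each vertex is (x, y, depth, color); color is an (r, g, b) tuple.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)            # sample at the pixel center
            w0 = edge(v1, v2, p) / area       # barycentric weights
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue                      # pixel center is outside
            depth = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
            color = tuple(w0 * a + w1 * b + w2 * c
                          for a, b, c in zip(v0[3], v1[3], v2[3]))
            yield {"x": x, "y": y, "depth": depth, "color": color}


red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
fragments = list(rasterize((0, 0, 0.0, red), (4, 0, 0.0, green),
                           (0, 4, 1.0, blue), width=4, height=4))
```

Each fragment carries its pixel location together with the interpolated depth and color, matching the definition of a fragment given above.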

The process through which a graphics primitive is converted to a set of fragments is illustrated in Figure 1.4. An application may specify that face culling be performed as the first stage of this process. Face culling applies only to polygonal graphics primitives and removes either the polygons that are facing away from the camera or those that are facing toward the camera.
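
One common way to decide which way a triangle faces is to look at the winding order of its vertices in window space. The sketch below assumes that counterclockwise winding marks a front face, which is OpenGL's default, and that the y axis points up; the function names and the "front"/"back" strings are hypothetical.

```python
def signed_area(p0, p1, p2):
    """Signed area of a 2D triangle; positive when counterclockwise."""
    return 0.5 * ((p1[0] - p0[0]) * (p2[1] - p0[1])
                  - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def is_culled(p0, p1, p2, cull="back", front_is_ccw=True):
    area = signed_area(p0, p1, p2)
    if area == 0:
        # Edge-on triangle: treated as culled in this sketch because it
        # covers no pixels anyway.
        return True
    is_front = (area > 0) == front_is_ccw
    return is_front if cull == "front" else not is_front


ccw = [(0, 0), (1, 0), (0, 1)]       # counterclockwise: front-facing
cw = [(0, 0), (0, 1), (1, 0)]        # clockwise: back-facing

keep_front = is_culled(*ccw)         # False: front face survives
drop_back = is_culled(*cw)           # True: back face is removed
```

For a closed, opaque model, culling back faces typically discards about half of the triangles before any fragments are generated.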


A graphics application specifies how the fragment data is used to determine the final color and final depth of each pixel during rasterization. This process is called fragment shading or pixel shading.

  • The final color may simply be given by the product of an interpolated vertex color and a value fetched from a texture map, or it may be the result of a complex per-pixel lighting calculation. The final depth is ordinarily just the unaltered interpolated depth, but advanced 3D graphics hardware allows an application to replace the depth with the result of an arbitrary calculation.
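
The simplest case, a vertex color multiplied by a texture value, can be sketched as follows. The sketch assumes texture coordinates in the range [0, 1] and nearest-texel lookup with no filtering; the function names and the tiny 2×2 texture are made up for illustration.

```python
def sample_nearest(texture, u, v):
    """Nearest-texel lookup; texture is a list of rows of (r, g, b)."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)    # clamp u = 1.0 to the last texel
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_fragment(vertex_color, texture, u, v):
    """Final color = interpolated vertex color * fetched texture color."""
    texel = sample_nearest(texture, u, v)
    return tuple(c * t for c, t in zip(vertex_color, texel))


checker = [[(1.0, 1.0, 1.0), (0.5, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]

# Half-intensity gray vertex color modulated by the texel in row 0, column 1.
color = shade_fragment((0.5, 0.5, 0.5), checker, u=0.75, v=0.25)
```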

Figure 1.5 illustrates the operations performed for each fragment generated during rasterization. Most of these operations determine whether a fragment should be drawn to the viewport or discarded altogether. Although these operations occur logically after fragment shading, most GPUs perform as many tests as possible before fragment shading calculations to avoid spending time figuring out the colors of fragments that will ultimately be discarded.


The first fragment operation performed, and the only one that cannot be disabled, is the pixel ownership test. The pixel ownership test simply determines whether a fragment lies in the region of the viewport that is currently visible on the display.

  • A possible reason that the pixel ownership test fails is that another window is obscuring a portion of the viewport. In this case, fragments falling behind the obscuring window are not drawn.

Next, the scissor test is performed. An application may specify a rectangle in the viewport, called the scissor rectangle, to which rendering should be restricted. Any fragments falling outside the scissor rectangle are discarded. A particular application of the scissor rectangle arises in the context of the stencil shadow algorithm.
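
The scissor test amounts to a point-in-rectangle check. In this sketch the rectangle is given as (left, bottom, width, height) in window coordinates, and the function name is hypothetical.

```python
def scissor_test(x, y, rect):
    """Return True if the fragment at (x, y) lies inside the rectangle."""
    left, bottom, width, height = rect
    return left <= x < left + width and bottom <= y < bottom + height


scissor_rect = (100, 50, 200, 100)   # covers x in [100, 300), y in [50, 150)

inside = scissor_test(150, 75, scissor_rect)    # True: fragment is kept
outside = scissor_test(300, 75, scissor_rect)   # False: fragment is discarded
```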

If the scissor test passes, a fragment undergoes the alpha test. When the final color of a fragment is calculated, an application may also calculate an alpha value that usually represents the degree of transparency associated with the fragment.

  • The alpha test compares the final alpha value of a fragment to a constant value that is preset by the application.
  • The application specifies what relationship between the two values (such as less than, greater than, or equal to) causes the test to pass.
  • If the relationship is not satisfied, then the fragment is discarded.
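
A configurable comparison of this kind can be sketched with a small lookup table. The string keys below are hypothetical labels chosen for this example; they loosely mirror the eight comparison functions that OpenGL offers.

```python
import operator

# Hypothetical labels for the available comparison relationships.
COMPARISONS = {
    "never": lambda a, b: False,
    "less": operator.lt,
    "equal": operator.eq,
    "lequal": operator.le,
    "greater": operator.gt,
    "notequal": operator.ne,
    "gequal": operator.ge,
    "always": lambda a, b: True,
}


def alpha_test(fragment_alpha, func, reference):
    """Return True if the fragment passes; False means it is discarded."""
    return COMPARISONS[func](fragment_alpha, reference)


# Assumed use case: keep only fragments that are more than half opaque.
opaque_enough = alpha_test(0.8, "greater", 0.5)    # True
too_faint = alpha_test(0.2, "greater", 0.5)        # False
```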

After the alpha test passes, a fragment moves on to the stencil test. The stencil test reads the value stored in the stencil buffer at a fragment’s location and compares it to a value previously specified by the application. The stencil test passes only if a specific relationship is satisfied; otherwise, the stencil test fails, and the fragment is discarded.
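
The stencil test can be sketched in the same spirit. This version follows the OpenGL convention of applying a bit mask to both the reference value and the stored value before comparing them; the function name and the four-pixel buffer are made up for illustration.

```python
import operator


def stencil_test(stencil_buffer, index, reference,
                 compare=operator.eq, mask=0xFF):
    """Compare the masked reference with the masked stored stencil value."""
    stored = stencil_buffer[index]
    return compare(reference & mask, stored & mask)


# Four-pixel stencil buffer; drawing is allowed only where the value is 1.
stencil = [0, 1, 1, 0]
drawable = [i for i in range(len(stencil)) if stencil_test(stencil, i, 1)]
```

Here only the two middle pixels accept fragments, so the stencil buffer acts as a cut-out mask for whatever is rendered next.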

The final test undergone by a fragment is the depth test. The depth test compares the final depth associated with a fragment to the value currently residing in the depth buffer.

  • If the fragment’s depth does not satisfy an application-specified relationship with the value in the depth buffer, then the fragment is discarded.
  • Normally, the depth test is configured so that a fragment passes the depth test only if its depth is less than or equal to the value in the depth buffer.
  • When the depth test passes, the depth buffer is updated with the depth of the fragment to facilitate hidden surface removal for subsequently rendered primitives.

Once the pixel ownership test, scissor test, alpha test, stencil test, and depth test have all passed, a fragment’s final color is blended into the image buffer.

  • The blending operation calculates a new color by combining the fragment’s final color and the color already stored in the image buffer at the fragment’s location.
  • The fragment’s alpha value and the alpha value stored in the image buffer may also be used to determine the color that ultimately appears in the viewport.
  • The blending operation may be configured to simply replace the previous color in the image buffer, or it may produce special visual effects such as transparency.
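
Two blending configurations can be sketched as follows: simple replacement, and a common transparency setup in which the fragment's color is weighted by its alpha and the stored color by one minus that alpha. The function name and mode strings are hypothetical, and real hardware offers many more blend factors than these two.

```python
def blend(src_rgba, dst_rgb, mode="alpha"):
    """Combine a fragment color (with alpha) and the stored pixel color."""
    r, g, b, a = src_rgba
    if mode == "replace":
        return (r, g, b)
    # "alpha": source weighted by alpha, destination by one minus alpha.
    return tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), dst_rgb))


stored = (0.0, 0.0, 1.0)                 # blue already in the image buffer
fragment = (1.0, 0.0, 0.0, 0.25)         # red fragment, 25% opaque

tinted = blend(fragment, stored)                  # (0.25, 0.0, 0.75)
replaced = blend(fragment, stored, "replace")     # (1.0, 0.0, 0.0)
```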
