Rasterization

Distributed under the terms of the CC BY-NC-ND 4.0 License.

  1. An Overview of the Rasterization Algorithm
  2. The Projection Stage
  3. The Rasterization Stage
  4. The Visibility Problem, the Depth Buffer Algorithm and Depth Interpolation
  5. Perspective Correct Interpolation and Vertex Attributes
  6. Rasterization: a Practical Implementation
  7. Source Code (external link GitHub)

Rasterization: a Practical Implementation

Improving the Rasterization Algorithm

Aliasing and Anti-Aliasing

The techniques presented in previous chapters lay the foundation of the rasterization algorithm. However, we have implemented these techniques in a very basic manner. The GPU rendering pipeline and other rasterization-based production renderers use the same concepts but with highly optimized versions of these algorithms. Detailing all the different tricks used to speed up the algorithm goes beyond the scope of this introduction. We will briefly review some of them now but plan to devote a lesson to this topic in the future.

Figure 1: Jagged edges and pixel artifacts can be reduced using anti-aliasing.

First, let's consider a fundamental problem with 3D rendering. If you zoom in on the image of the triangle we rendered in the previous chapter, you'll notice that the edges of the triangle are irregular (this irregularity also affects the checkerboard pattern at the edges of the squares). The steps visible in Figure 1 are called jaggies. These jagged or stair-stepped edges are not a bug in the implementation; they are an unavoidable byproduct of breaking down the triangle into pixels. With rasterization, we break a continuous surface (the triangle) into discrete elements (pixels), a process already mentioned in the Introduction to Rendering. The challenge is akin to representing a continuous curve or surface with Lego bricks: it can't be done without the individual bricks remaining visible (Figure 2).

Figure 2: Pixels can't properly capture the shape of continuous surfaces.

The solution in rendering is called anti-aliasing (AA). Instead of rendering just one sample per pixel, we divide the pixel into sub-pixels and perform the coverage test for each of them (Figure 3). While each sub-pixel is still just another "brick," this method captures the edges of objects with greater precision. Pixels are typically divided into an N by N grid of sub-pixels, where N is usually a power of 2 (2, 4, 8, etc.), though technically it can be any value greater than or equal to 1. Many methods address the aliasing issue; the technique described here belongs to the family of sampling-based anti-aliasing methods.

Figure 3: When only one sample is used, the triangle is missed. However, by using sub-pixels, we can detect that the pixel overlaps the triangle at least partially. The pixel color is equal to the sum of the sub-pixel colors divided by the total number of sub-pixels or samples (in this example, 16 samples).

The final pixel color is computed as the sum of all sub-pixel colors divided by the total number of sub-pixels. For example, imagine the triangle is white. If only 2 of 4 samples overlap the triangle, the final pixel color will be equal to (0+0+1+1)/4 = 0.5. The pixel won't be completely white or black but will instead show a gradual transition, mitigating the stair-stepped pixel artifact. To fully understand anti-aliasing, one needs to delve into signal processing theory, which is a vast and complex topic.
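To make this concrete, here is a minimal sketch of N by N supersampling for a single pixel, assuming a white triangle. The function name pixelCoverage and the inside callback are ours: inside stands in for whatever coverage test you use (such as the edge function discussed further down in this chapter).

#include <functional>

// A minimal sketch of N by N supersampling for one pixel. The inside callback
// stands in for the triangle coverage test (e.g., an edge-function test).
float pixelCoverage(int px, int py, int N,
                    const std::function<bool(float, float)>& inside)
{
    float sum = 0;
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < N; ++i) {
            // Sample at the center of each sub-pixel
            float sx = px + (i + 0.5f) / N;
            float sy = py + (j + 0.5f) / N;
            if (inside(sx, sy)) sum += 1;
        }
    }
    // Average of the samples: for a white triangle, this is the pixel's color
    return sum / (N * N);
}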

Figure 4: Using sub-pixels to combat aliasing effectively.
Figure 5: Anti-aliasing helps smooth jagged edges.
Figure 6: 1 sample per pixel or 1 spp (top) vs. 4 samples per pixel (bottom).

Choosing N as a power of 2 is beneficial because most processors today can execute several instructions in parallel, and the number of parallel lanes is generally a power of 2. You can find information on the web about the SSE instruction set available on most modern CPUs; GPUs use a similar concept. SSE can generally perform 4 or 8 floating-point calculations simultaneously (in one cycle). This means that, for the cost of one floating-point operation, you effectively get 3 or 7 additional operations for free. In theory, this can speed up your rendering time by a factor of 4 or 8 (although such performance levels are hard to achieve in practice due to the overhead of setting up these instructions). For instance, SSE instructions can be used to process the 2x2 sub-pixels of a pixel for roughly the cost of one, resulting in smoother edges at virtually no extra cost.
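As an illustration only (the program in this lesson does not use SIMD), here is one way the edge function shown later in this chapter could be evaluated for the 4 sub-pixels of a 2x2 block in a single pass with SSE intrinsics. The function name and the sub-pixel layout are hypothetical:

#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical sketch: evaluate the edge function for the 4 sub-pixels of a
// 2x2 grid inside pixel (cx, cy), against the edge (a, b) of a triangle.
void edgeFunction2x2(float ax, float ay, float bx, float by,
                     float cx, float cy, float result[4])
{
    // The 4 sub-pixel x and y coordinates, packed into SSE registers
    __m128 cxs = _mm_set_ps(cx + 0.5f, cx, cx + 0.5f, cx);
    __m128 cys = _mm_set_ps(cy + 0.5f, cy + 0.5f, cy, cy);
    // (b - a) is constant for the triangle: broadcast it across all 4 lanes
    __m128 A = _mm_set1_ps(bx - ax);
    __m128 B = _mm_set1_ps(by - ay);
    // (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x), 4 lanes at once
    __m128 w = _mm_sub_ps(
        _mm_mul_ps(A, _mm_sub_ps(cys, _mm_set1_ps(ay))),
        _mm_mul_ps(B, _mm_sub_ps(cxs, _mm_set1_ps(ax))));
    // The sign of result[i] tells on which side of the edge sub-pixel i lies
    _mm_storeu_ps(result, w);
}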

Rendering Blocks of Pixels

Another technique to speed up rasterization involves rendering blocks of pixels. Instead of testing every pixel within a block, we start by examining the pixels at the block's corners; GPU algorithms might use blocks of 8x8 pixels (this method is part of a more complex concept involving tiles, which we won't delve into here). Because a triangle is convex, if all four corners of the 8x8 block overlap the triangle, the remaining pixels within the block are guaranteed to cover the triangle as well, as illustrated in Figure 7.

Figure 7: If the 4 corners of an 8x8 pixel grid overlap the triangle, then all remaining pixels of the grid cover the triangle too.

In such cases, testing the other pixels becomes unnecessary, saving considerable time. These pixels can simply be filled with the triangle's colors. If vertex attributes need to be interpolated across the pixel block, this process is also simplified. Having computed them at the corners of the block, you only need to interpolate them linearly in both directions (horizontally and vertically). This optimization proves most effective for triangles that appear large on the screen. Smaller triangles don't benefit as much from this approach.
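Here is a hedged sketch of the corner test, written against the orient2d edge function shown in the next section, and assuming a Point2D struct with integer x and y members and a consistent counter-clockwise winding order. The helper names are ours, not a GPU's:

// Sketch only: true if p is inside (or on an edge of) triangle (v0, v1, v2),
// using the orient2d function from the next section.
bool insideTriangle(const Point2D& v0, const Point2D& v1, const Point2D& v2,
                    const Point2D& p)
{
    return orient2d(v0, v1, p) >= 0 &&
           orient2d(v1, v2, p) >= 0 &&
           orient2d(v2, v0, p) >= 0;
}

// Because a triangle is convex, if all four corners of an 8x8 block are
// inside it, every pixel of the block is covered and can be filled directly.
bool blockFullyCovered(const Point2D& v0, const Point2D& v1, const Point2D& v2,
                       int x, int y)  // x, y: top-left pixel of the block
{
    Point2D corners[4] = {{x, y}, {x + 7, y}, {x, y + 7}, {x + 7, y + 7}};
    for (int i = 0; i < 4; ++i)
        if (!insideTriangle(v0, v1, v2, corners[i])) return false;
    return true;
}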

Optimizing the Edge Function

The edge function can also be optimized. Let's revisit its implementation:

// Edge function: the 2D cross product of the vectors (b - a) and (c - a).
// Its sign tells on which side of the line through a and b the point c lies.
int orient2d(const Point2D& a, const Point2D& b, const Point2D& c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

Remember, a and b in this function are two of the triangle's vertices (the endpoints of the edge being tested), and c denotes the pixel coordinates (in raster space). An interesting observation is that this function is called for each pixel within the triangle's bounding box, yet while iterating over the pixels, only c changes; a and b remain constant. Suppose we calculate the equation once and obtain a result w0:

w0 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);

If c.x increases by a step s (the per pixel increment), the new value of w0 becomes:

w0_new = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x + s - a.x);

By subtracting the original equation from the new one, we find:

w0_new - w0 = -(b.y - a.y) * s;

Since -(b.y - a.y) * s is constant for a specific triangle (given that s is a consistent increment and a and b are constant), we can calculate it once and store it as w0_step. This simplifies the calculation to:

w0_new = w0 + w0_step;

This adjustment can also be applied to w1 and w2, as well as for steps in c.y.

In its original form, the edge function requires 2 multiplications and 5 subtractions; with this optimization, evaluating it at the next pixel reduces to a single addition (though the initial values must still be computed in full once). This technique is widely discussed online. While we won't use it in this lesson, we plan to explore it further and implement it in a future lesson dedicated to advanced rasterization techniques.
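To show how the pieces fit together, here is a minimal sketch of a bounding-box loop using these incremental updates. It assumes Point2D is a simple struct of integer x and y coordinates, a winding order such that positive values mean "inside," and variable names of our own choosing:

// Sketch: incremental edge-function evaluation over a triangle's bounding box.
// Stepping c.x by 1 adds -(b.y - a.y); stepping c.y by 1 adds (b.x - a.x).
void rasterizeIncremental(const Point2D& v0, const Point2D& v1, const Point2D& v2,
                          int xmin, int ymin, int xmax, int ymax)
{
    // Per-pixel increments of each edge function (constant per triangle)
    int w0_stepX = -(v2.y - v1.y), w0_stepY = v2.x - v1.x;
    int w1_stepX = -(v0.y - v2.y), w1_stepY = v0.x - v2.x;
    int w2_stepX = -(v1.y - v0.y), w2_stepY = v1.x - v0.x;

    // Evaluate the edge functions in full once, at the box's top-left corner
    Point2D p = {xmin, ymin};
    int w0_row = orient2d(v1, v2, p);
    int w1_row = orient2d(v2, v0, p);
    int w2_row = orient2d(v0, v1, p);

    for (int y = ymin; y <= ymax; ++y) {
        int w0 = w0_row, w1 = w1_row, w2 = w2_row;
        for (int x = xmin; x <= xmax; ++x) {
            if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
                // pixel (x, y) overlaps the triangle: shade it here
            }
            // One addition per edge to move one pixel to the right
            w0 += w0_stepX; w1 += w1_stepX; w2 += w2_stepX;
        }
        // One addition per edge to move one row down
        w0_row += w0_stepY; w1_row += w1_stepY; w2_row += w2_stepY;
    }
}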

Fixed Point Coordinates

Figure 8: fixed point coordinates.

To conclude this section, let's briefly discuss the technique of converting vertex coordinates from floating-point to fixed-point format just before the rasterization stage. "Fixed-point" is the technical term for what is essentially integer representation. When vertex coordinates are converted from NDC to raster space, they are also transformed from floating-point numbers to fixed-point numbers. Why do we do this? The answer isn't straightforward, but in essence, GPUs utilize fixed-point arithmetic because handling integers, through logical bit operations, is computationally easier and faster than dealing with floats or doubles. This explanation is quite generalized; the transition from floating-point to integer coordinates and the implementation of rasterization using integer coordinates encompass a broad and complex topic scarcely documented online, which is surprising given its central role in modern GPU functionality.

This conversion involves rounding vertex coordinates to the nearest integer. However, doing only this would snap vertex coordinates to pixel corners, a minor issue for still images but one that creates visual artifacts in animation (vertices may snap to different pixels from frame to frame). The solution is to round the coordinate to the nearest integer while reserving some of the integer's bits to represent the sub-pixel position of the vertex (the fractional part of the vertex position). Typically, GPUs allocate 4 bits for sub-pixel precision: for a 32-bit integer, 1 bit is used for the sign, 27 bits for the vertex's integer position, and 4 bits for the vertex's fractional position within the pixel. This arrangement means the vertex position is "snapped" to the nearest corner of a 16x16 sub-pixel grid, as illustrated in Figure 8. Although vertices are still snapped to a grid, this method is far less problematic than aligning them to pixel coordinates. The process introduces other challenges, including integer overflow, which can occur because integers represent a narrower range of values than floats; integrating anti-aliasing adds further complexity. Thoroughly exploring this topic would require its own lesson.
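As a sketch of what this conversion might look like with the 4 bits of sub-pixel precision mentioned above (the names are illustrative, not taken from any particular GPU):

#include <cmath>    // std::lround
#include <cstdint>  // int32_t

const int SUB_PIXEL_BITS = 4;                     // 16x16 sub-pixel grid
const int SUB_PIXEL_SCALE = 1 << SUB_PIXEL_BITS;  // 16

// Convert a floating-point raster coordinate to 28.4 fixed point: scale by 16
// and round, so the low 4 bits hold the sub-pixel (fractional) position.
int32_t toFixedPoint(float x)
{
    return static_cast<int32_t>(std::lround(x * SUB_PIXEL_SCALE));
}

// Recovering the two parts of a fixed-point coordinate fx (for illustration):
// int32_t pixel = fx >> SUB_PIXEL_BITS;        // integer pixel position
// int32_t frac  = fx & (SUB_PIXEL_SCALE - 1);  // position on the 16x16 grid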

Fixed-point coordinates expedite the rasterization process and edge function calculations. This efficiency is a key reason for converting vertex coordinates to integers, a technique that will be further discussed in an upcoming lesson.

Notes Regarding Our Implementation of the Rasterization Algorithm

We'll now briefly review the code provided in the source code chapter, highlighting its main components:

Use this code for learning purposes only. It is far from efficient. Simple optimizations could significantly improve performance, but we prioritized clarity over efficiency. Our aim is not to produce production-ready code but to facilitate understanding of fundamental concepts. Optimizing the code can be an excellent exercise.

The object rendered by the program is stored in an include file. While acceptable for a small program, this practice is generally avoided in professional applications, as it increases program size and compilation time. However, for this simple demonstration, and given the modest size of the object, it poses no issue. For a deeper understanding of how 3D object geometry is represented in programs, refer to the lesson on this topic in the Modeling and Geometry section. The only information utilized in this program is the position of the triangle vertices (in world space) and their texture or st coordinates.

As this lesson illustrates, there is nothing mysterious about rendering: once you understand the rules, you can reproduce the images produced by professional applications.

As an additional insight, we exported the world space position of the point on the triangle each image pixel overlaps, displaying all points in a 3D viewer. Unsurprisingly, points appear only on object parts directly visible to the camera, affirming the effectiveness of the depth-buffer technique. The second image provides a close-up of the same point set (right).

Conclusion

Rasterization's main advantages are its simplicity and speed, but it only solves the visibility part of the rendering process. Remember, rendering involves two steps: visibility and shading. The algorithm offers no solution for shading.

This lesson aims to provide an in-depth introduction to rasterization. If you found this content enlightening and educational, consider supporting our work.

