Geometry

Transforming Points and Vectors


Point Transformation Techniques

This section covers the steps needed to transform points using matrices, with a specific focus on including translation in the matrix multiplication, which we haven't covered yet. Although translation is one of the simplest operations to apply to a point (it just adds an offset to each coordinate), it is not a linear operation, and fitting it into the matrix framework requires a structural change to the point itself.

As discussed earlier, two matrices can be multiplied only when their dimensions are compatible: an $m \times p$ matrix can multiply a $p \times n$ matrix. Let's start from the 3x3 identity matrix, which leaves a point's coordinates unchanged, and see what must change to support translation through point-matrix multiplication. Translation adds specific values to each coordinate of a point. For example, moving the point $(1, 1, 1)$ to $(2, 3, 4)$ is done by adding 1, 2, and 3 to its x, y, and z coordinates, respectively. For this discussion, points and vectors are treated as $1 \times 3$ matrices.
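Why can't the 3x3 matrix alone do this? With the point on the left, every term of $P'$ is a coordinate of $P$ times a coefficient, so the origin $(0, 0, 0)$ always maps to $(0, 0, 0)$, whereas a translation must move it. The sketch below uses hypothetical Vec3 and Mat33 types (not part of the lesson's code) to illustrate both facts:

```cpp
#include <cassert>

// Hypothetical minimal types, used only for this sketch.
struct Vec3 { float x, y, z; };
struct Mat33 { float m[3][3]; };

// Row-vector convention used in this lesson: P' = P * M.
Vec3 mult(const Vec3& p, const Mat33& M) {
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2]
    };
}

// Translation is just component-wise addition of an offset.
Vec3 translate(const Vec3& p, const Vec3& t) {
    return { p.x + t.x, p.y + t.y, p.z + t.z };
}
```

For example, translate((1, 1, 1), (1, 2, 3)) returns (2, 3, 4), while mult((0, 0, 0), M) returns (0, 0, 0) whatever the nine coefficients of M are.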

To add translation to a matrix that already performs rotation, we need a fourth term in each equation to hold the translation components. Adding $T_X$, $T_Y$, and $T_Z$ to the matrix multiplication formula gives:

$$ \begin{array}{l} P'.x = P.x * M_{00} + P.y * M_{10} + P.z * M_{20} + T_X\\ P'.y = P.x * M_{01} + P.y * M_{11} + P.z * M_{21} + T_Y\\ P'.z = P.x * M_{02} + P.y * M_{12} + P.z * M_{22} + T_Z \end{array} $$

This change suggests using a 4x3 matrix instead of the original 3x3 one. To still be able to multiply it with a point (a 1x3 matrix), we extend the point to a 1x4 matrix by adding a fourth component set to 1. This turns it into a homogeneous point. The added component multiplies the translation values stored in the fourth row of the matrix, as shown in the formula below:

$$ \begin{array}{l} P'.x = P.x * M_{00} + P.y * M_{10} + P.z * M_{20} + \textcolor{red}{1} * M_{30}\\ P'.y = P.x * M_{01} + P.y * M_{11} + P.z * M_{21} + \textcolor{red}{1} * M_{31}\\ P'.z = P.x * M_{02} + P.y * M_{12} + P.z * M_{22} + \textcolor{red}{1} * M_{32} \end{array} $$
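As a quick check with the example from above, put the identity in the upper 3x3 block and the translation $(1, 2, 3)$ in the fourth row. Multiplying the homogeneous point $(1, 1, 1, 1)$ by this matrix gives $(2, 3, 4)$, as expected:

$$ \begin{bmatrix} 1 & 1 & 1 & \textcolor{red}{1} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \textcolor{magenta}{1} & \textcolor{magenta}{2} & \textcolor{magenta}{3} \end{bmatrix} = \begin{bmatrix} 1 + 0 + 0 + 1 & 0 + 1 + 0 + 2 & 0 + 0 + 1 + 3 \end{bmatrix} = \begin{bmatrix} 2 & 3 & 4 \end{bmatrix} $$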

Adding this fourth row is what allows us to encode translation, scale, and rotation within a single matrix when working with points in homogeneous coordinates. In theory, we are dealing with homogeneous points (4 coordinates) whose fourth coordinate is 1. In practice, that fourth value is implicit and never stored in code, so we keep working with Cartesian points (3 coordinates).

So, the takeaway is that the fourth row of our matrix stores the translation values we want to apply to the point. But since we're now using a 4x3 matrix, we need a 1x4 point or vector. This follows the matrix multiplication rule: the number of columns of the left-hand matrix must match the number of rows of the right-hand matrix. In other words, multiplying a [1x4] by a [4x3] matrix is valid, while multiplying a [1x3] by a [4x3] matrix is not. We set the fourth coordinate of the point to 1, and since it's always 1 (and we don't want to store an extra float in our program), we simply assume it's implicitly there, so we neither define it explicitly in code nor write it down on paper. This becomes more apparent when you look at how we would implement this in code:

void point_matrix_mult(const point3f& p, const matrix44f& m, point3f& pt) {
	// The implicit w = 1 multiplies the fourth row, which holds the translation.
	pt.x = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + /* x-translation */ m[3][0];
	pt.y = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + /* y-translation */ m[3][1];
	pt.z = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + /* z-translation */ m[3][2];
}
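The function above relies on point3f and matrix44f types that this section doesn't define. Here is a self-contained version with minimal stand-ins for them (their layouts are assumptions, not the lesson's actual classes), so it can be compiled and tried:

```cpp
#include <cassert>

// Minimal stand-ins for point3f and matrix44f (assumed layouts).
struct point3f { float x, y, z; };

struct matrix44f {
    float m[4][4];
    // Row access so that m[i][j] works as in the function below.
    float* operator[](int i) { return m[i]; }
    const float* operator[](int i) const { return m[i]; }
};

// P' = [x y z 1] * M: the implicit w = 1 multiplies the fourth row (translation).
void point_matrix_mult(const point3f& p, const matrix44f& m, point3f& pt) {
    pt.x = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + /* x-translation */ m[3][0];
    pt.y = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + /* y-translation */ m[3][1];
    pt.z = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + /* z-translation */ m[3][2];
}
```

With a matrix that scales by 2 and whose fourth row is (1, 2, 3, 1), the point (1, 2, 3) becomes (3, 6, 9): each coordinate is doubled, then translated.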

As mentioned, these [1x4] points are referred to as homogeneous coordinates or homogeneous points in computer graphics and linear algebra. When dealing with matrices that encode translation, scaling, and rotation, the fourth coordinate of this point is always assumed to be 1, so it doesn’t need to be explicitly defined or used when writing code to transform a point by a matrix, as shown in the code above.

There is, of course, an exception: the fourth coordinate does not always stay 1. This happens when points are transformed by perspective projection matrices, which have specific properties that the usual matrices (those used to translate, rotate, or scale points) don't have. You can find more information about this in the lesson on Perspective Projection Matrices.

Note that the fourth coordinate of a point represented with homogeneous coordinates is almost always referred to as $w$.

$$P_{h} = (x, y, z, w = 1)$$

The point with homogeneous coordinates $(x, y, z, w = 1)$, when multiplied by a 4x3 matrix whose fourth row holds the translation values (row-major convention, with the point as a row vector on the left), can be written as:

$$ \begin{bmatrix} P'.x & P'.y & P'.z \end{bmatrix} = \begin{bmatrix} P.x & P.y & P.z & \textcolor{red}{1} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{M_{00}} & \textcolor{green}{M_{01}} & \textcolor{blue}{M_{02}} \\ \textcolor{red}{M_{10}} & \textcolor{green}{M_{11}} & \textcolor{blue}{M_{12}} \\ \textcolor{red}{M_{20}} & \textcolor{green}{M_{21}} & \textcolor{blue}{M_{22}} \\ \textcolor{magenta}{M_{30}} & \textcolor{magenta}{M_{31}} & \textcolor{magenta}{M_{32}} \end{bmatrix} $$

Here, the coefficients $M_{30}, M_{31}, M_{32}$ hold the translation values that will be applied to the point (note that translating a vector doesn't make sense, but more on this in a moment). Breaking it down for each coordinate:

$$ \begin{array}{l} P'.x = P.x \cdot M_{00} + P.y \cdot M_{10} + P.z \cdot M_{20} + \textcolor{red}{1} \cdot \textcolor{magenta}{M_{30}}\\ P'.y = P.x \cdot M_{01} + P.y \cdot M_{11} + P.z \cdot M_{21} + \textcolor{red}{1} \cdot \textcolor{magenta}{M_{31}}\\ P'.z = P.x \cdot M_{02} + P.y \cdot M_{12} + P.z \cdot M_{22} + \textcolor{red}{1} \cdot \textcolor{magenta}{M_{32}}\\ \end{array} $$

Here, the fourth row, specifically $M_{30}, M_{31}, M_{32}$, represents the translation values applied to the homogeneous point, and $P.w$ is implicitly 1. As mentioned, when you're not dealing with a perspective projection matrix, you will almost never explicitly handle points with homogeneous coordinates. Instead, you'll work with points defined by their standard x, y, and z coordinates (Cartesian coordinates). We’re showing $P.w = \textcolor{red}{1}$ here to give you the full picture, but in practice, $P.w$ is something we don’t explicitly deal with. It’s "ghosted," so to speak.

Why Do We Use 4x4 Matrices Instead of 4x3 Matrices?

Now, one question remains: why do we use [4x4] matrices instead of [4x3], which should technically be enough? Well, a good way to answer this question is by starting with a 4x4 matrix where the fourth column is set to $(0, 0, 0, 1)$. Let's see what multiplying a 1x4 homogeneous point by this 4x4 matrix looks like:

$$ \begin{bmatrix} P'.x & P'.y & P'.z & P'.w \end{bmatrix} = \begin{bmatrix} P.x & P.y & P.z & \textcolor{red}{1} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{M_{00}} & \textcolor{green}{M_{01}} & \textcolor{blue}{M_{02}} & 0 \\ \textcolor{red}{M_{10}} & \textcolor{green}{M_{11}} & \textcolor{blue}{M_{12}} & 0 \\ \textcolor{red}{M_{20}} & \textcolor{green}{M_{21}} & \textcolor{blue}{M_{22}} & 0 \\ \textcolor{magenta}{M_{30}} & \textcolor{magenta}{M_{31}} & \textcolor{magenta}{M_{32}} & 1 \end{bmatrix} $$

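Breaking it down for each coordinate:

$$ \begin{array}{l} P'.x = P.x \cdot M_{00} + P.y \cdot M_{10} + P.z \cdot M_{20} + \textcolor{red}{1} \cdot \textcolor{magenta}{M_{30}}\\ P'.y = P.x \cdot M_{01} + P.y \cdot M_{11} + P.z \cdot M_{21} + \textcolor{red}{1} \cdot \textcolor{magenta}{M_{31}}\\ P'.z = P.x \cdot M_{02} + P.y \cdot M_{12} + P.z \cdot M_{22} + \textcolor{red}{1} \cdot \textcolor{magenta}{M_{32}}\\ P'.w = P.x \cdot 0 + P.y \cdot 0 + P.z \cdot 0 + \textcolor{red}{1} \cdot 1 = 1 \end{array} $$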

So, when we use a 4x4 matrix where the fourth column is set to $(0, 0, 0, 1)$, we see that $P'.w$ remains unchanged since we start with $P.w = 1$ and end up with $P'.w = 1$. In effect, nothing happens to the fourth coordinate of the homogeneous point. A homogeneous point with a fourth coordinate of 1 is just a Cartesian point.
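To make the $P'.w$ bookkeeping concrete, here is a sketch that multiplies a 1x4 row vector by a full 4x4 matrix and returns all four components. The Vec4 and Mat44 types and the mulRowVec name are hypothetical. With the fourth column set to $(0, 0, 0, 1)$, $w' = 0 \cdot x + 0 \cdot y + 0 \cdot z + 1 \cdot 1 = 1$, whatever the other coefficients are:

```cpp
#include <cassert>

// Hypothetical types for this sketch: a homogeneous point and a 4x4 matrix.
struct Vec4 { float x, y, z, w; };
struct Mat44 { float m[4][4]; };

// Row-vector convention: [x' y' z' w'] = [x y z w] * M.
// Component j of the result is the dot product of the point with column j.
Vec4 mulRowVec(const Vec4& p, const Mat44& M) {
    const float in[4] = { p.x, p.y, p.z, p.w };
    float out[4] = { 0, 0, 0, 0 };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += in[row] * M.m[row][col];
    return { out[0], out[1], out[2], out[3] };
}
```

Changing any of the first three coefficients of the fourth column would make $w'$ depend on $x$, $y$, or $z$, which is exactly what projection matrices do.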

In essence, this matrix leaves the fourth (homogeneous) coordinate of the point unchanged:

$$ \begin{bmatrix} \textcolor{red}{M_{00}} & \textcolor{green}{M_{01}} & \textcolor{blue}{M_{02}} & 0 \\ \textcolor{red}{M_{10}} & \textcolor{green}{M_{11}} & \textcolor{blue}{M_{12}} & 0 \\ \textcolor{red}{M_{20}} & \textcolor{green}{M_{21}} & \textcolor{blue}{M_{22}} & 0 \\ \textcolor{magenta}{M_{30}} & \textcolor{magenta}{M_{31}} & \textcolor{magenta}{M_{32}} & 1 \end{bmatrix} $$

All 4x4 matrices used to translate, rotate, and scale points and vectors are of this type: their fourth column is set to $(0, 0, 0, 1)$. So, don't worry too much about why we use 4x4 matrices instead of 4x3 matrices. Using a 4x4 matrix with the fourth column set to $(0, 0, 0, 1)$ makes no difference to the result. Since many of us developers have a bit of an obsessive-compulsive tendency and prefer neat square matrices, we just made it square and set up the additional column so that it doesn't affect the result.

But then, you might be thinking, “Wait, this makes it even worse! Why would anyone use 4x4 matrices if the fourth column is just sitting there doing nothing (besides satisfying developers’ OCD)?” Well, there’s one important exception to this rule: perspective projection matrices (so you see it isn't totally useless after all). In that case, the fourth column differs from $(0, 0, 0, 1)$. Shearing is sometimes mentioned as well, but it's quite rare in practice (in 25+ years of professional work, I’ve never used or even seen it, but I could be missing something there), and a standard shear only changes coefficients in the upper-left 3x3 block, so its fourth column stays $(0, 0, 0, 1)$.

We won't dive into perspective projection in this particular lesson since it's covered in detail in the Perspective Projection Matrices lesson. But, keeping on this rational train of thought, you might say, “Wait, so you're still using 4x4 matrices even though in 95% of cases you don’t need them? That’s crazy!” Yes, it does seem a bit overkill. However, prior to the widespread use of ray tracing in production, the dominant method for rendering images was rasterization. In rasterization, the perspective projection matrix is used much more frequently than in ray tracing, so in that context, 4x4 matrices are absolutely necessary.

My final explanation is that it’s done for convenience. We write code that works with 4x4 matrices even though we seldom use the fourth column, so that it’s ready for the cases when we do need it. Also, 16 floats packed together in memory may be better for memory alignment and performance.
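To put rough numbers on the memory argument, the sketch below compares the footprint of hypothetical 4x3 and 4x4 float matrix types. A 4x4 float matrix is 64 bytes, which matches the cache-line size of many current CPUs, and each 16-byte row fits a 128-bit SIMD register (SSE on x86, NEON on ARM). Whether this turns into a measurable speedup depends on the code and the compiler, so treat it as a plausible benefit rather than a guarantee:

```cpp
#include <cassert>

// Hypothetical types used only to compare memory footprints.
struct Matrix43f { float m[4][3]; };              // 12 floats, translation in the last row
struct alignas(16) Matrix44f { float m[4][4]; };  // 16 floats, each row 16 bytes

// These checks assume a 4-byte float, which holds on mainstream platforms.
static_assert(sizeof(Matrix43f) == 48, "a 4x3 float matrix takes 48 bytes");
static_assert(sizeof(Matrix44f) == 64, "a 4x4 float matrix takes 64 bytes");
static_assert(alignof(Matrix44f) == 16, "with 16-byte rows, every row starts on a 16-byte boundary");
```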

That said, it’s not strictly necessary. For example, Intel’s Embree ray-tracing kernels use an AffineSpaceT struct, which stores the linear part of the transform (rotation, scale) as three vec3f axis vectors (a LinearSpace3) and the translation as a separate vec3f; the project uses that representation rather than 4x4 matrices. This shows that using 4x4 matrices is a convention rather than a strict requirement.
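To make the idea concrete, here is a small sketch of an affine transform stored as a 3x3 linear part plus a separate translation, loosely modeled on that design. The names Vec3f, LinearSpace3f, AffineSpace3f, transformPoint, and transformVector are hypothetical and are not Embree's actual API. The three axis vectors play the role of the first three rows of the lesson's matrix, so the row-vector convention is preserved:

```cpp
#include <cassert>

struct Vec3f { float x, y, z; };

Vec3f operator+(const Vec3f& a, const Vec3f& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3f operator*(float s, const Vec3f& v) { return { s * v.x, s * v.y, s * v.z }; }

// Linear part: the images of the x, y and z axes (rows 0 to 2 of the lesson's matrix).
struct LinearSpace3f {
    Vec3f vx, vy, vz;
    Vec3f apply(const Vec3f& v) const { return v.x * vx + v.y * vy + v.z * vz; }
};

// Affine transform = linear part + separate translation; no fourth column is stored.
struct AffineSpace3f {
    LinearSpace3f l;
    Vec3f p;  // translation (plays the role of the fourth row)
    Vec3f transformPoint(const Vec3f& pt) const { return l.apply(pt) + p; }  // points are translated
    Vec3f transformVector(const Vec3f& v) const { return l.apply(v); }       // vectors are not
};
```

Because the translation lives in its own member, points and vectors are handled by two separate functions instead of an implicit $w$ of 1 or 0.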

We will explore more about homogeneous coordinates in the next section.

Homogeneous Coordinates Are Not Magic

As a bit of trivia, I remember when I was a beginner in the field of 3D programming back in the early 90s. At that time, there weren’t nearly as many books on computer graphics programming as there are today. Most books would, of course, start with a chapter on geometry, but many would suddenly jump from points defined by 3 coordinates (Cartesian points) to points with 4 coordinates (homogeneous points) without much explanation.

With the scarcity of printed materials and no internet for easy access, I had no one to learn from directly. Stumbling upon this concept so early in the books completely shattered my confidence in understanding computer graphics techniques. It took me a while to grasp something that, even today, is rarely explained clearly.

A lot of my motivation for writing Scratchapixel comes from this exact issue, which I’ve noticed in almost every book on 3D programming: they just don’t provide an in-depth explanation of why homogeneous coordinates are used and how they relate to 4x4 matrices and perspective projection matrices. I’ve done my best here to cover those details. And if it’s still unclear, let me know, and I’ll keep refining this section until I nail it.

While ray tracing is becoming more popular, rasterization is still an extremely common method as of 2024. Most, if not all, 3D tools that use the GPU to render a 3D scene in a viewport rely on the rasterization engine of that GPU (via APIs like DirectX, Vulkan, Metal, or OpenGL). So, it’s still highly relevant to learn and fully understand these concepts in depth.

The concept of representing points as homogeneous coordinates is essential for enabling multiplication by [4x4] matrices in computer graphics, as explained earlier. However, this representation is almost always handled implicitly in code, since the homogeneous coordinate $w$ is typically assumed to be 1. As a result, in C/C++, a Point class usually defines a point with just three floats (x, y, and z), leaving out the fourth coordinate $w$. When a homogeneous point is multiplied by a [4x4] matrix, the transformed point's $w'$ coordinate is computed from the matrix's fourth column. This column is usually $(0, 0, 0, 1)$, which gives $w' = 1$, so the transformed $x'$, $y'$, and $z'$ coordinates can be used directly.

However, this changes when dealing with projection matrices. In this case, the fourth column deviates from $(0, 0, 0, 1)$, which can lead to $w'$ differing from 1. To correct for this, the transformed coordinates ($x'$, $y'$, $z'$) must be normalized by dividing each by $w'$ to revert back to Cartesian coordinates, as shown in the pseudo-code below:

P'.x = P.x * M00 + P.y * M10 + P.z * M20 + M30;
P'.y = P.x * M01 + P.y * M11 + P.z * M21 + M31;
P'.z = P.x * M02 + P.y * M12 + P.z * M22 + M32;
w'   = P.x * M03 + P.y * M13 + P.z * M23 + M33;
if (w' != 1 && w' != 0) {
    P'.x /= w'; P'.y /= w'; P'.z /= w';
}

This approach removes the need to store a $w$ coordinate in the Point data type. The $w'$ value is computed on the fly, treating the input as a Cartesian point, or equivalently as a homogeneous point whose undeclared $w$ coordinate is implicitly 1. This matters when multiplying by a projection matrix: dividing all coordinates by $w'$ brings $w'$ back to 1 and converts the point back to Cartesian coordinates.
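As a small worked example, take a toy matrix (not a real perspective projection matrix, which also remaps depth and accounts for the field of view) whose fourth column is $(0, 0, 1, 0)$ and whose upper 3x3 block is the identity. Its effect is $w' = P.z$, so the divide scales $x$ and $y$ by $1/z$, which is the core of perspective foreshortening. For the point $(2, 4, 2)$:

$$ \begin{bmatrix} 2 & 4 & 2 & \textcolor{red}{1} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 2 & 2 \end{bmatrix} \quad\Rightarrow\quad \left(\frac{2}{2}, \frac{4}{2}, \frac{2}{2}\right) = (1, 2, 1) $$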

The main takeaway is that homogeneous coordinates usually only matter when points are transformed by a perspective projection matrix. This is less common in ray tracing, where such matrices are not typically used. For a deeper understanding of the role of the $w$ coordinate and its application, check out the lesson on Perspective and Orthographic Projection Matrices, which explains how 3D points are projected onto the image plane using perspective projection. Additionally, the lesson on the Rasterization Rendering Method covers how projection matrices are used to project points onto the screen.

Implementing this functionality in C/C++ can follow two paths:

Some developers opt to always calculate w' and adjust the transformed point coordinates by w' if it differs from 1. This method, though comprehensive, is often unnecessary outside the context of projection matrices and can lead to wasted CPU resources in the majority of cases.
Alternatively, one might disregard w and w', assuming the use of matrices with a fourth column set to (0, 0, 0, 1). For projection matrices, a separate function can be designed to handle w' and adjust the coordinates accordingly.
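The second path can be sketched as two separate functions, so that ordinary transforms never pay for computing $w'$. The names transformPointAffine and transformPointProjective, and the Vec3 and Matrix44 types, are hypothetical; the caller is assumed to know which kind of matrix it holds:

```cpp
#include <cassert>

// Hypothetical types and function names for this sketch.
struct Vec3 { float x, y, z; };
struct Matrix44 { float m[4][4]; };

// Fast path: assumes the fourth column is (0, 0, 0, 1), so w' is never computed.
Vec3 transformPointAffine(const Vec3& p, const Matrix44& M) {
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2]
    };
}

// Projection path: also computes w' and divides by it, with the same guard
// as the lesson's pseudo-code (skip the divide when w' is 1 or 0).
Vec3 transformPointProjective(const Vec3& p, const Matrix44& M) {
    Vec3 r = transformPointAffine(p, M);  // x', y', z' use the same sums
    float w = p.x * M.m[0][3] + p.y * M.m[1][3] + p.z * M.m[2][3] + M.m[3][3];
    if (w != 1 && w != 0) {
        r.x /= w;
        r.y /= w;
        r.z /= w;
    }
    return r;
}
```

For instance, with a translation-only matrix both functions return (2, 3, 4) for the point (1, 1, 1), while a toy matrix whose fourth column is (0, 0, 1, 0) makes the projective version divide by $w' = z$.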

For clarity and to maintain a balance between generality and optimization, the following example adopts a generic approach that includes computing w' and normalizing the coordinates when necessary:

void multVecMatrix(const Vec3<T> &src, Vec3<T> &dst) const {
    dst.x = src.x * m[0][0] + src.y * m[1][0] + src.z * m[2][0] + m[3][0];
    dst.y = src.x * m[0][1] + src.y * m[1][1] + src.z * m[2][1] + m[3][1];
    dst.z = src.x * m[0][2] + src.y * m[1][2] + src.z * m[2][2] + m[3][2];
    T w = src.x * m[0][3] + src.y * m[1][3] + src.z * m[2][3] + m[3][3];
    if (w != 1 && w != 0) {
        dst.x /= w;
        dst.y /= w;
        dst.z /= w;
    }
}

Vector Transformation

Vectors, unlike points, represent a direction and a magnitude without any position, which makes them simpler to transform than points. Since a vector has no position, translating it is meaningless: only its direction, and possibly its length, can change. Vector transformation therefore simply omits the translation component used for points.

Here's the straightforward code for vector transformation, which notably excludes the translation component present in point transformation:

V'.x = V.x * M00 + V.y * M10 + V.z * M20;
V'.y = V.x * M01 + V.y * M11 + V.z * M21;
V'.z = V.x * M02 + V.y * M12 + V.z * M22;

In C++, vector transformation can be implemented as follows, again leaving out the translation so that only the vector's direction and length are affected:

void multDirMatrix(const Vec3<T> &src, Vec3<T> &dst) const {
    dst.x = src.x * m[0][0] + src.y * m[1][0] + src.z * m[2][0];
    dst.y = src.x * m[0][1] + src.y * m[1][1] + src.z * m[2][1];
    dst.z = src.x * m[0][2] + src.y * m[1][2] + src.z * m[2][2];
}
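One way to see why dropping the translation is right: a vector can be obtained as the difference of two points, and when both points are transformed, the translation cancels in the subtraction. Equivalently, a vector behaves like a homogeneous coordinate with $w = 0$, so the fourth row is multiplied by 0. The sketch below uses hypothetical Vec3 and Matrix44 types and function names that mirror multVecMatrix and multDirMatrix for affine matrices, and checks that transforming $q - p$ with the direction rule matches subtracting the transformed points:

```cpp
#include <cassert>

// Hypothetical types and names for this sketch.
struct Vec3 { float x, y, z; };
struct Matrix44 { float m[4][4]; };

Vec3 operator-(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Points: implicit w = 1, so the fourth row (translation) is added.
Vec3 xformPoint(const Vec3& p, const Matrix44& M) {
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2]
    };
}

// Vectors: implicit w = 0, so the fourth row is multiplied by 0 and drops out.
Vec3 xformDir(const Vec3& v, const Matrix44& M) {
    return {
        v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0],
        v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1],
        v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2]
    };
}
```

With a matrix that scales by 2 and translates by (1, 2, 3), the points (1, 1, 1) and (2, 3, 4) map to (3, 4, 5) and (5, 8, 11); their difference, (2, 4, 6), is exactly what xformDir returns for (1, 2, 3).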

Transforming Normals

Normals, despite their vector-like properties, introduce additional complexity in their transformation, a subject to be elaborated in a dedicated chapter on Transforming Normals.

Concluding Insights

This discussion explains why we prefer [4x4] matrices over [3x3] matrices, highlighting the essential role of the $M_{30}$, $M_{31}$, and $M_{32}$ coefficients in encoding translation values. Moving to [4x4] matrices requires extending points with an additional $w$ coordinate, implicitly treating them as homogeneous points whose $w$ is 1, so that they can still be used as Cartesian points. Typically, the fourth column of a transformation matrix is set to $(0, 0, 0, 1)$, ensuring the $w'$ coordinate remains 1. Exceptions, such as projection matrices, may change $w'$, in which case $x'$, $y'$, and $z'$ must be divided by $w'$ to return to Cartesian coordinates.

Transformations can also be represented without matrices, for example with Euler rotation vectors (the axis-angle representation) and Rodrigues' rotation formula, which help with specific graphics problems such as avoiding gimbal lock, a limitation of rotations built from Euler angles rather than of matrices themselves. Quaternions, despite their complexity, are favored for their efficiency in interpolating rotations and for avoiding gimbal lock, underscoring the diverse toolkit available for managing transformations in computer graphics.