Given the following equation and an arbitrary number of point correspondence determine scale, rotation, and translation.
Imagine you are sitting at home doing a bit of programming. Suddenly a rogue mathematician kicks in your door, and presents you with two sets of point data (spanning millions of points), demanding that you find a sufficient relative transformation between them. He assures you that:
- The points are linearly independent.
- You will have beforehand the correspondence of points between set p and set q.
If you don’t comply, the mathematician threatens to refactor portions of your code without consulting you first (the horror!). How do you accomplish the task?
The answer to this question seems like it should be easy. The difficulty comes from the fact the problem is over-determined. This means you can’t find an exact solution, but instead need to determine a best fit (or best approximation) of the system’s parameters. Best fit is a very important problem in practical mathematics, and computer science. So important, in fact, that it is surprising you usually don’t brush into it until higher level mathematics courses. This could be used, for example, to perform cloud-to-cloud registration of point cloud data.
The method I will outline in this tutorial is adopted from this paper by Horn, Hilden, and Negahdaripour. This method at its core follows a few simple steps.
- Compute the centroid of each point set.
- Use the centroid to compute optimal rotation.
- Compute the scale.
- Use the optimal rotation and scale to compute translation.
That’s it! Now that’s not to say the process isn’t complicating, but I find it helps to start by trying to understand at the highest level concepts involved.
Simplifying the Problem
If you take the centroid of each point sets (made up of correspondences) then it makes sense that the centroid itself would also be a correspondence. We can then offset the points in each set with respect to their centroid, effectively moving the origin of the transformation. This allows us to eliminate translation since each set is now in a common origin. We could actually do this step with any arbitrary point, but it helps to avoid bias by using the centroid. For a math-ier explanation:
As you can see, by formulating the problem offset from the centroid, the translation conveniently falls out. It’s not as obvious, but scale also falls out of the equation. Mathematically, the reason behind this will be more obvious later. Conceptually, this is due to the fact rotation matrices are orthonormal. This simply means they preserve scale when applied to a vector. This allows you to solve for a directional part (rotation), and a length part (scale). For now, let’s put off the length part, and focus on the directional part: our rotation matrix. Once we have our optimal rotation, we can then solve for scale, and translation.
Optimal Rotation – Where do we begin?
How are we going to find our “optimal rotation” between two sets of ‘n’ points? We can start with a simple expression involving the rotation matrix.
Let e = standard basis, m = rotated orthogonal basis, R = rotation
Here we can see the rotation matrix is related to the sum of the outer product of our rotated axes with the standard basis axes. I’m going to pause here to emphasize that this expression is actually a few modifications removed from the true solution for optimal rotation. For this reason I’m going to try to expand on this point to see if we can make the necessary conceptual leap a little more obvious. The outer product is a large part of the above expression, so let’s start there.
The outer product is the conjugate of the dot product. Where the dot product returns a scalar, the outer product returns a matrix. As it turns out, the outer product is fairly closely related to the dot/cross product. For information about this, I’d recommend checking this very helpful website. As it turns out, the off-diagonal entries encode the cross product, while the diagonal entries encode the dot product between the two vectors. As a result of this, you can see summing the outer product of a vector set as summing simultaneously the cross/dot product. By normalizing this cross product sum, we are left with an average rotation axis. Getting the rotation angle out of this is a little harder to see, and will be left as an exercise for the reader (I don’t feel like deriving it right now). Essentially it amounts to using the ratio of the length of the cross product and our dot product to determine the angle.
Another way you can look at the outer product is as a covariance matrix. Covariance in simple terms is a way of measuring how two variables are related. Let’s say you have a variables ‘a’, and ‘b’.
- If the covariance of a, b is positive, then the two variables show similar behavior. When a is large, b is large. When a is small, b is small.
- If the covariance of a, b is negative, then the two variables show opposite behavior. When a is small, b is large. When a is large, b is small.
- The magnitude of the covariance is a bit harder to pin down, but for this discussion let’s just say it shows how strong the relationship (positive or negative) our variables share.
With this new understanding of the outer product, we can see each column as being a covariance between our rotated vector, and our corresponding non-rotated component (x, y, or z). Put simply, each component of our non-rotated point (p) figures out which direction pulls it the strongest (q).
Stopping there, this sounds an awful lot like a rotation matrix. After all, a rotation matrix is something that fundamentally expresses which direction the x, y, and z parts of a vector should point toward. If you have a 90 degree rotation about the z-axis, your rotation matrix expresses that the point’s x component should move toward the y direction, y component should move toward the -x direction, and z should stay the same.
This is the basic idea behind our formulation of the rotation matrix found above. Our rotation matrix here is essentially a covariance matrix indicating which direction x, y, and z should move.
Get to the Point – Optimal Rotation?
In order to find the optimal rotation it is just a simple tweak to move from using 3 orthonormal basis vectors, to an arbitrary number of vectors.
The only problem now, is that our rotation matrix will no longer be orthonormal (and thus no longer a rotation matrix). To solve this problem we simply have to perform an SVD on matrix R’. The singular value decomposition has the powerful understanding associated with it that you are decomposing your matrix into 3 transformations. 2 transformations of rotation, and 1 of scale. By throwing out the scale transformation, we are left with an orthonormal rotation matrix that best approximates our rotation matrix. This is also why, as was mentioned above, our optimal rotation is invariant to the scale of the involved vectors.
Scale is an operation which uniformly manipulates vector lengths. This means we can determine scale as a ratio of the our vector lengths. We’ll start the derivation using the average length, but as you will see we only end up needing to accumulate the length sum for each point set.
If you want, you can also solve for the squared scale, and then apply the square root at the end of the accumulation process.
Finally, by using the equation defined above, it is a simple step to solve for our translation using our centroids, scale, and rotation.
You can find the complete example code on github. Here is the important section.
template <class T>
const T* pPoints0, const T* pPoints1,
size_t numOfPoints, Params& rParams)
// We need a minimum number of 3 points
if (numOfPoints < 3)
else if (numOfPoints == 3)
// Use minimal implementation
Vector3d centroid0 = Vector3d::Zero();
Vector3d centroid1 = Vector3d::Zero();
for (size_t i = 0; i < numOfPoints; ++i)
centroid0 += pPoints0[i].cast<double>();
centroid1 += pPoints1[i].cast<double>();
const double invNumOfPoints = 1.0 / (double)numOfPoints;
centroid0 *= invNumOfPoints;
centroid1 *= invNumOfPoints;
Matrix3d matrixSum = Matrix3d::Zero();
double sqrNormSum0 = 0.0, sqrNormSum1 = 0.0;
for (size_t i = 0; i < numOfPoints; ++i)
const Vector3d offsetPoint0 = (pPoints0[i].cast<double>() - centroid0);
const Vector3d offsetPoint1 = (pPoints1[i].cast<double>() - centroid1);
sqrNormSum0 += offsetPoint0.squaredNorm();
sqrNormSum1 += offsetPoint1.squaredNorm();
matrixSum += offsetPoint1 * offsetPoint0.transpose();
const JacobiSVD<Matrix3d> svd(
matrixSum, ComputeFullU | ComputeFullV);
const Matrix3d rotation = svd.matrixU() * svd.matrixV().transpose();
const double scale = sqrt(sqrNormSum1 / sqrNormSum0);
const Eigen::Vector3d translation = centroid1 - scale * rotation * centroid0;
rParams.mRotation = Quaterniond(rotation);
rParams.mTranslation = translation;
rParams.mScale = scale;
And that’s it! Hopefully that helps to explain a little better the math, and logic behind the task of fitting an optimal transformation. If there are any questions, or corrects, feel free to put them in the comments.