Browsed by
Category: Tutorial

Geometric Transformation Fitting

Geometric Transformation Fitting

Problem Definition

Given the following equation and an arbitrary number of point correspondence determine scale, rotation, and translation.

(1)   \begin{equation*} q = sRp + t \end{equation*}


Imagine you are sitting at home doing a bit of programming. Suddenly a rogue mathematician kicks in your door, and presents you with two sets of point data (spanning millions of points), demanding that you find a sufficient relative transformation between them. He assures you that:

  1. The points are linearly independent.
  2. You will have beforehand the correspondence of points between set p and set q.

If you don’t comply, the mathematician threatens to refactor portions of your code without consulting you first (the horror!). How do you accomplish the task?

The answer to this question seems like it should be easy. The difficulty comes from the fact the problem is over-determined. This means you can’t find an exact solution, but instead need to determine a best fit (or best approximation) of the system’s parameters. Best fit is a very important problem in practical mathematics, and computer science. So important, in fact, that it is surprising you usually don’t brush into it until higher level mathematics courses. This could be used, for example, to perform cloud-to-cloud registration of point cloud data.

The method I will outline in this tutorial is adopted from this paper by Horn, Hilden, and Negahdaripour. This method at its core follows a few simple steps.

  1. Compute the centroid of each point set.
  2. Use the centroid to compute optimal rotation.
  3. Compute the scale.
  4. Use the optimal rotation and scale to compute translation.

That’s it! Now that’s not to say the process isn’t complicating, but I find it helps to start by trying to understand at the highest level concepts involved.

Simplifying the Problem

If you take the centroid of each point sets (made up of correspondences) then it makes sense that the centroid itself would also be a correspondence. We can then offset the points in each set with respect to their centroid, effectively moving the origin of the transformation. This allows us to eliminate translation since each set is now in a common origin. We could actually do this step with any arbitrary point, but it helps to avoid bias by using the centroid. For a math-ier explanation:

(2)   \begin{equation*} \bar{q} = sR\bar{p} + t \rightarrow t = \bar{q} - sR\bar{p} \end{equation*}

(3)   \begin{equation*} q = sRp + t \rightarrow q = sRp + \bar{q} - sR\bar{p} \rightarrow q - \bar{q} = sR(p - \bar{p}) \end{equation*}

As you can see, by formulating the problem offset from the centroid, the translation conveniently falls out. It’s not as obvious, but scale also falls out of the equation. Mathematically, the reason behind this will be more obvious later. Conceptually, this is due to the fact rotation matrices are orthonormal. This simply means they preserve scale when applied to a vector. This allows you to solve for a directional part (rotation), and a length part (scale). For now, let’s put off the length part, and focus on the directional part: our rotation matrix. Once we have our optimal rotation, we can then solve for scale, and translation.

Optimal Rotation – Where do we begin?

How are we going to find our “optimal rotation” between two sets of ‘n’ points? We can start with a simple expression involving the rotation matrix.

Let e = standard basis, m = rotated orthogonal basis, R = rotation

(4)   \begin{equation*} m = R e \rightarrow R = m e^T \rightarrow R = m_1 e_1^T + m_2 e_2^T + m_3 e_3^T \end{equation*}

Here we can see the rotation matrix is related to the sum of the outer product of our rotated axes with the standard basis axes. I’m going to pause here to emphasize that this expression is actually a few modifications removed from the true solution for optimal rotation. For this reason I’m going to try to expand on this point to see if we can make the necessary conceptual leap a little more obvious. The outer product is a large part of the above expression, so let’s start there.

Outer Product

The outer product is the conjugate of the dot product. Where the dot product returns a scalar, the outer product returns a matrix. As it turns out, the outer product is fairly closely related to the dot/cross product. For information about this, I’d recommend checking this very helpful website. As it turns out, the off-diagonal entries encode the cross product, while the diagonal entries encode the dot product between the two vectors. As a result of this, you can see summing the outer product of a vector set as summing simultaneously the cross/dot product. By normalizing this cross product sum, we are left with an average rotation axis. Getting the rotation angle out of this is a little harder to see, and will be left as an exercise for the reader (I don’t feel like deriving it right now). Essentially it amounts to using the ratio of the length of the cross product and our dot product to determine the angle.

Another way you can look at the outer product is as a covariance matrix. Covariance in simple terms is a way of measuring how two variables are related. Let’s say you have a variables ‘a’, and ‘b’.

  • If the covariance of a, b is positive, then the two variables show similar behavior. When a is large, b is large. When a is small, b is small.
  • If the covariance of a, b is negative, then the two variables show opposite behavior. When a is small, b is large. When a is large, b is small.
  • The magnitude of the covariance is a bit harder to pin down, but for this discussion let’s just say it shows how strong the relationship (positive or negative) our variables share.

(5)   \begin{equation*} PQ^T = \begin{bmatrix} p_xq_x & p_xq_y & p_xq_z \\ p_yq_x & p_yq_y & p_yq_z \\ p_zq_x & p_zq_y & p_zq_z \end{bmatrix} \end{equation*}

With this new understanding of the outer product, we can see each column as being a covariance between our rotated vector, and our corresponding non-rotated component (x, y, or z). Put simply, each component of our non-rotated point (p) figures out which direction pulls it the strongest (q).

Stopping there, this sounds an awful lot like a rotation matrix. After all, a rotation matrix is something that fundamentally expresses which direction the x, y, and z parts of a vector should point toward. If you have a 90 degree rotation about the z-axis, your rotation matrix expresses that the point’s x component should move toward the y direction, y component should move toward the -x direction, and z should stay the same.

(6)   \begin{equation*} R = m_1 e_1^T + m_2 e_2^T + m_3 e_3^T \end{equation*}

This is the basic idea behind our formulation of the rotation matrix found above. Our rotation matrix here is essentially a covariance matrix indicating which direction x, y, and z should move.

Get to the Point – Optimal Rotation?

In order to find the optimal rotation it is just a simple tweak to move from using 3 orthonormal basis vectors, to an arbitrary number of vectors.

(7)   \begin{equation*} R' = \sum_{i=1} a_i  b_i^T \end{equation*}

The only problem now, is that our rotation matrix will no longer be orthonormal (and thus no longer a rotation matrix). To solve this problem we simply have to perform an SVD on matrix R’. The singular value decomposition has the powerful understanding associated with it that you are decomposing your matrix into 3 transformations. 2 transformations of rotation, and 1 of scale. By throwing out the scale transformation, we are left with an orthonormal rotation matrix that best approximates our rotation matrix. This is also why, as was mentioned above, our optimal rotation is invariant to the scale of the involved vectors.

(8)   \begin{equation*} SVD(R') = U\Sigma V^T \rightarrow R = UV^T \end{equation*}

Optimal Scale

Scale is an operation which uniformly manipulates vector lengths. This means we can determine scale as a ratio of the our vector lengths. We’ll start the derivation using the average length, but as you will see we only end up needing to accumulate the length sum for each point set.

(9)   \begin{equation*} l_{ave} = \dfrac{\sum_{i=1}^n \|x_i\|}{n} \end{equation*}

(10)   \begin{equation*} s = \dfrac{l_{p,ave}}{l_{q,ave}} = \dfrac{\sum_{i=1}^n \|p_i\|}{n} \dfrac{n}{\sum_{i=1}^n \|q_i\|} = \dfrac{\sum_{i=1}^n \|p_i\|}{\sum_{i=1}^n \|q_i\|} \end{equation*}

If you want, you can also solve for the squared scale, and then apply the square root at the end of the accumulation process.

Optimal Translation

Finally, by using the equation defined above, it is a simple step to solve for our translation using our centroids, scale, and rotation.

(11)   \begin{equation*} t = \bar{q} - sR\bar{p} \end{equation*}


You can find the complete example code on github. Here is the important section.


And that’s it! Hopefully that helps to explain a little better the math, and logic behind the task of fitting an optimal transformation. If there are any questions, or corrects, feel free to put them in the comments.


Helpful Resources
Connecting Android Device to Winsock via Bluetooth

Connecting Android Device to Winsock via Bluetooth

I am writing this tutorial in order to help others who find themselves in a similar position to my own. I recently had the task of connecting an android 4.3 device to a windows machine relying entirely on winsock. The more I researched the issue, the more I realized that there simply aren’t many good resources when it comes to this very specific problem I was attempting to solve. As a result, I am going to attempt to outline here the best explanation I can as to how I understand solving this problem. In addition, I will omit things as I see fit (usually with reference to an outside source). Others may or may not find this post helpful (It’s entirely possible this post will end up one more among many others which fail to convey the idea).

Android 4.3 – The following was tested using Android 4.4.4, and 4.3.1

The android side of this project is actually fairly simple. You can find great documentation, and example code on the official website which make this part of the project very doable.

As a result of this, I suggest you take a look and work out the basics of Android bluetooth.

To summarize, however, you want to do the following:

  1. Modify application permissions to allow “android.permission.BLUETOOTH” or “android.permission.BLUETOOTH_ADMIN” as per your needs
  2. Acquire ‘BluetoothAdapter’
  3. Pair with a device
  4. Secure a ‘BluetoothSocket’ connection with the paired device

There are, however, a few tips I would like to add to make this process easier.

Tip 1: Fetch SDP Records – If you android app is having trouble discovering service records on the target device, be sure to call “fetchUuidWithSdp()” on your targetted ‘BluetoothDevice’. I have found this step necessary on at least 3 android devices, and yet it is rarely mentioned on the tutorial page.

Tip 2: COMMON UUID – If you find that your two devices just simply aren’t communicating then be sure to check that your android application UUID string matches your desktop application’s GUID. This is crucial.

Winsock 2.2 – Server Side

In order to create a bluetooth server utilizing winsock we begin by initializing our winsocket context. “MAKEWORD(2,2)” simple instructs winsock to utilize version 2.2 of the library, as opposed to version 1.0.

The next couple of steps are almost identical to how winsock is typically used. We will begin by creating our listener socket, which will be used later to wait on incoming connections. The listener socket MUST be created with the following parameters supplied.

After this will we want to create our bluetooth socket address, and fill it with the relevant information. Since our desktop application will behave as a server, we will fill in the following information. If you want your desktop to behave as a client, I suggest you look here.

Next we will bind our socket, and set it to a listening status.

Now that the socket has been bound, we can acquire more in-depth information which will be used a little bit later. This step may be unnecessary for some situations, but it is likely safest to do it anyways.

After all of this has been finished, we can begin what I consider to be the most complicating step involved in this process. In order for bluetooth devices spanning various platforms to communicate with one another it is necessary for them to share a common understanding of “the connection”. Bluetooth accomplishes this primarily with “Service Discovery Protocol (SDP) Records”. This is the bluetooth device’s way of telling another bluetooth device “Hey! I’m can do these things!” or “I can make these connections!”. The most important thing to know is that both your android application, and winsock application both have to have the same 128-bit GUID in their SDP records.

Above I used the well known GUID for a bluetooth serial board, but you can use any 128-bit hex value you want. Preferably pick something that doesn’t overlap with existing devices.

The end is in our sights at this point. Now we need to fill in our ‘CSADDR_INFO’ structure in order to bridge the local connection with our service record. Notice here that our address info structure refers back to our “SOCKADDR_BTH” we used earlier during the binding stage.

Now that we have all of this finished, we can initialize our service, and set our socket to listen mode.

At this point your program will lock up, and wait for a connection to be secured before resuming. This concludes all the critical steps when it comes to setting up your bluetooth server.

There are a few more functions that you may need to/want to play around with depending on your specific bluetooth adapter. In order to modify local bluetooth adapter settings programmatically you can make use of the functions listed here.

In order to find your local bluetooth device (also known as a ‘radio’) you can make use of the following code:

This will locate the first possible bluetooth device on your system. If there are multiple bluetooth devices which you need to iterate over, you can use the following code.

You can also alter some convenience settings on your local device. For example,

The use of each of these functions is pretty self-explanatory.


Bluetooth Version of Winsock Functions

Bluetooth Functions

You can find my source code here.

Note: If you have any questions or constructive criticism I would be happy to hear it.

Move Construction Explained

Move Construction Explained

C++11 brought us a whole host of useful features, and expansions on STL. One such feature is the introduction of “move semantics” to the C++ language. Put simply, move operations are the ability to “move” a resource (memory, file handle, etc.) instead of copying it. Imagine you are back in college, and missed an important class. To avoid falling behind, you ask a friend if you can copy their notes. As it turns out, your friend took the course last semester and no longer has a need for them. Instead of going through the trouble of copying them, he simply agrees to give you the original notes. In effect, this is what you can achieve with move operations. However, there is an obvious problem here. You can only take the original notes from your friend if he no longer has need for them. Doing otherwise would be incredibly rude.

Back in the programming realm, how do we know if we are able to move a resource, as opposed to copying it? To better answer this question, it will be helpful to first understand the difference between L-values, and R-values.

R-Values vs L-Values

L-values are pieces of the equality statement which exist after the statement is evaluated.

R-values are pieces of the equality statement which do not exist after the statement is evaluated. In effect, R-values are temporary, L-values are not.

From what I’ve heard, the initially referred to “right-hand” value, and “left-hand” value. This indicated whether the value was on the left or right side of the equality statement. So, according to this, “x = 10;”, x would be our L-value, and 10 our R-value. While this definition no longer applies in modern C++, it can still serve as a mnemonic device to help you remember the difference. R-values are temporary to the statement they reside, so you would never see an R-value on the left side of an equality statement.

To answer my earlier question, we know we can move a resource as opposed to copy it if the resource comes from an R-value. After all, R-values are temporary. If we attempt to copy a temporary resource, then we may as well just move ownership instead.

Move Operations

I find the best way to learn is by re-implementing common STL functionality. For this example, we will re-implement an STL vector style collection class.

There we go, now we have the basics of a collection class. Now you might notice some interesting syntax here. Specifically:

This is what’s known as an “R-value reference”. In C++ it is how we can define a parameter overload as being an R-value vs an L-value. Using this syntax we can define our “move constructor” as well as our “move assignment operator”.

Move Constructor Implementation

Similarly to a copy constructor/assignment operator, our move constructor/assignment operator will have nearly identical implementation. They will both have two important pieces.

  1. First we want to transfer the resource from the R-value to the L-value.
  2. Second we want to invalidate the resource stored on the R-value. This is an important step, as we want to prevent the R-value from destructing the contained resource.

And that’s it! the call to ‘release()’ on the smart pointer sets the pointer to null and relinquishes ownership. The move assignment operator is nearly identical, with the exception of the possible need to destruct any existing resources on the L-value. An interesting note, contrary to a copy assignment operator it is not necessary with the move assignment operator to check the parameter’s pointer against the current object’s pointer. This makes sense after all.  It’s difficult to think of reasonable situation in which you could set an R-value equal to itself.

Why is this Important?

Move semantics are important because they allow you to entirely avoid superfluous resource acquisition/release. This means avoiding memory allocation, file access, mutex acquistion, reference counting, etc. It may not seem significant. After all, the compiler is pretty good when it comes to avoiding these situations if it can. That said, you should never rely on the compiler to do the smart thing. With just a little extra effort you can ensure that the compiler does the right thing in all situations involving copying temporaries.

I hope this has helped some of you, and if you have any questions please post them in the comments.