Technical details

Most of the software is written in C, with additional parts in 32-bit (Intel) assembly language. Using assembly for the time critical parts rather than plain C accounts for a speed gain of up to 500%.
For the class record algorithm a new program was developed in 2011. This is written in C# and is optimized towards 64-bits. The algorithm described below is that of the newer version. The differences with the older algorithm are indicated by the indication (32-bits:).

Since searches for convergence, glides and path records are several orders of magnitude faster than searches for class records two different algorithms are used.

The convergence algorithm
The algorithm that checks for convergence, glides and path records uses a sieve: only those numbers are checked that can not be shown to either converge or join the path of a lower number within a few steps. The gain from this method is substantial. Using a sieve of 65536 (216) for instance leaves only 1720 out of every 65536 numbers to be checked. Currently a sieve of 232 is used which means the fraction of numbers that can be skipped is over 99.2%. Apart from that the number is checked to see whether it may occur in the path of a lower number: all numbers that are congruent 2, 4, 5 or 8 mod 9 can easily be shown to occur in such a path and can therefore be skipped as well. This accounts for 44.4% of all remaining numbers.
To speed up the search even further the code uses an "interlaced" technique, searching 212 numbers of the same congruence class modulo 232. This means the sieve can be calculated rather than stored and also enables simultaneous calculation of the iteration path up to the 32nd division for the entire congruence class. This results in a further time reduction of approximately 50%.
The class record algorithm
In the algorithm that searches for high delays several techniques are combined to speed up calculations. The main reduction is achieved by using a sieve to eliminate many numbers beforehand. Different from the one above the sieve in this case checks whether paths come together rather than for straightforward convergence. If two paths join then obviously the upper one can never yield a class record and can be skipped. Currently the sievesize used is 220 (32-bits:218) elements wide, which in effect makes calculations of around 87% (32-bits:86%) of all numbers unnecessary.
Example: any number of the form 8k+4 will reach 6k+4 after 3 steps, and so will any number of the form 8k+5. Therefore no number of the form 8k+5 can be a class record (5 itself being the only exception). Numbers of the form 8k+5 therefore do not need to be checked, and all positions of the form 8k+5 in the sieve contain a zero.

Even numbers and all numbers congruent 2 modulo 3 are skipped as well, since potential class records of this type can easily be derived from other, earlier class records, because any number of the form 2n + 1 can be seen to result in a number of the form 3n + 2 after two steps. Similarly numbers of the form 9n + 4 can be seen to be skipped, because any class record of the form 8n + 3 will result in such a number after five steps (32-bits:no 9n+4 checks are performed).
Combining these last ideas with the sieve the calculations of around 93% (32-bits:91%) of all numbers can be shown to be redundant, effectively speeding up the calculations by a factor 14 (32-bits:11).

A cut-off technique is also applied at several points to either ensure a number is still a candidate for a high delay or else to terminate further processing of this number. In most cases this implies the calculation can be cut short after only a fraction of its trajectory.
To see that this technique works assume that two consecutive Delay Records (N1 and N2) have delays D1 and D2 respectively. Once a number drops below N2 its delay can not be higher than D1 + the current iteration count. If this sum is below a certain threshold (e.g. the delay of the lowest class record still missing) processing this number can be abandoned.
Which delay records are used obviously depends on the exact numbers and in fact different intervals require different cut-off points to obtain optimal performance. During the current interval (where starting numbers are approximately 257) trajectories are checked against the delay records having delays of 1820, 1549, 1408, 1307 and 1087.
The performance gain of this technique depends largely on the magnitude of the numbers concerned, but has been empirically determined to be around a factor 10.

Back to the general 3x+1 page.