The Terras Theorem

One of the most important results on the 3x+1 problem is the Terras Theorem, originally stated by Riho Terras. Because it is such an important theorem for the 3x+1 problem this page is dedicated to sketching its approach and ideas. While some minimal mathematical background is probably necessary to follow this page none of the proofs require detailed knowledge of advanced techniques or deeper theorems.

Although all lemmas are proven and no steps are skipped or ignored the purpose of this page is to introduce the Terras Theorem to those new to the 3x+1 problem. Please note therefore that the emphasis is on making the theorem comprehensible rather than on 100% exact mathematics (but see this note for a more exact version).

The simplest way to summarize the theorem goes like this: Almost all numbers have finite Glide.

This is too simple really. Before we get back to a more exact version of the theorem let us first redefine the 3x+1 algorithm as follows:

3x + 1 algorithm, restated: For any positive integer N define a sequence where
S₀ = N and for all i > 0 S_i = S_i-1 / 2 if S_i is even S_i = ( S_i-1 * 3 + 1 ) / 2 if S_i is odd
And we will call the smallest value of i for which S_i < S₀ the Stopping time of N

Example :: Let N = 17; then S₀ = 17, S₁ = 26, S₂ = 13, S₃ = 20, S₄ = 10, S₅ = 5, etc.
And since S₂ is the first term < 17 the stopping time of 17 is 2.

The Terras theorem in simple words then becomes: Almost all numbers have finite Stopping Time.

Note that this redefinition has no consequences for the convergence of the series.

Next we define a parity vector for each number as follows:

Parity vector: For any number N the parity vector v(N) is defined as v_i(N) = S_i mod 2

Example :: For N = 17 we find v₀ = 1, v₁ = 0, v₂ = 1, v₃ = 0, v₄ = 0, v₅ = 1, etc.

The parity vector completely describes the way the 3x+1 iterates of N behave, since the numbers correspond directly to the criterion upon which the 3x + 1 algorithm is based.

We now state a simple lemma to increase our understanding of the parity vector :

Lemma 1 :: If N is a positive integer of the form a . 2^k + b (b < 2^k) then the first k elements of the parity vector are dependent on b only
Proof :: The proof is by induction on k. For k = 1 we have N = 2a + b with b either 0 or 1 and the lemma is true by the definition of the 3x + 1 algorithm Now assume the lemma is true for some k Let N = a . 2^k+1 + b (b < 2^k+1) Then if S₀ = N is even S₁ = a . 2^k + b / 2 and if S₀ = N is odd S₁ = 3a . 2^k + (3b+1) / 2 In either case the first k elements of v(S₁) depend on b only Therefore the first k+1 elements of N are independent of a as well This concludes the proof of the lemma.

Example :: Let N = 17 = 0 . 2⁶ + 17. The first 6 elements of the parity vector are (1,0,1,0,0,1). From Lemma 1 this is true for every number of the form a . 2⁶ + 17. Therefore these are also the first 6 elements of the parity vector of 81, 145, 209, 273, etc.

Another lemma about parity vectors is :

Lemma 2 :: Assume w_i (0 ≤ i < k) is a parity vector of length k Then some number N exists for which v_i(N) = w_i (0 ≤ i < k)
Proof :: The proof is again by induction on k. For k = 1 we have w₀ is either 0 or 1 and N is either any even or odd number Now assume the lemma is true for some k Let w_i (0 ≤ i < k+1 ) be some parity vector of length k+1 Then some number N exists for which v_i(N) = w_i (0 ≤ i < k) From Lemma 1 we can write N as a . 2^k + b (b < 2^k) Now take two numbers M₁ and M₂, with M₁ = a . 2^k+1 + b and M₂ = a . 2^k+1 + 2^k + b For both M₁ and M₂ we automatically have v_i(M₁) = v_i(M₂) = w_i (0 ≤ i < k) Now take S₀ = M₁ and let us call S_k(M₁) = X From the definition of the algorithm we have S_k(M₂) = Y = X + 3ⁿ with n the number of odd elements of w_i from 0 to k Since 3ⁿ is always odd we see that X and Y are of different parity Therefore w_k+1(M₁) ≠ w_k+1(M₂) which means that for either w_k+1 = 0 or w_k+1 = 1 a number M must exist for which v_i(M) = w_i (0 ≤ i < k+1) This concludes the proof of the lemma.

Example :: Take the parity vector (1,0,1,0,0,1,x) with x either 0 or 1. For N = 17 these are the first 6 elements of its parity vector. With S₀ = N = 17 we find S₆ = 8. So for N = 17 the first 7 elements of its parity vector are (1,0,1,0,0,1,0). If we take S₀ = N = 17 + 2⁶ = 81 then from Lemma 2 we know that the first 7 elements of its parity vector must be (1,0,1,0,0,1,1) and therefore that S₆ must be odd, and indeed S₆ = 35. Note that the parity vector contains three 1's in its first 6 elements and that S₆ is also necessarily equal to 8 + 3³ = 8 + 27 = 35.

The next step is to define a parity number of order k:

Parity number: For any number N the parity numbers P_k (k ≥ 1) are defined as P_k = ∑_i<k (v_i . 2ⁱ)

Example :: For N = 17 we find P₁ = 1, P₂ = 1, P₃ = 5, P₄ = 5, P₅ = 5, P₆ = 37, etc.

Note that the parity number of order k is really nothing else than the first k elements of the parity vector written in reverse order and interpreted as a binary number. Thus the parity vector of 17 starts as (1,0,1,0,0,1) and therefore P₆ = 37, where 37 written in binary is 100101. Note that for any parity number of order k we have 0 ≤ P_k < 2^k

We now reach an important Lemma:

Lemma 3 :: Let M and N be positive integers. Then P_k(M) = P_k(N) if and only if M ≡ N mod 2^k
Proof :: • From M ≡ N mod 2^k it follows that both can be written as M = x . 2^k + b and N = y . 2^k + b, with b identical for both M and N From Lemma 1 the first k elements of v depend on b only Therefore v_i(M) = v_i(N) for 0 ≤ i < k and so P_k(M) = P_k(N) • Let P_k(M) = P_k(N), and automatically v_i(M) = v_i(N) for 0 ≤ i < k Assume that M and N are congruent b1 and b2 mod 2^k respectively Then we have two numbers a . 2^k + b1 and a . 2^k + b2 which both give rise to the same parity vector of length k By Lemma 2 however for every parity vector w of length k a number Q exists for which v_i(Q) = w_i (0 ≤ i < k) Since there are 2^k different vectors of length k and since there are also 2^k different congruence classes mod 2^k it follows that b1 = b2 This completes the proof of Lemma 3

From Lemma 3 we can derive that every congruence class mod 2^k uniquely maps to a parity number, with both numbers lying between 0 and 2^k - 1. This function can therefore be seen as a permutation function.

Example :: Taking k=3 we find that 8x + 0 maps to parity number 0, 8x+1 maps to 5, etc. Written more concisely the numbers 8x + r (r=0, 1, 2, 3, 4, 5, 6, 7) map to 0, 5, 2, 3, 4, 1, 6, 7 respectively. We see that 1 maps to 5 and vice versa, but all other numbers map to themselves. Written even shorter in standard permutation notation the mapping for k=3 can be described by (1,5). Similarly the mapping for k=4 can be written as (1,5) (2,10) (9,13).

From the definition of the parity vector it should be clear that the elements describe the behaviour of the 3x + 1 sequence. In fact we can give a pretty good approximation for S_k from V_i (0 ≤ i < k). This is stated more explicitly in Lemma 4.

Lemma 4 :: Let S₀ = N be a positive integer and v_i its parity vector. Let d(a,b) = ∑ v_i (a ≤ i < b) Specifically, let d(k) be a shorthand for d(0,k) Then S_k ≈ T_k = S₀ . 3^d(k) . 2^-k (Or, in logarithmic form: ln(S_k) ≈ ln(T_k) = ln(S₀) + d(k).ln(3) - k.ln(2)) and lim (S_k - T_k) / S_k = 0 for S_k → ∞
Proof :: Since every element of v_i represents a (3x + 1) / 2 multiplication step and every zero represents a division step we immediately see that S_i+1 = (3S_i+1)/2 if v_i is 1 and S_i+1 = S_i/2 if v_i is 0 Thus S_k = T_k + R_k where R_k is equal to R_k = ∑ v_i . 3^d(i+1,k) / 2^k-i (0 ≤ i < k) Since lim (R_k / S_k) = 0 for S_k → ∞ this proves Lemma 4

Example :: Taking N = S₀ = 17 and k = 5 we saw that S₅ = 5. We find that T₅ = 17 . 3² . 2^-5 = 4.78125 which is less then 5% off the real value. For larger numbers the results are much closer, for instance taking N = 1,234,567 results in S₅ = 3,125,000 and T₅ = N . 3⁴ . 2^-5 = 3,124,997.7 where the difference between the two numbers is already less than 10^-6 . S₅

Note that the factor R_k is non-negative, implying that T_k ≤ S_k
Now that we have a fair way of estimating S_k from v(N) we can use v(N) to get a good indication of the convergence of N. To this end we introduce a new definition.

Convergent and Divergent vectors: Let v_i (0 ≤ i < k) be a parity vector of length k Let d(j) = ∑ v_i (0 ≤ i < j) and define c(j) as d(j).ln(3) - j.ln(2) for any j < k Then we call v convergent if c(j) < 0 for any j < k and the smallest value j for which c(j) < 0 is called the convergence time of v, or, generally, the convergence time of any N having such a parity vector Finally we call v divergent if such a value of j does not exist

It should be clear by Lemma 4 that a necessary though not quite sufficient condition for N to have a stopping time k is that its parity vector has Convergence Time k.
We are now getting closer to the real Terras theorem. The next lemma takes a big step towards the final goal.

Lemma 5 :: Let v_i be a parity vector of length k with convergence time k and let M be the set of integers having parity vector v Then all sufficiently large elements N from M have stopping time k
Proof :: First we note that Lemma 2 asserts that some N exist, so M is not empty By Lemma 3 we know that all elements from M fall into a single congruence class mod 2^k We can easily see from Lemma 4 that S₀ < T_i ≤ S_i (0 ≤ i < k-1) Therefore N must have a stopping time of at least k Now the limit ( S_k ) / S₀ for S₀ → ∞ is equal to (S₀ . e^c(k) + R_k) / S₀ = e^c(k) Since c(k) < 0 obviously e^c(k) < 1 which proves the lemma.

Example :: All numbers ≡ 3 mod 16 (x ≥ 0) have a parity vector v starting (1,1,0,0). The convergence time of v is 4. Therefore all sufficiently large numbers ≡ 3 mod 16 have stopping time 4. The smallest such number, 3, is apparently sufficiently large enough already, since the stopping time of 3 is indeed 4.

Lemma 5 effectively states that only finitely many numbers with convergence time k do not have stopping time k. In actual fact no number is known with this property other than the trivial case where N = 1. Since such a number should (by Lemma 5) be relatively small it is generally believed that no such number exists. In other words:

Conjecture :: For all N > 1 the convergence time is equal to the stopping time

We have one more lemma to go before returning to the Terras Theorem proper. But this lemma takes us really close to our final destination.

Lemma 6 :: Let V_k be the set of all parity vectors v of length k. Let W_k be the subset of V consisting of all divergent elements of V. Then (taking |C| to be the number of elements of C) lim k → ∞ |W_k| / |V_k| = 0.
Proof :: From the definition of convergent vectors it should be clear that c(k) < 0 is a sufficient though not necessary condition for convergence. Since the distribution of 1's and 0's in all parity vectors can be described by the binomial distribution, the limit above is obviously bounded by lim k → ∞ ∑ 2^-k . k! / i!(k-i)! for 0 ≤ i < kα where α = ( 1 - ln(2) / ln(3) ) This limit simply represents a part of the binomial distribution, and it is well known that the limit is equal to zero for any α less than ½. Since ln(2) / ln(3) > ½ the condition is fulfilled and the Lemma is proven.

Example :: The table below depicts both the number of divergent parity vectors and the relative number of those vectors for a number of values of k. Although the number of divergent parity vectors increases rapidly their relative number can be seen to drop substantially even for fairly modest values of k. For instance for k=70 the fraction of divergent vectors is approximately 0.001, which implies that 99.9% of all parity vectors with length 70 are convergent.

k	\|W_k\|	\|W_k\| / \|V_k\|
1	1	0.5000
2	1	0.2500
3	2	0.2500
4	3	0.1875
5	4	0.1250
6	8	0.1250
7	13	0.1016
8	19	0.0742
9	38	0.0742
10	64	0.0625

k	\|W_k\|	\|W_k\| / \|V_k\|
11	128	0.0625
12	226	0.0552
13	367	0.0448
14	734	0.0448
15	1295	0.0395
16	2114	0.0323
17	4228	0.0323
18	7495	0.0286
19	14990	0.0286
20	27328	0.0261

k	\|W_k\|	\|W_k\| / \|V_k\|
30	1.28e+007	0.01189418
40	6.40e+009	0.00582334
50	3.73e+012	0.00331669
60	2.22e+015	0.00192219
70	1.24e+018	0.00105159
80	8.03e+020	0.00066440
90	5.09e+023	0.00041078
100	3.03e+026	0.00023868
110	2.03e+029	0.00015662
120	1.31e+032	0.00009879

k	\|W_k\|	\|W_k\| / \|V_k\|
130	8.11e+034	0.00005957
140	5.52e+037	0.00003963
150	3.67e+040	0.00002569
160	2.52e+043	0.00001726
170	1.64e+046	0.00001093
180	1.13e+049	0.00000736
190	7.69e+051	0.00000490
200	4.92e+054	0.00000306
250	7.41e+068	0.00000041
300	1.11e+083	0.00000005

We finally return to the Terras theorem itself. Stated more accurately than above the essence of the theorem is that the fraction of numbers within a given interval that can not be shown to have a finite stopping time tends to zero as the interval expands. In other words :

Theorem (Terras) :: Let M be a positive integer. Let D(M) be the fraction of numbers < M that do not have finite stopping time. Then the limit of D(M) for M → ∞ is equal to 0.
Proof :: First we establish that from Lemma 5 only finitely many numbers have a stopping time different from their convergence time. Now partition the numbers into congruence classes mod 2^k From Lemma 3 every number ≡ m mod 2^k maps to a unique parity number n ( 0 ≤ n < 2^k ) and therefore to a unique parity vector of length k. But Lemma 6 proved that the fraction of divergent parity vectors of length k for k → ∞ is zero. On considering numbers N < 2^k that do not have stopping time ≤ k it follows that lim k → ∞ D(2^k) = 0. This concludes the proof of the Theorem.

Note : The above version of the Terras Theorem is a simplified version. It can be improved in several ways, e.g. by stating exactly 'how fast' the limit goes to zero. A more formal and more complete proof can be found in this excellent standard overview of the 3x+1 problem by Jeffrey Lagarias.

Back to the general 3x+1 page.