Longest Collatz Sequence
Define the Collatz map C: Z^+ -> Z^+ by C(n) = n/2 & if 2 | n, 3n + 1 & if 2 nmid n. The Collatz sequence from n_0 is (n_0, C(n_0), C^2(n_0),...). The chain length L(n) is the number of terms in th...
Problem Statement
This archive keeps the full statement, math, and original media on the page.
The following iterative sequence is defined for the set of positive integers: $$\begin{cases} n \to n/2 (n \text{ is even}) \\ n \to 3n + 1 (n \text{ is odd}) \end{cases}$$ Using the rule above and starting with $13$, we generate the following sequence: $$13 \to 40 \to 20 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1.$$ It can be seen that this sequence (starting at $13$ and finishing at $1$) contains $10$ terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at $1$.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
Problem 14: Longest Collatz Sequence
Mathematical Development
Definitions
Definition 1. The Collatz orbit of is the sequence . The stopping time (or chain length) is .
Definition 2. For , define with .
Theorems
Theorem 1 (Recursive structure). , and for :
Proof. The Collatz sequence from is . Removing the first term yields the sequence from , which has terms. Hence .
Lemma 1 (Odd-step compression). For odd , is even, and
Consequently, for odd .
Proof. If is odd, then . Since is odd and is odd, is even, so . Two applications of yield , and by Theorem 1 applied twice: .
Theorem 2 (Termination). The Collatz conjecture asserts for all . While unproven in general, it has been verified computationally for all (far exceeding ).
Proof. By computational verification (Oliveira e Silva, 2009, extended by subsequent authors).
Theorem 3 (Correctness of memoized search). Let store for . The following procedure correctly computes for all : starting from , follow the Collatz map until reaching some with ; then , where is the number of steps taken.
Proof. By Theorem 1, where is the first iterate with a known chain length. Since is a function of alone (assuming termination), the cached value is correct, and thus . Processing in order ensures that is available as a base case, and each orbit eventually reaches a value (since the orbit must pass through values below before reaching ) or exits the cache range and terminates at directly.
Lemma 2 (Orbit eventually descends). For any , the Collatz orbit from eventually reaches a value strictly less than .
Proof. If is even, . If is odd, , but . While this may exceed , the orbit (under the assumption of termination verified for ) must reach . More precisely, each even step halves the value, while odd steps multiply by at most (after compression). Since , a sequence of odd-then-even steps is contractive on average, guaranteeing eventual descent.
Editorial
We scan every starting value below while memoizing Collatz chain lengths that have already been resolved. For each start, we follow the sequence until we hit a value with known length, count how many new steps were taken, and store the resulting length for the starting value before comparing it with the current best. This is sufficient because every candidate under is examined and each evaluation eventually terminates at a cached chain tail.
Pseudocode
Algorithm: Longest Collatz Chain Below a Bound
Require: An integer N ≥ 2.
Ensure: The starting value below N that produces the longest Collatz chain.
1: Initialize a memo table with length(1) ← 1, and set best_start ← 1 and best_len ← 1.
2: For each start in {2, 3, ..., N - 1}, follow the Collatz map until a value with known chain length is reached, recording the unresolved trajectory.
3: Propagate chain lengths backward along that recorded trajectory and store them in the memo table.
4: If length(start) > best_len, update best_len ← length(start) and best_start ← start.
5: Return best_start.
Complexity Analysis
Proposition. The algorithm runs in worst-case time and space, where is the average number of steps before hitting a cached value.
Proof (Space). The cache array has entries, each storing one integer. This dominates all other storage. Hence space is .
Proof (Time). For each starting value , the while-loop runs until the orbit hits a cached value . Let denote the number of iterations for starting value . The total work is . In the worst case, each could be as large as , giving . However, with memoization, once is set, all future orbits passing through terminate immediately. Empirically, for , the total number of Collatz steps across all starting values is approximately , consistent with amortized behavior.
The heuristic can be motivated as follows: the expected number of steps to halve a random integer’s magnitude is (each even step halves, each odd-even pair multiplies by after two steps). Thus on average, giving total work .
Answer
Code
Each problem page includes the exact C++ and Python source files from the local archive.
#include <bits/stdc++.h>
using namespace std;
int main() {
const int N = 1000000;
vector<int> cache(N + 1, 0);
cache[1] = 1;
for (int i = 2; i < N; i++) {
long long n = i;
int steps = 0;
while (n >= N || cache[n] == 0) {
n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
steps++;
}
cache[i] = steps + (int)cache[n];
}
int best = 1;
for (int i = 2; i < N; i++)
if (cache[i] > cache[best])
best = i;
cout << best << endl;
return 0;
}
"""Project Euler Problem 14: Longest Collatz Sequence"""
def solve():
LIMIT = 1_000_000
cache = [0] * (LIMIT + 1)
cache[1] = 1
for i in range(2, LIMIT):
n, steps = i, 0
while n >= LIMIT or cache[n] == 0:
n = n // 2 if n % 2 == 0 else 3 * n + 1
steps += 1
cache[i] = steps + cache[n]
return max(range(1, LIMIT), key=lambda x: cache[x])
answer = solve()
assert answer == 837799
print(answer)