All Euler problems
Project Euler

Collatz Prefix Families

The Collatz function is defined by c(n) = n/2 if n is even, and c(n) = 3n + 1 if n is odd. A Collatz prefix family of length k is the set of all starting values in [1, N] whose Collatz sequences sh...

Source sync Apr 19, 2026
Problem #0494
Level Level 32
Solved By 265
Languages C++, Python
Answer 2880067194446832666
Length 472 words
modular_arithmeticsequencenumber_theory

Problem Statement

This archive keeps the full statement, math, and original media on the page.

The Collatz sequence is defined as: \(a_{i+1} = \left \{ \large {\frac {a_i} 2 \atop 3 a_i+1} {\text {if }a_i\text { is even} \atop \text {if }a_i\text { is odd}} \right .\).

The Collatz conjecture states that starting from any positive integer, the sequence eventually reaches the cycle \(1,4,2,1, \dots \).

We shall define the sequence prefix \(p(n)\) for the Collatz sequence starting with \(a_1 = n\) as the sub-sequence of all numbers not a power of \(2\) (\(2^0=1\) is considered a power of \(2\) for this problem). For example: \begin {align*} \begin {array}{l} p(13) = \{13, 40, 20, 10, 5\} \\ p(8) = \{\} \end {array} \end {align*}

Any number invalidating the conjecture would have an infinite length sequence prefix.

Let \(S_m\) be the set of all sequence prefixes of length \(m\). Two sequences \(\{a_1, a_2, \dots , a_m\}\) and \(\{b_1, b_2, \dots , b_m\}\) in \(S_m\) are said to belong to the same prefix family if \(a_i < a_j\) if and only if \(b_i < b_j\) for all \(1 \le i,j \le m\).

For example, in \(S_4\), \(\{6, 3, 10, 5\}\) is in the same family as \(\{454, 227, 682, 341\}\), but not \(\{113, 340, 170, 85\}\).

Let \(f(m)\) be the number of distinct prefix families in \(S_m\).

You are given \(f(5) = 5\), \(f(10) = 55\), \(f(20) = 6771\).

Find \(f(90)\).

Problem 494: Collatz Prefix Families

Mathematical Foundation

Theorem 1 (Determinism of parity patterns). Let p=(p0,p1,,pk1){0,1}k\mathbf{p} = (p_0, p_1, \ldots, p_{k-1}) \in \{0, 1\}^k be a binary string where pip_i encodes the parity of c(i)(n)c^{(i)}(n). The Collatz sequence of nn for its first kk terms is uniquely determined by nn and p\mathbf{p}, and conversely, nn uniquely determines p\mathbf{p}.

Proof. Given nn, the value c(n)c(n) is deterministic, so the parity of c(n)c(n) is determined. By induction, each c(i)(n)c^{(i)}(n) is determined, hence so is (p0,p1,,pk1)(p_0, p_1, \ldots, p_{k-1}). \square

Theorem 2 (Congruence characterization). For a given parity pattern p{0,1}k\mathbf{p} \in \{0, 1\}^k, the set of starting values nn that realize pattern p\mathbf{p} forms a (possibly empty) residue class modulo some integer M(p)M(\mathbf{p}). Specifically, M(p)=2a3bM(\mathbf{p}) = 2^a \cdot 3^b where aa counts the even-parity steps and bb counts the positions where certain divisibility conditions are imposed.

Proof. We proceed by induction on kk. For k=1k = 1: the pattern (0)(0) corresponds to even nn (i.e., n0(mod2)n \equiv 0 \pmod{2}) and pattern (1)(1) corresponds to odd nn (n1(mod2)n \equiv 1 \pmod{2}). Both are residue classes modulo 22.

For the inductive step, suppose the set of nn realizing pattern (p0,,pk1)(p_0, \ldots, p_{k-1}) is the residue class {n:nr(modM)}\{n : n \equiv r \pmod{M}\}. The (k+1)(k+1)-th parity pkp_k adds one further congruence condition on c(k)(n)c^{(k)}(n), which is a rational linear function of nn with denominator dividing 2a3b2^{a'} \cdot 3^{b'}. Thus the combined conditions form a congruence modulo a (possibly larger) integer of the form 2a3b2^{a''} \cdot 3^{b''}. \square

Lemma 1 (Not all patterns are realizable). Not all 2k2^k binary strings of length kk correspond to realizable Collatz parity patterns. The Collatz map imposes constraints: if c(i)(n)c^{(i)}(n) is obtained by the 3n+13n + 1 rule (odd case), then c(i+1)(n)=3c(i1)(n)/2+2c^{(i+1)}(n) = 3c^{(i-1)}(n)/2 + 2 is necessarily even when c(i)(n)c^{(i)}(n) was odd, but this depends on the specific arithmetic.

Proof. Consider the pattern starting with p0=1p_0 = 1 (odd nn). Then c(n)=3n+1c(n) = 3n + 1, which is always even, so p1=0p_1 = 0 is forced. Thus the pattern (1,1,)(1, 1, \ldots) is not realizable for any k2k \geq 2. More generally, after any odd step, the next value must be even, constraining the successor parity to 00. \square

Theorem 3 (Counting families). The number of families of length kk is:

F(k)={p{0,1}k:n[1,N] with parity pattern p}.F(k) = \bigl|\{\mathbf{p} \in \{0, 1\}^k : \exists\, n \in [1, N] \text{ with parity pattern } \mathbf{p}\}\bigr|.

This is computed by iterating over all n[1,N]n \in [1, N], recording parity patterns, and counting distinct ones.

Proof. By definition, each family corresponds to a distinct realized parity pattern. Two starting values are in the same family if and only if they share the same parity pattern for kk steps. Thus F(k)F(k) equals the number of distinct realized patterns. \square

Editorial

Optimized:* Compute all KK patterns for each nn in a single pass. We else. We enumerate the admissible parameter range, discard candidates that violate the derived bounds or arithmetic constraints, and update the final set or total whenever a candidate passes the acceptance test.

Pseudocode

if val is even
else
if val is even
else

Complexity Analysis

  • Time: O(NK)O(N \cdot K) for the optimized single-pass approach, where NN is the range of starting values and KK is the maximum pattern length.
  • Space: O(2K)O(2^K) worst case for storing the set of distinct bitmasks at each level; in practice, the number of realizable patterns is much smaller.

Answer

2880067194446832666\boxed{2880067194446832666}

Code

Each problem page includes the exact C++ and Python source files from the local archive.

C++ project_euler/problem_494/solution.cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Collatz prefix families
    // Count distinct E/O patterns of length k among starting values [1, N]

    const int N = 10000;
    const int K = 30;

    // For each k, store the set of distinct patterns
    // Pattern encoded as a bitmask: bit i = parity at step i
    vector<unordered_set<long long>> pattern_sets(K);

    for (int n = 1; n <= N; n++) {
        long long val = n;
        long long mask = 0;
        for (int step = 0; step < K; step++) {
            if (val % 2 == 1) {
                mask |= (1LL << step);
            }
            pattern_sets[step].insert(mask);
            if (val % 2 == 0)
                val = val / 2;
            else
                val = 3 * val + 1;
        }
    }

    long long total = 0;
    for (int k = 0; k < K; k++) {
        long long fk = pattern_sets[k].size();
        total += fk;
        // cout << "F(" << k + 1 << ") = " << fk << endl;
    }

    cout << total << endl;
    return 0;
}