Project Euler

Investigating Ulam Sequences

For 2 <= n <= 10, find U(2, 2n+1)(10^11) -- the (10^11) -th term of the Ulam sequence U(2, 2n+1) -- and compute the sum of these values. An Ulam sequence U(a, b) with a < b starts with a, b and the...

Source sync Apr 19, 2026

Problem #0167

Level Level 10

Solved By 2,041

Languages C++, Python

Answer 3916160068885

Length 531 words

modular_arithmeticsequencelinear_algebra

For two positive integers $a$ and $b$, the Ulam sequence $U(a,b)$ is defined by $U(a,b)_1 = a$, $U(a,b)_2 = b$ and for $k > 2$, $U(a,b)_k$ is the smallest integer greater than $U(a,b)_{k - 1}$ which can be written in exactly one way as the sum of two distinct previous members of $U(a,b)$.

For example, the sequence $U(1,2)$ begins with

$1$, $2$, $3 = 1 + 2$, $4 = 1 + 3$, $6 = 2 + 4$, $8 = 2 + 6$, $11 = 3 + 8$;

$5$ does not belong to it because $5 = 1 + 4 = 2 + 3$ has two representations as the sum of two previous members, likewise $7 = 1 + 6 = 3 + 4$.

Find $\displaystyle \sum \limits _{n = 2}^{10} U(2,2n+1)_k$, where $k = 10^{11}$.

Problem 167: Investigating Ulam Sequences

Mathematical Foundation

Definition. The Ulam sequence $U(a, b) = (u_1, u_2, u_3, \ldots)$ is defined by $u_1 = a$ , $u_2 = b$ , and for $k \geq 3$ , $u_k$ is the smallest integer $m > u_{k-1}$ such that $|\{(i, j) : i < j, \, u_i + u_j = m\}| = 1$ .

Theorem 1 (Even members of $U(2, v)$ for odd $v \geq 5$ ). The sequence $U(2, v)$ for odd $v \geq 5$ contains exactly two even members: $2$ and $v + 1$ . All subsequent members are odd.

Proof. The first member is $u_1 = 2$ (even), and $u_2 = v$ (odd). The third term is $u_3 = v + 2$ (odd, the unique representation is $2 + v$ ). We claim $e = v + 1$ is the only other even term that eventually appears (as the sum of two odd terms already in the sequence — but we need exactly one representation).

For any odd candidate $c > \max\{u_1, u_2, \ldots\}$ : the representations $c = u_i + u_j$ with $i < j$ can only involve at most one even term (since odd + even = odd, even + even = even $\neq c$ ). The even terms are 2 and $e$ . So the possible representations involving even terms are $(2, c-2)$ and $(e, c-e)$ . Two odd terms sum to an even number, so they cannot produce odd $c$ . Therefore, the number of representations of odd $c$ is:

r(c) = [c - 2 \in U] + [c - e \in U]

where $[\cdot]$ is the Iverson bracket. This means $c$ is Ulam iff exactly one of $c-2$ or $c-e$ is in the sequence — an XOR condition. $\square$

Theorem 2 (Eventual periodicity of differences). For $U(2, 2n+1)$ with $2 \leq n \leq 10$ , the difference sequence $d_k = u_{k+1} - u_k$ is eventually periodic with period $P$ and period sum $D = \sum_{i=1}^{P} d_{T+i}$ for some transient length $T$ .

Proof. By Theorem 1, after the transient, membership in $U$ is determined by the XOR rule: $c \in U \iff [c-2 \in U] \oplus [c-e \in U]$ . This is a linear recurrence over $\mathbb{F}_2$ on the characteristic function $\mathbf{1}_U$ restricted to odd integers. The state is determined by the membership values at positions $c-2$ and $c-e$ , which depends on a window of width $e - 2 = v - 1$ . Since there are $2^{v-1}$ possible windows and the recurrence is deterministic, the sequence of membership bits is eventually periodic. The period of the membership bits directly implies eventual periodicity of the difference sequence. $\square$

Lemma 1 (Known periods). The periods $P$ for $U(2, 2n+1)$ are:

$n$	$v = 2n+1$	Period $P$
2	5	32
3	7	26
4	9	444
5	11	1,628
6	13	5,906
7	15	80
8	17	126,960
9	19	380,882
10	21	2,097,152

These are verified computationally by generating sufficient terms and checking repetition of the difference sequence.

Theorem 3 (Extrapolation formula). Once the transient $T$ , period $P$ , and period sum $D$ are identified, the $k$ -th term for $k > T$ is:

u_k = u_{T + r} + q \cdot D

where $k - T = qP + r$ with $0 \leq r < P$ .

Proof. By periodicity of differences: $u_{T + qP + r} = u_{T+r} + q \cdot \sum_{i=1}^{P} d_{T+i} = u_{T+r} + qD$ . $\square$

Editorial

For U(2, 2n+1) with n=2..10, find the 10^11-th term and sum them. We generate terms using XOR rule until periodicity detected. We then generate using XOR: for odd c, c in U iff exactly one of. Finally, (c-2 in U), (c-e in U) is true.

Pseudocode

Generate terms using XOR rule until periodicity detected
Generate using XOR: for odd c, c in U iff exactly one of
(c-2 in U), (c-e in U) is true
Also handle even member e
Detect period in difference sequence
Extrapolate to k = 10^11

Complexity Analysis

Time: $O\bigl(\sum_{n=2}^{10} (T_n + 3P_n)\bigr)$ for generation and period verification. The largest period is $P_{10} = 2{,}097{,}152$ for $U(2, 21)$ , so total generation is $\sim 6 \times 10^6$ terms. Each term requires $O(1)$ via the XOR rule.
Space: $O(\max_n (T_n + 2P_n))$ for the membership bit array and difference sequence.

Answer

\boxed{3916160068885}

C++ project_euler/problem_167/solution.cpp

#include <bits/stdc++.h>
using namespace std;

/*
 * Problem 167: Investigating Ulam Sequences
 *
 * For U(2, 2n+1) with n=2..10, find the 10^11-th term and sum them.
 * Answer: 3916160068885
 *
 * Key insight: U(2,v) for odd v>=5 has exactly 2 even members: 2 and some e.
 * After that, all members are odd. Odd c is Ulam iff is_ulam[c-2] XOR is_ulam[c-e].
 * This gives O(1) generation per term.
 *
 * The difference sequence is eventually periodic. For small periods we detect
 * them via non-2 diffs. For large periods (U(2,21) with period 2^21), we use
 * the known period from OEIS A100729 and verify it.
 */

const int MAXVAL = 60000000;

// Known periods from OEIS A100729 for U(2, 2n+1), n=2..10
int known_periods[] = {0, 0, 32, 26, 444, 1628, 5906, 80, 126960, 380882, 2097152};

long long solve_ulam(int a, int b, long long k, int nn) {
    // Phase 1: Find second even member
    int SMALL = 100000;
    vector<short> rc(SMALL + 1, 0);
    vector<int> isq = {a, b};
    vector<bool> isu(SMALL + 1, false);
    isu[a] = isu[b] = true;
    if (a + b <= SMALL) rc[a + b]++;

    int second_even = -1;
    int nc2 = b + 1;
    while (nc2 <= SMALL) {
        while (nc2 <= SMALL && rc[nc2] != 1) nc2++;
        if (nc2 > SMALL) break;
        int u = nc2;
        isq.push_back(u);
        isu[u] = true;
        if (u % 2 == 0 && u != a) second_even = u;
        for (int i = 0; i < (int)isq.size() - 1; i++) {
            int s = isq[i] + u;
            if (s <= SMALL && rc[s] < 3) rc[s]++;
        }
        nc2 = u + 1;
        if (second_even > 0) break;
    }

    fprintf(stderr, "U(%d,%d): second_even=%d\n", a, b, second_even);

    // Phase 2: Fast generation using XOR rule
    int kp = known_periods[nn];
    int needed_terms = 6 * kp + 10000; // enough to detect and verify
    if (needed_terms > 14000000) needed_terms = 14000000;
    if (needed_terms < 100000) needed_terms = 100000;

    vector<bool> is_ulam(MAXVAL + 2, false);
    vector<int> seq;
    seq.reserve(needed_terms + 100);
    for (int x : isq) {
        is_ulam[x] = true;
        seq.push_back(x);
    }

    int c = isq.back();
    if (c % 2 == 0) c++;
    else c += 2;

    while (c <= MAXVAL && (int)seq.size() < needed_terms) {
        bool w1 = (c - 2 >= 0 && c - 2 <= MAXVAL && is_ulam[c - 2]);
        bool w2 = (c - second_even >= 0 && c - second_even <= MAXVAL && is_ulam[c - second_even]);
        if (w1 != w2) {
            seq.push_back(c);
            is_ulam[c] = true;
        }
        c += 2;
    }

    int n = seq.size();
    fprintf(stderr, "  generated %d terms, last=%d\n", n, seq[n-1]);

    // Compute diffs
    int nd = n - 1;
    vector<int> diffs(nd);
    for (int i = 0; i < nd; i++) diffs[i] = seq[i+1] - seq[i];

    // Try to detect period using non-2 diffs first (fast for small periods)
    int full_period = -1;

    if (kp <= 500000) {
        // Use non-2 diff approach
        vector<int> non2_val, non2_pos;
        for (int i = 0; i < nd; i++) {
            if (diffs[i] != 2) {
                non2_val.push_back(diffs[i]);
                non2_pos.push_back(i);
            }
        }
        int nn2 = non2_val.size();

        for (int p = 1; p <= nn2 / 6; p++) {
            int b2 = nn2 - p;
            bool ok = true;
            for (int rep = 1; rep <= 4 && ok; rep++) {
                if (b2 - rep * p < 0) { ok = false; break; }
                for (int j = 0; j < p && ok; j++)
                    if (non2_val[b2 + j] != non2_val[b2 - rep * p + j]) ok = false;
            }
            if (ok) {
                int prev_b = b2 - p;
                int fp = non2_pos[b2] - non2_pos[prev_b];

                // Verify
                int ei2 = nd - 1, fb2 = ei2 - fp + 1;
                if (fb2 - 2 * fp >= 0) {
                    bool v = true;
                    for (int rep = 1; rep <= 2 && v; rep++)
                        for (int j = 0; j < fp && v; j++)
                            if (diffs[fb2+j] != diffs[fb2-rep*fp+j]) v = false;
                    if (v) { full_period = fp; break; }
                }
            }
        }
    }

    // If not found, try the known period directly
    if (full_period < 0) {
        int fp = kp;
        int ei2 = nd - 1, fb2 = ei2 - fp + 1;
        if (fb2 - 4 * fp >= 0) {
            bool v = true;
            for (int rep = 1; rep <= 4 && v; rep++)
                for (int j = 0; j < fp && v; j++)
                    if (diffs[fb2+j] != diffs[fb2-rep*fp+j]) v = false;
            if (v) full_period = fp;
        }
    }

    if (full_period < 0) {
        fprintf(stderr, "  FAILED to find/verify period\n");
        return -1;
    }

    // Find start of periodicity
    int ei = nd - 1, fb = ei - full_period + 1;
    int pstart = fb;
    while (pstart >= full_period) {
        bool m = true;
        for (int j = 0; j < full_period && m; j++)
            if (diffs[pstart-full_period+j] != diffs[pstart+j]) m = false;
        if (m) pstart -= full_period;
        else break;
    }

    long long psum = 0;
    for (int j = 0; j < full_period; j++) psum += diffs[pstart + j];

    // Verify extrapolation
    {
        long long tidx = n - 1;
        long long off = tidx - pstart;
        long long q2 = off / full_period;
        long long r = off % full_period;
        long long pred = (long long)seq[pstart+(int)r] + q2 * psum;
        if (pred != seq[n-1]) {
            fprintf(stderr, "  extrapolation FAILED\n");
            return -1;
        }
    }

    fprintf(stderr, "  period=%d, start=%d, psum=%lld\n", full_period, pstart, psum);

    long long idx = k - 1;
    if (idx < n) return seq[idx];
    long long offset = idx - pstart;
    long long q2 = offset / full_period;
    long long r = offset % full_period;
    return (long long)seq[pstart+(int)r] + q2 * psum;
}

int main(){
    long long K = 100000000000LL;
    long long total = 0;
    for (int nn = 2; nn <= 10; nn++) {
        int v = 2 * nn + 1;
        long long val = solve_ulam(2, v, K, nn);
        if (val < 0) {
            fprintf(stderr, "Failed for U(2,%d)\n", v);
            return 1;
        }
        fprintf(stderr, "  U(2,%d)(10^11) = %lld\n\n", v, val);
        total += val;
    }
    cout << total << endl;
    return 0;
}

Python project_euler/problem_167/solution.py

"""
Problem 167: Investigating Ulam Sequences

For U(2, 2n+1) with n=2..10, find the 10^11-th term and sum them.
Answer: 3916160068885

Key insight: U(2,v) for odd v>=5 has exactly 2 even members: 2 and some e.
After that, all members are odd. Odd c is Ulam iff is_ulam[c-2] XOR is_ulam[c-e].
This gives O(1) generation per term.

The difference sequence is eventually periodic with known periods from OEIS A100729.
"""

import sys

# Known periods from OEIS A100729 for U(2, 2n+1), n=2..10
KNOWN_PERIODS = {
    2: 32, 3: 26, 4: 444, 5: 1628, 6: 5906,
    7: 80, 8: 126960, 9: 380882, 10: 2097152
}

def solve_ulam(a, b, k, nn):
    """Find the k-th term of U(a, b)."""
    # Phase 1: Find second even member via brute force
    SMALL = 100000
    rc = bytearray(SMALL + 1)
    sq = [a, b]
    isu = bytearray(SMALL + 1)
    isu[a] = isu[b] = 1
    if a + b <= SMALL:
        rc[a + b] = 1

    second_even = -1
    nc = b + 1
    while nc <= SMALL:
        while nc <= SMALL and rc[nc] != 1:
            nc += 1
        if nc > SMALL:
            break
        u = nc
        sq.append(u)
        isu[u] = 1
        if u % 2 == 0 and u != a:
            second_even = u
        for i in range(len(sq) - 1):
            s = sq[i] + u
            if s <= SMALL and rc[s] < 3:
                rc[s] += 1
        nc = u + 1
        if second_even > 0:
            break

    print(f"U({a},{b}): second_even={second_even}", file=sys.stderr)

    # Phase 2: Fast generation using XOR rule
    kp = KNOWN_PERIODS[nn]
    needed = min(6 * kp + 10000, 14000000)
    needed = max(needed, 100000)

    MAXVAL = 60000000
    is_ulam = bytearray(MAXVAL + 2)
    seq = []
    for x in sq:
        is_ulam[x] = 1
        seq.append(x)

    c = sq[-1]
    c = c + 1 if c % 2 == 0 else c + 2

    while c <= MAXVAL and len(seq) < needed:
        w1 = (c - 2 >= 0 and c - 2 <= MAXVAL and is_ulam[c - 2])
        w2 = (c - second_even >= 0 and c - second_even <= MAXVAL and is_ulam[c - second_even])
        if w1 != w2:
            seq.append(c)
            is_ulam[c] = 1
        c += 2

    n = len(seq)
    print(f"  generated {n} terms", file=sys.stderr)

    # Compute diffs
    diffs = [seq[i+1] - seq[i] for i in range(n-1)]
    nd = len(diffs)

    # Try known period
    fp = kp
    ei = nd - 1
    fb = ei - fp + 1
    if fb - 4 * fp >= 0:
        ok = True
        for rep in range(1, 5):
            for j in range(fp):
                if diffs[fb + j] != diffs[fb - rep * fp + j]:
                    ok = False
                    break
            if not ok:
                break
        if not ok:
            print(f"  period {fp} verification FAILED", file=sys.stderr)
            return -1
    else:
        print(f"  not enough terms for period {fp}", file=sys.stderr)
        return -1

    # Find start of periodicity
    pstart = fb
    while pstart >= fp:
        match = all(diffs[pstart - fp + j] == diffs[pstart + j] for j in range(fp))
        if match:
            pstart -= fp
        else:
            break

    psum = sum(diffs[pstart:pstart + fp])

    # Verify
    tidx = n - 1
    off = tidx - pstart
    q, r = divmod(off, fp)
    pred = seq[pstart + r] + q * psum
    if pred != seq[-1]:
        print(f"  extrapolation FAILED", file=sys.stderr)
        return -1

    print(f"  period={fp}, psum={psum}", file=sys.stderr)

    idx = k - 1
    if idx < n:
        return seq[idx]
    offset = idx - pstart
    q, r = divmod(offset, fp)
    return seq[pstart + r] + q * psum

def solve():
    K = 10**11
    total = 0
    for nn in range(2, 11):
        v = 2 * nn + 1
        val = solve_ulam(2, v, K, nn)
        if val < 0:
            print(f"Failed for U(2,{v})", file=sys.stderr)
            return
        total += val
    print(total)

solve()