All Euler problems
Project Euler

Smallest Prime Factor

Let smpf(n) denote the smallest prime factor of n for n >= 2. Define S(N) = sum_(n=2)^N smpf(n). Compute S(10^12) mod 10^9.

Source sync Apr 19, 2026
Problem #0521
Level Level 16
Solved By 907
Languages C++, Python
Answer 44389811
Length 498 words
modular_arithmeticnumber_theorydynamic_programming

Problem Statement

This archive keeps the full statement, math, and original media on the page.

Let \(\operatorname {smpf}(n)\) be the smallest prime factor of \(n\).

We have \(\operatorname {smpf}(91)=7\) because \(91=7\times 13\) and \(\operatorname {smpf}(45)=3\) because \(45=3\times 3\times 5\).

Let \(S(n)\) be the sum of \(\operatorname {smpf}(i)\) for \(2 \le i \le n\).

E.g. \(S(100)=1257\).

Find \(S(10^{12}) \bmod 10^9\).

Problem 521: Smallest Prime Factor

Mathematical Foundation

Definition. For an integer n2n \ge 2 with canonical factorisation n=p1a1prarn = p_1^{a_1} \cdots p_r^{a_r} where p1<<prp_1 < \cdots < p_r, define smpf(n)=p1\mathrm{smpf}(n) = p_1.

Theorem 1 (Partition by Smallest Prime Factor). For any integer N2N \ge 2,

S(N)=p primepNp{n[2,N]:smpf(n)=p}.S(N) = \sum_{\substack{p \text{ prime} \\ p \le N}} p \cdot \bigl|\{n \in [2, N] : \mathrm{smpf}(n) = p\}\bigr|.

Proof. By the Fundamental Theorem of Arithmetic, every integer n2n \ge 2 possesses a unique smallest prime factor. The map nsmpf(n)n \mapsto \mathrm{smpf}(n) therefore induces a partition of {2,3,,N}\{2, 3, \ldots, N\} into disjoint subsets Cp={n[2,N]:smpf(n)=p}\mathcal{C}_p = \{n \in [2, N] : \mathrm{smpf}(n) = p\} indexed by primes pNp \le N. Since these sets are pairwise disjoint and their union is {2,,N}\{2, \ldots, N\}, we may interchange the order of summation:

S(N)=n=2Nsmpf(n)=p primepNnCpp=p primepNpCp.S(N) = \sum_{n=2}^{N} \mathrm{smpf}(n) = \sum_{\substack{p \text{ prime} \\ p \le N}} \sum_{n \in \mathcal{C}_p} p = \sum_{\substack{p \text{ prime} \\ p \le N}} p \cdot |\mathcal{C}_p|. \qquad \square

Theorem 2 (Lucy DP for Prime Summation). Let V={N/i:1iN}\mathcal{V} = \{\lfloor N/i \rfloor : 1 \le i \le N\} denote the set of distinct floor quotients of NN. For each vVv \in \mathcal{V}, define

S0(0)(v)=v1,S1(0)(v)=v(v+1)21,S_0^{(0)}(v) = v - 1, \qquad S_1^{(0)}(v) = \frac{v(v+1)}{2} - 1,

representing, respectively, the count and sum of integers in [2,v][2, v] (treating all as tentative primes). For each prime pp with p2Np^2 \le N, processed in increasing order, define the updates for all vp2v \ge p^2:

S0(p)(v)=S0(p)(v)(S0(p)(v/p)S0(p)(p1)),S_0^{(p)}(v) = S_0^{(p-)}(v) - \bigl(S_0^{(p-)}(\lfloor v/p \rfloor) - S_0^{(p-)}(p-1)\bigr), S1(p)(v)=S1(p)(v)p(S0(p)(v/p)S0(p)(p1)),S_1^{(p)}(v) = S_1^{(p-)}(v) - p \cdot \bigl(S_0^{(p-)}(\lfloor v/p \rfloor) - S_0^{(p-)}(p-1)\bigr),

where (p)(p-) denotes the state after processing all primes less than pp. After processing all primes pNp \le \lfloor\sqrt{N}\rfloor, the terminal values satisfy S0(v)=π(v)S_0(v) = \pi(v) and S1(v)=pvpS_1(v) = \sum_{p \le v} p.

Proof. We proceed by strong induction on the primes processed.

Base. Before any sieve step, S0(0)(v)=v1={2,,v}S_0^{(0)}(v) = v - 1 = |\{2, \ldots, v\}| and S1(0)(v)=k=2vkS_1^{(0)}(v) = \sum_{k=2}^{v} k. These correctly count and sum all integers in [2,v][2, v] under the hypothesis that none have yet been identified as composite.

Inductive step. Suppose that after processing all primes q<pq < p, the value S0(p)(v)S_0^{(p-)}(v) counts integers in [2,v][2, v] that are either prime or have smallest prime factor p\ge p, and S1(p)(v)S_1^{(p-)}(v) sums them. The composites in [2,v][2, v] with smpf=p\mathrm{smpf} = p are exactly the integers pmpm where m[2,v/p]m \in [2, \lfloor v/p \rfloor] and smpf(m)p\mathrm{smpf}(m) \ge p. The count of such mm is S0(p)(v/p)S0(p)(p1)S_0^{(p-)}(\lfloor v/p \rfloor) - S_0^{(p-)}(p-1), where S0(p)(p1)=π(p1)S_0^{(p-)}(p-1) = \pi(p-1) counts primes less than pp (which have smpf<p\mathrm{smpf} < p and must be excluded). Subtracting this count (respectively, pp times this count) from S0S_0 (respectively, S1S_1) removes the contribution of composites with smpf=p\mathrm{smpf} = p. Each composite is removed exactly once, at the stage corresponding to its smallest prime factor.

After all primes pNp \le \sqrt{N} are processed, every composite nvn \le v has been removed (since every composite nn has smpf(n)nvN\mathrm{smpf}(n) \le \sqrt{n} \le \sqrt{v} \le \sqrt{N}). Hence S0(v)=π(v)S_0(v) = \pi(v) and S1(v)=pvpS_1(v) = \sum_{p \le v} p. \square

Theorem 3 (Recovering S(N)S(N) from Lucy DP). The full smallest-prime-factor sum decomposes as

S(N)=S1(N)+p primep2NpC(N,p)S(N) = S_1(N) + \sum_{\substack{p \text{ prime} \\ p^2 \le N}} p \cdot C(N, p)

where C(N,p)C(N, p) counts composites n[2,N]n \in [2, N] with smpf(n)=p\mathrm{smpf}(n) = p. This quantity is extracted recursively from the sieved arrays: C(N,p)C(N, p) equals the number of integers in [p,N/p][p, \lfloor N/p \rfloor] whose smallest prime factor is p\ge p, which is S0(N/p)S0(p1)S_0(\lfloor N/p \rfloor) - S_0(p-1) after the Lucy DP, minus corrections for primes in [p,N/p][p, \lfloor N/p \rfloor] that are counted separately.

Proof. Each integer n[2,N]n \in [2, N] is either prime or composite. If nn is prime, smpf(n)=n\mathrm{smpf}(n) = n, contributing nn to S(N)S(N); the sum of all such contributions is S1(N)S_1(N). If nn is composite with smpf(n)=p\mathrm{smpf}(n) = p, then p2nNp^2 \le n \le N (since nn has at least two prime factors, both p\ge p), so pNp \le \sqrt{N}. Writing n=pmn = pm with smpf(m)p\mathrm{smpf}(m) \ge p and mpm \ge p, the contribution is pp, and summing over all such nn gives pC(N,p)p \cdot C(N, p). The count C(N,p)C(N, p) is computed from the post-sieve arrays by noting that mm ranges over integers in [p,N/p][p, \lfloor N/p \rfloor] with smpf(m)p\mathrm{smpf}(m) \ge p. This recursive structure terminates because N/p<N\lfloor N/p \rfloor < N, reducing the problem at each level. \square

Editorial

Let smpf(n) = smallest prime factor of n. Compute S(N) = sum_{n=2}^{N} smpf(n) mod 10^9. Uses the Lucy DP (prime-counting sieve) combined with recursive enumeration of composite contributions by smallest prime factor. We collect distinct floor quotients. We then initialise Lucy DP arrays. Finally, iterate over v in V.

Pseudocode

Collect distinct floor quotients
Initialise Lucy DP arrays
for v in V
Sieve phase
Accumulate smpf sum via recursive enumeration
of composites by their smallest prime factor

Complexity Analysis

  • Time: O(N3/4/lnN)O(N^{3/4} / \ln N). The sieve phase iterates over O(N)O(\sqrt{N}) floor quotient values for each prime pNp \le \sqrt{N}. By the constraint vp2v \ge p^2 and the prime-counting estimate π(N)=O(N/lnN)\pi(\sqrt{N}) = O(\sqrt{N}/\ln N), the total number of updates is pN{vV:vp2}\sum_{p \le \sqrt{N}} |\{v \in \mathcal{V} : v \ge p^2\}|, which sums to O(N3/4/lnN)O(N^{3/4}/\ln N).

  • Space: O(N)O(\sqrt{N}) for the two arrays indexed by the O(N)O(\sqrt{N}) distinct floor quotient values.

Answer

44389811\boxed{44389811}

Code

Each problem page includes the exact C++ and Python source files from the local archive.

C++ project_euler/problem_521/solution.cpp
#include <bits/stdc++.h>
using namespace std;

/*
 * Problem 521: Smallest Prime Factor
 *
 * S(N) = sum_{n=2}^{N} smpf(n) mod 10^9.
 *
 * Uses Lucy DP (prime-counting sieve) with composite-contribution
 * accumulation during the sieve phase.
 */

typedef long long ll;
const ll MOD = 1000000000LL;

int main() {
    ll N = 1000000000000LL; // 10^12

    ll sqrtN = (ll)sqrt((double)N);
    while ((sqrtN + 1) * (sqrtN + 1) <= N) sqrtN++;
    while (sqrtN * sqrtN > N) sqrtN--;

    // Collect distinct floor(N/i) values
    vector<ll> vals;
    for (ll i = 1; i <= N;) {
        ll v = N / i;
        vals.push_back(v);
        i = N / v + 1;
    }
    sort(vals.begin(), vals.end());
    vals.erase(unique(vals.begin(), vals.end()), vals.end());
    int sz = (int)vals.size();

    // Index mapping
    auto idx = [&](ll v) -> int {
        return (int)(lower_bound(vals.begin(), vals.end(), v) - vals.begin());
    };

    // S0[i] = count, S1[i] = sum (mod MOD)
    vector<ll> S0(sz), S1(sz);
    for (int i = 0; i < sz; i++) {
        ll v = vals[i];
        S0[i] = v - 1;
        ll vm = v % MOD;
        S1[i] = (vm * ((vm + 1) % MOD) % MOD * 500000000LL % MOD - 1 + MOD) % MOD;
    }

    // Lucy DP sieve + composite contribution accumulation
    ll result = 0;

    // We need a copy of S0 for the composite counting pass
    vector<ll> S0c = S0;

    // Sieve S0 and S1 fully to get prime count and prime sum
    for (ll p = 2; p <= sqrtN; p++) {
        int pi = idx(p);
        int pi1 = idx(p - 1);
        if (S0[pi] == S0[pi1]) continue; // not prime

        ll pc = S0[pi1];
        ll ps = S1[pi1];
        ll p2 = p * p;

        for (int i = sz - 1; i >= 0 && vals[i] >= p2; i--) {
            ll v = vals[i];
            int vi = idx(v / p);
            ll cnt = S0[vi] - pc;
            S0[i] -= cnt;
            S1[i] = ((S1[i] - (p % MOD) * ((S1[vi] - ps + MOD) % MOD) % MOD) % MOD + MOD) % MOD;
        }
    }

    // S1[idx(N)] = sum of primes <= N (mod MOD)
    result = S1[idx(N)];

    // Now compute composite contributions using S0c
    // Process primes in order; before sieving prime p from S0c,
    // accumulate composites with smpf = p
    for (ll p = 2; p <= sqrtN; p++) {
        int pi = idx(p);
        int pi1 = idx(p - 1);
        if (S0c[pi] == S0c[pi1]) continue;

        ll pc = S0c[pi1]; // pi(p-1)

        // Count of composites in [2,N] with smpf = p:
        // = |{m in [2, N/p] : smpf(m) >= p}|
        // = S0c[N/p] - pc
        ll vp = N / p;
        if (vp >= 2) {
            ll cnt = S0c[idx(vp)] - pc;
            result = (result + (p % MOD) * (cnt % MOD) % MOD) % MOD;
        }

        // Now sieve p out of S0c
        ll p2 = p * p;
        for (int i = sz - 1; i >= 0 && vals[i] >= p2; i--) {
            ll v = vals[i];
            int vi = idx(v / p);
            S0c[i] -= (S0c[vi] - pc);
        }
    }

    cout << (result % MOD + MOD) % MOD << endl;

    return 0;
}