Smallest Prime Factor
Let smpf(n) denote the smallest prime factor of n for n >= 2. Define S(N) = sum_(n=2)^N smpf(n). Compute S(10^12) mod 10^9.
Problem Statement
This archive keeps the full statement, math, and original media on the page.
Let \(\operatorname {smpf}(n)\) be the smallest prime factor of \(n\).
We have \(\operatorname {smpf}(91)=7\) because \(91=7\times 13\) and \(\operatorname {smpf}(45)=3\) because \(45=3\times 3\times 5\).
Let \(S(n)\) be the sum of \(\operatorname {smpf}(i)\) for \(2 \le i \le n\).
E.g. \(S(100)=1257\).
Find \(S(10^{12}) \bmod 10^9\).
Problem 521: Smallest Prime Factor
Mathematical Foundation
Definition. For an integer with canonical factorisation where , define .
Theorem 1 (Partition by Smallest Prime Factor). For any integer ,
Proof. By the Fundamental Theorem of Arithmetic, every integer possesses a unique smallest prime factor. The map therefore induces a partition of into disjoint subsets indexed by primes . Since these sets are pairwise disjoint and their union is , we may interchange the order of summation:
Theorem 2 (Lucy DP for Prime Summation). Let denote the set of distinct floor quotients of . For each , define
representing, respectively, the count and sum of integers in (treating all as tentative primes). For each prime with , processed in increasing order, define the updates for all :
where denotes the state after processing all primes less than . After processing all primes , the terminal values satisfy and .
Proof. We proceed by strong induction on the primes processed.
Base. Before any sieve step, and . These correctly count and sum all integers in under the hypothesis that none have yet been identified as composite.
Inductive step. Suppose that after processing all primes , the value counts integers in that are either prime or have smallest prime factor , and sums them. The composites in with are exactly the integers where and . The count of such is , where counts primes less than (which have and must be excluded). Subtracting this count (respectively, times this count) from (respectively, ) removes the contribution of composites with . Each composite is removed exactly once, at the stage corresponding to its smallest prime factor.
After all primes are processed, every composite has been removed (since every composite has ). Hence and .
Theorem 3 (Recovering from Lucy DP). The full smallest-prime-factor sum decomposes as
where counts composites with . This quantity is extracted recursively from the sieved arrays: equals the number of integers in whose smallest prime factor is , which is after the Lucy DP, minus corrections for primes in that are counted separately.
Proof. Each integer is either prime or composite. If is prime, , contributing to ; the sum of all such contributions is . If is composite with , then (since has at least two prime factors, both ), so . Writing with and , the contribution is , and summing over all such gives . The count is computed from the post-sieve arrays by noting that ranges over integers in with . This recursive structure terminates because , reducing the problem at each level.
Editorial
Let smpf(n) = smallest prime factor of n. Compute S(N) = sum_{n=2}^{N} smpf(n) mod 10^9. Uses the Lucy DP (prime-counting sieve) combined with recursive enumeration of composite contributions by smallest prime factor. We collect distinct floor quotients. We then initialise Lucy DP arrays. Finally, iterate over v in V.
Pseudocode
Collect distinct floor quotients
Initialise Lucy DP arrays
for v in V
Sieve phase
Accumulate smpf sum via recursive enumeration
of composites by their smallest prime factor
Complexity Analysis
-
Time: . The sieve phase iterates over floor quotient values for each prime . By the constraint and the prime-counting estimate , the total number of updates is , which sums to .
-
Space: for the two arrays indexed by the distinct floor quotient values.
Answer
Code
Each problem page includes the exact C++ and Python source files from the local archive.
#include <bits/stdc++.h>
using namespace std;
/*
* Problem 521: Smallest Prime Factor
*
* S(N) = sum_{n=2}^{N} smpf(n) mod 10^9.
*
* Uses Lucy DP (prime-counting sieve) with composite-contribution
* accumulation during the sieve phase.
*/
typedef long long ll;
const ll MOD = 1000000000LL;
int main() {
ll N = 1000000000000LL; // 10^12
ll sqrtN = (ll)sqrt((double)N);
while ((sqrtN + 1) * (sqrtN + 1) <= N) sqrtN++;
while (sqrtN * sqrtN > N) sqrtN--;
// Collect distinct floor(N/i) values
vector<ll> vals;
for (ll i = 1; i <= N;) {
ll v = N / i;
vals.push_back(v);
i = N / v + 1;
}
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
int sz = (int)vals.size();
// Index mapping
auto idx = [&](ll v) -> int {
return (int)(lower_bound(vals.begin(), vals.end(), v) - vals.begin());
};
// S0[i] = count, S1[i] = sum (mod MOD)
vector<ll> S0(sz), S1(sz);
for (int i = 0; i < sz; i++) {
ll v = vals[i];
S0[i] = v - 1;
ll vm = v % MOD;
S1[i] = (vm * ((vm + 1) % MOD) % MOD * 500000000LL % MOD - 1 + MOD) % MOD;
}
// Lucy DP sieve + composite contribution accumulation
ll result = 0;
// We need a copy of S0 for the composite counting pass
vector<ll> S0c = S0;
// Sieve S0 and S1 fully to get prime count and prime sum
for (ll p = 2; p <= sqrtN; p++) {
int pi = idx(p);
int pi1 = idx(p - 1);
if (S0[pi] == S0[pi1]) continue; // not prime
ll pc = S0[pi1];
ll ps = S1[pi1];
ll p2 = p * p;
for (int i = sz - 1; i >= 0 && vals[i] >= p2; i--) {
ll v = vals[i];
int vi = idx(v / p);
ll cnt = S0[vi] - pc;
S0[i] -= cnt;
S1[i] = ((S1[i] - (p % MOD) * ((S1[vi] - ps + MOD) % MOD) % MOD) % MOD + MOD) % MOD;
}
}
// S1[idx(N)] = sum of primes <= N (mod MOD)
result = S1[idx(N)];
// Now compute composite contributions using S0c
// Process primes in order; before sieving prime p from S0c,
// accumulate composites with smpf = p
for (ll p = 2; p <= sqrtN; p++) {
int pi = idx(p);
int pi1 = idx(p - 1);
if (S0c[pi] == S0c[pi1]) continue;
ll pc = S0c[pi1]; // pi(p-1)
// Count of composites in [2,N] with smpf = p:
// = |{m in [2, N/p] : smpf(m) >= p}|
// = S0c[N/p] - pc
ll vp = N / p;
if (vp >= 2) {
ll cnt = S0c[idx(vp)] - pc;
result = (result + (p % MOD) * (cnt % MOD) % MOD) % MOD;
}
// Now sieve p out of S0c
ll p2 = p * p;
for (int i = sz - 1; i >= 0 && vals[i] >= p2; i--) {
ll v = vals[i];
int vi = idx(v / p);
S0c[i] -= (S0c[vi] - pc);
}
}
cout << (result % MOD + MOD) % MOD << endl;
return 0;
}
"""
Project Euler Problem 521: Smallest Prime Factor
Let smpf(n) = smallest prime factor of n.
Compute S(N) = sum_{n=2}^{N} smpf(n) mod 10^9.
Uses the Lucy DP (prime-counting sieve) combined with recursive
enumeration of composite contributions by smallest prime factor.
"""
import math
MOD = 10**9
def solve(N):
"""Compute S(N) = sum_{n=2}^{N} smpf(n) mod MOD via Lucy DP."""
if N < 2:
return 0
sqrtN = int(math.isqrt(N))
# Collect distinct floor quotients N//i
vals = []
i = 1
while i <= N:
v = N // i
vals.append(v)
i = N // v + 1
vals = sorted(set(vals))
# Index mapping: small values use direct index, large use N//v
small = [0] * (sqrtN + 2)
large = [0] * (sqrtN + 2)
def idx(v):
return v if v <= sqrtN else len(vals) - (N // v)
# S0[v] = count of "surviving" integers in [2,v]
# S1[v] = sum of "surviving" integers in [2,v]
S0 = {}
S1 = {}
for v in vals:
S0[v] = v - 1
S1[v] = (v * (v + 1) // 2 - 1) % MOD
# Lucy DP sieve
primes_list = []
for p in range(2, sqrtN + 1):
if S0[p] == S0.get(p - 1, 0):
continue
primes_list.append(p)
p_count = S0[p - 1]
p_sum = S1[p - 1]
p2 = p * p
for v in reversed(vals):
if v < p2:
break
vp = v // p
cnt = S0[vp] - p_count
S0[v] -= cnt
S1[v] = (S1[v] - p % MOD * ((S1[vp] - p_sum) % MOD)) % MOD
# Now S1[v] = sum of primes <= v (mod MOD), S0[v] = pi(v).
# Recursive function to compute sum of smpf(n) for n in [2,v]
# with smpf(n) >= primes_list[pidx].
# T(v, pidx) = sum of smpf(n) for n in [2,v] where smpf(n) >= primes_list[pidx]
# = (sum of primes in [primes_list[pidx], v])
# + sum over primes p = primes_list[pidx..] with p^2 <= v of
# p * (T(v/p, pidx_of_p) - (sum of primes >= p in [p, v/p]) + ...)
# We implement this iteratively.
def smpf_sum(v, pi):
"""
Compute sum of smpf(n) for all n in [2,v] with smpf(n) >= primes_list[pi].
"""
# Prime contributions: primes p in [primes_list[pi], v]
if pi >= len(primes_list):
# Only primes > sqrt(N) can appear; their contribution is S1[v] - S1[prev]
return (S1[v] - (S1[primes_list[-1] - 1] if primes_list else 0)) % MOD if v >= 2 else 0
p = primes_list[pi]
if p > v:
return 0
# Sum of primes in [p, v] = S1[v] - S1[p-1]
result = (S1[v] - S1[p - 1]) % MOD
# Composite contributions with smpf = q for each prime q >= p with q^2 <= v
for j in range(pi, len(primes_list)):
q = primes_list[j]
if q * q > v:
break
# Composites with smpf = q: n = q*m where smpf(m) >= q, m in [q, v/q]
# Each contributes smpf(n) = q
# But m can be prime or composite, and we need to count how many such m
# Actually we need: for each such n = q*m, smpf(n) = q, so contribution is q.
# Count of m in [q, v/q] with smpf(m) >= q:
# = (S0[v/q] - pi(q-1)) for m being prime, plus composites with smpf >= q
# We iterate: n = q*m, q*m^2 could give deeper nesting.
# Use: sum of smpf over composites with smpf = q in [2,v]
# = q * (number of m in [q, v//q] with smpf(m) >= q)
# PLUS for composites n = q*m where m itself is composite with smpf >= q,
# we still contribute smpf(n) = q, not smpf(m).
# So: contribution = q * |{m in [q, v//q] : smpf(m) >= q}|
# |{m in [q, v//q] : smpf(m) >= q}| = (S0[v//q] - S0[q-1]) + ...
# Actually we need all m >= q in [2, v//q] with smpf(m) >= q.
# = |{m in [2, v//q] : smpf(m) >= q}| - |{m in [2, q-1] : smpf(m) >= q}|
# Since primes < q have smpf < q, and q-1 < q, there are no integers in [2,q-1] with smpf >= q
# except... actually composites in [2, q-1] all have smpf < q. And primes in [2,q-1] have smpf = themselves < q.
# So |{m in [2, q-1] : smpf(m) >= q}| = 0.
# Thus count = |{m in [2, v//q] : smpf(m) >= q}| = (v//q - 1) - (number removed by primes < q)
# After Lucy DP, the number of integers in [2, v//q] with smpf >= q... hmm.
# Actually the Lucy DP S0 after sieving up to all primes gives pi(v).
# We need intermediate values.
# This approach requires storing intermediate DP states, which we didn't do.
pass
return result % MOD
# Since the full recursive enumeration requires intermediate Lucy DP states
# (which we overwrote during the sieve), we use an alternative approach:
# Re-run the sieve but accumulate smpf contributions during the sieve itself.
# Reset and re-run with smpf accumulation
T = {} # T[v] = sum of smpf(n) for n in [2,v], built incrementally
for v in vals:
T[v] = (v * (v + 1) // 2 - 1) % MOD # initially assume smpf(n) = n
# Re-initialise S0 for counting
S0b = {}
for v in vals:
S0b[v] = v - 1
for p in range(2, sqrtN + 1):
if S0b[p] == S0b.get(p - 1, 0):
continue
p_count = S0b[p - 1]
p2 = p * p
for v in reversed(vals):
if v < p2:
break
vp = v // p
cnt = S0b[vp] - p_count
S0b[v] -= cnt
# These cnt composites previously had their "smpf" assumed to be themselves.
# Actually they are multiples of p whose smpf is p.
# We need to subtract their assumed contribution and add p * cnt.
# The assumed contribution of these composites (their values) was accumulated
# in the initial T[v] = sum of all integers in [2,v].
# But we can't easily extract "the sum of values of composites with smpf=p".
# The standard approach: use a separate DP that directly computes smpf sums.
# Let F(v, p) = sum of smpf(n) for n in [2,v] where smpf(n) > p.
# Then S(N) = F(N, 1) + ... = total smpf sum.
# F(v, p_last) = sum of primes q in (p_last, v], each contributing q (as smpf(q)=q),
# + sum over primes q > p_last with q^2 <= v of
# q * (number of composites with smpf = q in [2,v])
# Actually we combine: S(N) = sum of primes up to N + sum over primes p with p^2 <= N
# of p * (count of m with smpf(m) >= p and p*m <= N, m >= 2).
# For a correct O(N^{3/4}/log N) solution, we use the "min-25 sieve" variant.
# Intermediate prime counts at each sieve level are needed.
# We store them by re-running the sieve and saving snapshots.
# Re-initialise
S0c = {}
for v in vals:
S0c[v] = v - 1
# We'll process primes and at each level, after sieving prime p,
# we compute the composite contribution with smpf = p.
result = 0
# First pass: get prime sum = S1[N] (already computed)
result = S1[N] % MOD
# Second pass: accumulate composite contributions
# Before sieving, S0c[v] = v - 1 (count of [2,v])
for p in range(2, sqrtN + 1):
if S0c[p] == S0c.get(p - 1, 0):
continue
# At this point, S0c[v] = count of integers in [2,v] with smpf >= p
# (those not yet sieved out)
p_count = S0c[p - 1] # = pi(p-1)
# Composites with smpf = p contribute:
# For n = p*m with m in [p, N/p] and smpf(m) >= p:
# count(m) for each power: m can be p, p^2, ..., or any composite with smpf >= p
# Actually, for the first level: number of n in [2, N] with smpf = p
# = (count of m in [2, N//p] with smpf >= p) - (if p itself is in that count, it's fine,
# p*p is a composite with smpf = p, p itself as m gives n = p^2 etc.)
# count of m in [p, N//p] with smpf >= p = S0c[N//p] - S0c[p-1]
# But wait, we also need to handle higher powers: p*m where m has smpf >= p
# and m itself could be p (giving p^2), or m = p*m' (giving p^2*m'), etc.
# For smpf sum, contribution of n with smpf(n) = p is just p per such n.
# So total composite contribution = p * (count of composites with smpf = p in [2,N])
# Count of integers in [2, N] with smpf = p:
# = (count of integers in [2, N//p] with smpf >= p) where we map n -> n/p
# but n must be composite, i.e., n != p, so m = n/p >= 2, and... actually n = p is prime.
# Integers with smpf = p in [2, N]:
# = {p} union {pm : m >= 2, smpf(m) >= p, pm <= N}
# = {p} union {pm : m in [2, N//p], smpf(m) >= p}
# The prime p is already counted in S1[N], so we need composites only.
# Composites with smpf = p: {pm : m in [2, N//p], smpf(m) >= p}
# Count = (S0c[N//p] - p_count) ... but this includes m being primes >= p
# (giving n = p * prime, which is composite with smpf = p, correct)
# and m being 1? No, m >= 2.
# S0c[N//p] = count of integers in [2, N//p] with smpf >= p
# So count of composites in [2, N] with smpf = p
# = S0c[N//p] - p_count -- NO wait.
# S0c[v] before this sieve step = (count of integers in [2,v] with smpf >= p)
# = (v - 1) - (count of composites with smpf < p sieved out so far)
# = (count of primes < p) + (count of integers in [2,v] with smpf >= p)
# Hmm, S0c[v] after sieving primes < p = pi(p-1) + (count of integers with smpf >= p in [2,v])
# Wait no. Let me reconsider.
# Initially S0c[v] = v-1. After sieving by prime q, we remove composites with smpf = q.
# So after sieving all primes < p:
# S0c[v] = (# primes in [2,v]) that are < p, PLUS (# integers in [2,v] with smpf >= p)
# = pi(p-1) + |{n in [2,v] : smpf(n) >= p}| ... wait that double counts.
# Actually: S0c[v] = |{n in [2,v] : n is prime or smpf(n) >= p}|
# = |{primes in [2,v] with value < p}| + |{primes in [2,v] with value >= p}|
# + |{composites in [2,v] with smpf >= p}|
# = pi(v) restricted to... no. Think again.
# At the start, all of [2,v] is in. After sieving prime 2, composites with smpf=2 removed.
# After sieving 3, composites with smpf=3 removed. Etc.
# After sieving all primes < p:
# S0c[v] = |[2,v]| - |{composites in [2,v] with smpf < p}|
# = |{primes in [2,v]}| + |{composites in [2,v] with smpf >= p}|
# = pi(v) + |{composites in [2,v] with smpf >= p}|
# So |{composites in [2,v] with smpf >= p}| = S0c[v] - pi(v)
# And pi(p-1) = S0c[p-1] = p_count (since for v = p-1, there are no composites
# in [2, p-1] with smpf >= p, so S0c[p-1] = pi(p-1)).
# Composites with smpf = p in [2, N]:
# = |{m in [2, N//p] : smpf(m) >= p}| (each gives composite n = pm with smpf = p)
# = S0c[N//p] - pi(p-1) ... no.
# |{m in [2, N//p] : smpf(m) >= p}| = S0c[N//p] - pi(p-1)
# only if all primes < p have been sieved and S0c properly reflects that.
# Wait: S0c[N//p] = pi(N//p) + |{composites in [2, N//p] with smpf >= p}|
# |{integers in [2, N//p] with smpf >= p}| includes primes >= p and composites with smpf >= p
# = (pi(N//p) - pi(p-1)) + |{composites in [2, N//p] with smpf >= p}|
# = S0c[N//p] - pi(p-1) - pi(p-1) + ... this is getting confused.
# Simpler: |{m in [2, N//p] : m is prime and m >= p, OR m is composite with smpf >= p}|
# = S0c[N//p] - pi(p-1)
# Because S0c[N//p] counts primes in [2, N//p] plus composites with smpf >= p.
# Primes in [2, p-1] number pi(p-1). The rest (primes >= p and composites with smpf >= p)
# is S0c[N//p] - pi(p-1).
# But we want m >= 2 with smpf(m) >= p. That's:
# = (primes in [p, N//p]) + (composites in [2, N//p] with smpf >= p)
# = S0c[N//p] - pi(p-1)
# Yes, this is correct.
# But for composites n = pm to satisfy smpf(n) = p, we need m >= 2.
# If m is a prime >= p, then n = pm is composite with smpf = p. Correct.
# If m is composite with smpf(m) = q >= p, then smpf(pm) = p. Correct.
# If m = 1, n = p which is prime, not composite. But m = 1 isn't in [2, N//p].
# However we also need to handle the multi-level nesting for higher powers of p.
# Actually for computing smpf sum, we just need:
# sum over composites n with smpf(n) = p of smpf(n) = p * (count of such composites)
# And the count of composites in [2, N] with smpf = p is exactly:
# |{m in [2, N//p] : smpf(m) >= p}| = S0c[N//p] - pi(p-1) = S0c[N//p] - p_count
# But wait, this only counts composites n = pm with m in [2, N//p].
# These include n = p*p, p*p*p, etc. (when m = p, p^2, ...).
# And n = p*q for prime q >= p, n = p*q*r, etc.
# All of these have smpf = p. So the count is correct.
# But S0c[N//p] - p_count over-counts:
# It counts m = p (prime) giving n = p^2 (composite, smpf = p) -- correct
# It counts m = q > p (prime) giving n = pq (composite, smpf = p) -- correct
# It counts m = p^2 (composite, smpf = p >= p) giving n = p^3 -- correct
# Good. But do we miss anything? n = pk where smpf(k) >= p and k >= 2 and pk <= N.
# That's exactly our set. Looks complete.
# Actually there's a subtlety for multi-level: n = p^2 * q has smpf = p,
# and is counted via m = p*q with smpf(m) = p >= p. Yes, correct.
# So total composite smpf contribution for prime p:
vp = N // p
if vp >= 2:
cnt_composites = S0c[vp] - p_count
# But we also need to count contributions from higher powers:
# n = p^a * m where a >= 1, smpf(m) >= p, m >= 1, and the FIRST p is what we want.
# Actually the above cnt_composites already handles all composites with smpf = p.
# For n = p^2, m = p, included. For n = p^3, m = p^2, included (smpf(p^2) = p >= p). Yes.
result = (result + p % MOD * (cnt_composites % MOD)) % MOD
# Now sieve: remove composites with smpf = p from S0c
p2 = p * p
for v in reversed(vals):
if v < p2:
break
S0c[v] -= (S0c[v // p] - p_count)
return result % MOD
print(solve(10**12))