Superinteger
An integer s is a superinteger of another integer n if the digits of n form a subsequence of the digits of s. Let p(i) be the i -th prime, c(i) the i -th composite. Define sd(n) = 1 + (n-1) mod 9 (...
Problem Statement
This archive keeps the full statement, math, and original media on the page.
An integer $s$ is called superinteger of another integer $n$ if the digits of $n$ form a subsequence of the digits of $s$.
For example, $2718281828$ is a superinteger of $18828$, while $314159$ is not a superinteger of $151$.
Let $p(n)$ be the $n$th prime number, and let $c(n)$ be the $n$th composite number. For example, $p(1) = 2$, $p(10) = 29$, $c(1) = 4$ and $c(10) = 18$.
$\{p(i) : i \ge 1\} = \{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, \dots\}$
$\{c(i) : i \ge 1\} = \{4, 6, 8, 9, 10, 12, 14, 15, 16, 18, \dots\}$
Let $P^D$ be the sequence of the digital roots of $\{p(i)\}$ ($C^D$ is defined similarly for $\{c(i)\}$):
$P^D = \{2, 3, 5, 7, 2, 4, 8, 1, 5, 2, \dots\}$
$C^D = \{4, 6, 8, 9, 1, 3, 5, 6, 7, 9, \dots\}$
Let $P_n$ be the integer formed by concatenating the first $n$ elements of $P^D$ ($C_n$ is defined similarly for $C^D$).
$P_{10} = 2357248152$
$C_{10} = 4689135679$
Let $f(n)$ be the smallest positive integer that is a common superinteger of $P_n$ and $C_n$.
For example, $f(10) = 2357246891352679$ and $f(100) \bmod 1\,000\,000\,007 = 771661825$.
Find $f(10\,000) \bmod 1\,000\,000\,007$.
Problem 467: Superinteger
Mathematical Foundation
Theorem 1 (Digital root formula). For , .
Proof. The digital root satisfies with . For : if then (since ); otherwise .
Theorem 2 (SCS via LCS). For strings and over alphabet , the length of the Shortest Common Supersequence satisfies:
where is the Longest Common Subsequence.
Proof. An SCS of and is a string of minimum length containing both and as subsequences. Any common subsequence of and can be “shared” in , and an LCS provides the maximum such sharing. Formally, align and to : each position of is either from only, only, or shared. The shared positions form a common subsequence, and to minimize we maximize the shared portion. Thus .
Lemma 1 (LCS recurrence). For sequences and :
The LCS length is .
Proof. Standard dynamic programming recurrence. If , the optimal LCS either uses this match (gaining 1 over ) or does not, but using it is always at least as good. If , at least one of or is not in the LCS suffix, giving the max of the two sub-cases.
Theorem 3 (SCS construction and modular evaluation). The SCS string is constructed by backtracking through the LCS DP table, interleaving characters from and while sharing matched characters. The numeric value modulo is computed incrementally:
for each digit of the SCS.
Proof. The backtracking produces the lexicographically smallest SCS (by choosing ‘s character first when tied). The modular arithmetic preserves the polynomial identity via Horner’s method.
Editorial
The key algorithm: lexicographically smallest SCS via forward construction using suffix LCS tables. We generate sequences. We then lCS DP table (O(n^2)). Finally, else.
Pseudocode
Generate sequences
LCS DP table (O(n^2))
else
Backtrack to construct SCS, compute value mod MOD
Build SCS in reverse, then reverse at end
else
Compute numeric value mod MOD
for d in scs
Complexity Analysis
- Time: for sieving primes up to the -th prime (). for the LCS DP table. for SCS construction and modular evaluation. Total: .
- Space: for the LCS table (can be reduced to with Hirschberg’s algorithm if only the value is needed, but backtracking requires the full table).
Answer
Code
Each problem page includes the exact C++ and Python source files from the local archive.
/*
* Project Euler Problem 467: Superinteger
*
* Find f(10000) mod 1,000,000,007 where f(n) is the smallest superinteger
* (shortest common supersequence) of the digital root sequences of the
* first n primes and first n composites.
*
* Algorithm:
* 1. Generate digital root sequences P_n and C_n
* 2. Compute suffix LCS table: lcs[i][j] = LCS(A[i:], B[j:])
* 3. Build lexicographically smallest SCS forward using greedy:
* At state (i,j), next char must be A[i] or B[j] (or both if equal).
* Pick the option that keeps SCS length optimal, break ties by
* choosing the smaller digit.
* 4. Compute numeric value mod 10^9+7 during construction.
*
* Compile: g++ -O2 -o solution solution.cpp
* Memory: ~400MB for n=10000 (short array 10001x10001)
*/
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstring>
using namespace std;
const long long MOD = 1000000007LL;
const int N = 10000;
const int SIEVE_LIMIT = 200000;
bool is_prime_arr[SIEVE_LIMIT + 1];
int digital_root(int n) {
if (n == 0) return 0;
return 1 + (n - 1) % 9;
}
void sieve() {
memset(is_prime_arr, true, sizeof(is_prime_arr));
is_prime_arr[0] = is_prime_arr[1] = false;
for (int i = 2; i * i <= SIEVE_LIMIT; i++) {
if (is_prime_arr[i]) {
for (int j = i * i; j <= SIEVE_LIMIT; j += i) {
is_prime_arr[j] = false;
}
}
}
}
// Suffix LCS table: lcs[i][j] = LCS(A[i:], B[j:])
short lcs_table[N + 1][N + 1];
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
sieve();
// Generate sequences
vector<int> A, B; // A = sp (primes), B = sc (composites)
for (int i = 2; i <= SIEVE_LIMIT && ((int)A.size() < N || (int)B.size() < N); i++) {
if (is_prime_arr[i]) {
if ((int)A.size() < N) A.push_back(digital_root(i));
} else {
if ((int)B.size() < N) B.push_back(digital_root(i));
}
}
int m = A.size(), n = B.size();
cout << "Generated sequences: m=" << m << ", n=" << n << endl;
cout << "P starts: ";
for (int i = 0; i < 20; i++) cout << A[i];
cout << endl;
cout << "C starts: ";
for (int i = 0; i < 20; i++) cout << B[i];
cout << endl;
// Compute suffix LCS
cout << "Computing suffix LCS table..." << endl;
for (int i = 0; i <= m; i++) lcs_table[i][n] = 0;
for (int j = 0; j <= n; j++) lcs_table[m][j] = 0;
for (int i = m - 1; i >= 0; i--) {
for (int j = n - 1; j >= 0; j--) {
if (A[i] == B[j]) {
lcs_table[i][j] = lcs_table[i+1][j+1] + 1;
} else {
lcs_table[i][j] = max(lcs_table[i+1][j], lcs_table[i][j+1]);
}
}
}
int lcs_len = lcs_table[0][0];
int scs_len = m + n - lcs_len;
cout << "LCS length: " << lcs_len << endl;
cout << "SCS length: " << scs_len << endl;
// Build lexicographically smallest SCS forward
cout << "Building SCS..." << endl;
long long val = 0;
int ci = 0, cj = 0;
auto scs_rem = [&](int i, int j) -> int {
return (m - i) + (n - j) - lcs_table[i][j];
};
while (ci < m || cj < n) {
int d;
if (ci == m) {
d = B[cj]; cj++;
} else if (cj == n) {
d = A[ci]; ci++;
} else if (A[ci] == B[cj]) {
d = A[ci]; ci++; cj++;
} else {
int cur = scs_rem(ci, cj);
bool opt_a = (1 + scs_rem(ci + 1, cj) == cur);
bool opt_b = (1 + scs_rem(ci, cj + 1) == cur);
if (opt_a && opt_b) {
if (A[ci] <= B[cj]) {
d = A[ci]; ci++;
} else {
d = B[cj]; cj++;
}
} else if (opt_a) {
d = A[ci]; ci++;
} else {
d = B[cj]; cj++;
}
}
val = (val * 10 + d) % MOD;
}
cout << endl;
cout << "f(" << N << ") mod " << MOD << " = " << val << endl;
// Verify small case n=10
{
cout << "\n=== Verification n=10 ===" << endl;
vector<int> a10(A.begin(), A.begin() + 10);
vector<int> b10(B.begin(), B.begin() + 10);
short lcs10[11][11];
memset(lcs10, 0, sizeof(lcs10));
for (int i = 9; i >= 0; i--)
for (int j = 9; j >= 0; j--) {
if (a10[i] == b10[j])
lcs10[i][j] = lcs10[i+1][j+1] + 1;
else
lcs10[i][j] = max(lcs10[i+1][j], lcs10[i][j+1]);
}
cout << "LCS(10) = " << lcs10[0][0] << endl;
cout << "SCS(10) = " << 20 - lcs10[0][0] << endl;
auto scs_rem10 = [&](int i, int j) -> int {
return (10 - i) + (10 - j) - lcs10[i][j];
};
string scs_str;
int ii = 0, jj = 0;
while (ii < 10 || jj < 10) {
int dd;
if (ii == 10) { dd = b10[jj]; jj++; }
else if (jj == 10) { dd = a10[ii]; ii++; }
else if (a10[ii] == b10[jj]) { dd = a10[ii]; ii++; jj++; }
else {
int cur = scs_rem10(ii, jj);
bool oa = (1 + scs_rem10(ii+1, jj) == cur);
bool ob = (1 + scs_rem10(ii, jj+1) == cur);
if (oa && ob) {
if (a10[ii] <= b10[jj]) { dd = a10[ii]; ii++; }
else { dd = b10[jj]; jj++; }
} else if (oa) { dd = a10[ii]; ii++; }
else { dd = b10[jj]; jj++; }
}
scs_str += ('0' + dd);
}
cout << "f(10) = " << scs_str << endl;
cout << "Expected: 2357246891352679" << endl;
long long v = 0;
for (char c : scs_str) v = (v * 10 + (c - '0')) % MOD;
cout << "f(10) mod " << MOD << " = " << v << endl;
}
return 0;
}
"""
Project Euler Problem 467: Superinteger
Find f(10000) mod 1,000,000,007 where f(n) is the smallest superinteger
(shortest common supersequence as a number) of the digital root sequences
of the first n primes and first n composites.
The key algorithm: lexicographically smallest SCS via forward construction
using suffix LCS tables.
"""
import sys
MOD = 1_000_000_007
def digital_root(n):
if n == 0:
return 0
return 1 + (n - 1) % 9
def sieve_primes(limit):
is_prime = [True] * (limit + 1)
is_prime[0] = is_prime[1] = False
for i in range(2, int(limit**0.5) + 1):
if is_prime[i]:
for j in range(i*i, limit + 1, i):
is_prime[j] = False
return is_prime
def generate_sequences(n):
limit = max(15 * n, 200000)
is_prime = sieve_primes(limit)
primes = []
composites = []
for i in range(2, limit + 1):
if is_prime[i]:
primes.append(i)
else:
composites.append(i)
if len(primes) >= n and len(composites) >= n:
break
sp = [digital_root(primes[i]) for i in range(n)]
sc = [digital_root(composites[i]) for i in range(n)]
return sp, sc
def solve(n, verbose=False):
"""
Solve for f(n) mod MOD.
The SCS of two sequences A, B can be built forward using:
- suffix LCS table: lcs[i][j] = LCS length of A[i:] and B[j:]
- scs_rem(i,j) = (m-i) + (n-j) - lcs[i][j]
To build the lexicographically smallest SCS, at each step from state (i,j)
we must choose the next character. The next character MUST be either A[i]
or B[j] (or both if they're equal), because:
- The SCS must contain all of A and all of B as subsequences
- If the next character were neither A[i] nor B[j], we'd need to still
fit A[i:] and B[j:] as subsequences of the remaining SCS, plus we used
one extra character, making it longer than optimal.
So at state (i,j):
- If i == m: must take B[j], move to (i, j+1)
- If j == n: must take A[i], move to (i+1, j)
- If A[i] == B[j]: take that digit, move to (i+1, j+1)
- Else: we MUST take either A[i] or B[j] (whichever keeps SCS shortest).
If both keep it shortest, pick the smaller digit.
If same digit value, need deeper tie-breaking.
"""
sp, sc = generate_sequences(n)
A, B = sp, sc
m, nn = len(A), len(B)
if verbose:
print(f"Generated sequences of length {n}")
print(f"P_{n} starts with: {''.join(map(str, A[:20]))}")
print(f"C_{n} starts with: {''.join(map(str, B[:20]))}")
# Compute suffix LCS table
lcs = [[0] * (nn + 1) for _ in range(m + 1)]
for i in range(m - 1, -1, -1):
for j in range(nn - 1, -1, -1):
if A[i] == B[j]:
lcs[i][j] = lcs[i+1][j+1] + 1
else:
lcs[i][j] = max(lcs[i+1][j], lcs[i][j+1])
def scs_rem(i, j):
return (m - i) + (nn - j) - lcs[i][j]
total_len = scs_rem(0, 0)
lcs_len = lcs[0][0]
if verbose:
print(f"LCS length: {lcs_len}")
print(f"SCS length: {total_len}")
# Build SCS forward
val = 0
i, j = 0, 0
scs_digits = []
while i < m or j < nn:
if i == m:
d = B[j]
j += 1
elif j == nn:
d = A[i]
i += 1
elif A[i] == B[j]:
d = A[i]
i += 1
j += 1
else:
# Try taking A[i] vs B[j]
# Taking A[i]: new state (i+1, j), new scs_rem = scs_rem(i+1, j)
# Taking B[j]: new state (i, j+1), new scs_rem = scs_rem(i, j+1)
# Current scs_rem = scs_rem(i, j)
# Valid option needs: 1 + scs_rem(new) == scs_rem(i, j)
rem_a = scs_rem(i + 1, j)
rem_b = scs_rem(i, j + 1)
cur = scs_rem(i, j)
opt_a = (1 + rem_a == cur)
opt_b = (1 + rem_b == cur)
if opt_a and opt_b:
# Both valid, pick smaller digit
if A[i] < B[j]:
d = A[i]
i += 1
elif B[j] < A[i]:
d = B[j]
j += 1
else:
# Same digit but A[i] == B[j] -- handled above, shouldn't reach here
d = A[i]
i += 1
elif opt_a:
d = A[i]
i += 1
elif opt_b:
d = B[j]
j += 1
else:
# This shouldn't happen
print(f"ERROR: no valid option at ({i},{j})")
break
scs_digits.append(d)
val = (val * 10 + d) % MOD
if verbose:
print(f"f({n}) mod {MOD} = {val}")
return val, scs_digits, A, B, lcs_len
def create_visualization(sp, sc, lcs_len, n):
"""Return a compact preview of the generated sequences."""
preview = min(25, n)
return {
"n": n,
"lcs_length": lcs_len,
"prime_prefix": "".join(map(str, sp[:preview])),
"composite_prefix": "".join(map(str, sc[:preview])),
}
def parse_args(argv):
n = 10000
verbose = False
for arg in argv:
if arg == "--verbose":
verbose = True
continue
n = int(arg)
return n, verbose
def main(argv=None):
n, verbose = parse_args(sys.argv[1:] if argv is None else argv)
value, _, _, _, _ = solve(n, verbose=verbose)
print(value)
if __name__ == "__main__":
main()