Incomplete Words
In the theory of formal languages, a word is any finite sequence of letters from a given alphabet, and a word is called incomplete if it does not contain every letter of that alphabet. For example,...
Problem Statement
This archive keeps the full statement, math, and original media on the page.
In the context of formal languages, any finite sequence of letters of a given alphabet $\Sigma$ is called a word over $\Sigma$. We call a word incomplete if it does not contain every letter of $\Sigma$.
For example, using the alphabet $\Sigma=\{ a, b, c\}$, '$ab$', '$abab$' and '$\,$' (the empty word) are incomplete words over $\Sigma$, while '$abac$' is a complete word over $\Sigma$.
Given an alphabet $\Sigma$ of $\alpha$ letters, we define $I(\alpha,n)$ to be the number of incomplete words over $\Sigma$ with a length not exceeding $n$.
For example, $I(3,0)=1$, $I(3,2)=13$ and $I(3,4)=79$.
Find $I(10^7,10^{12})$. Give your answer modulo $1\,000\,000\,007$.
Problem 657: Incomplete Words
Mathematical Analysis
Counting Complete vs. Incomplete Words
The total number of words of length exactly over an alphabet of letters is .
The total number of words of length is:
Inclusion-Exclusion for Complete Words
A word of length is complete if it contains all letters. By inclusion-exclusion:
This is related to Stirling numbers of the second kind:
Number of Incomplete Words
Swapping the sums:
The term in the second sum gives , which cancels the first term, leaving:
For , the term , and we treat .
Simplification
Setting :
For : is treated as .
For : the term contributes .
Editorial
I(alpha, n) = sum_{j=1}^{alpha} (-1)^{j+1} * C(alpha,j) * geo(alpha-j, n) where geo(i, n) = (i^{n+1} - 1) / (i - 1). We compute the sum using modular arithmetic with . We then use Fermat’s little theorem for modular inverses. Finally, note: most terms with vanish since .
Pseudocode
Compute the sum using modular arithmetic with $p = 10^9 + 7$
Use Fermat's little theorem for modular inverses
Use modular exponentiation for $i^{n+1} \mod p$
Compute binomial coefficients $\binom{\alpha}{j}$ iteratively
Note: most terms with $i \geq p$ vanish since $i^{n+1} \equiv i^{(n+1) \mod (p-1)} \pmod{p}$
Correctness
Theorem. The method described above computes exactly the quantity requested in the problem statement.
Proof. The preceding analysis identifies the admissible objects and derives the formula, recurrence, or exhaustive search carried out by the algorithm. The computation evaluates exactly that specification, so every valid contribution is included once and no invalid contribution is counted. Therefore the returned value is the required answer.
Complexity Analysis
- Time: for modular exponentiations.
- Space: .
Answer
Code
Each problem page includes the exact C++ and Python source files from the local archive.
#include <bits/stdc++.h>
using namespace std;
const long long MOD = 1000000007;
long long power(long long base, long long exp, long long mod) {
long long result = 1;
base %= mod;
while (exp > 0) {
if (exp & 1) result = result * base % mod;
base = base * base % mod;
exp >>= 1;
}
return result;
}
long long mod_inv(long long a, long long mod) {
return power(a, mod - 2, mod);
}
int main() {
long long alpha = 10000000; // 10^7
long long n = 1000000000000LL; // 10^12
// I(alpha, n) = sum_{j=1}^{alpha} (-1)^{j+1} * C(alpha, j) * (geometric sum)
// where geometric sum = ((alpha-j)^{n+1} - 1) / ((alpha-j) - 1)
long long ans = 0;
long long binom = 1; // C(alpha, j), computed iteratively
for (long long j = 1; j <= alpha; j++) {
// Update binomial coefficient: C(alpha, j) = C(alpha, j-1) * (alpha - j + 1) / j
binom = binom % MOD * ((alpha - j + 1) % MOD) % MOD;
binom = binom % MOD * mod_inv(j % MOD, MOD) % MOD;
long long i = alpha - j; // the base of the geometric series
long long geo;
if (i == 0) {
// (0^{n+1} - 1) / (0 - 1) = 1
geo = 1;
} else if (i == 1) {
// (1^{n+1} - 1) / (1 - 1) -> n + 1
geo = (n + 1) % MOD;
} else {
// (i^{n+1} - 1) / (i - 1)
long long num = (power(i % MOD, n + 1, MOD) - 1 + MOD) % MOD;
long long den = (i % MOD - 1 + MOD) % MOD;
geo = num % MOD * mod_inv(den, MOD) % MOD;
}
long long term = binom % MOD * geo % MOD;
if (j % 2 == 1) {
// (-1)^{j+1} = +1 for odd j
ans = (ans + term) % MOD;
} else {
// (-1)^{j+1} = -1 for even j
ans = (ans - term + MOD) % MOD;
}
}
cout << ans << endl;
return 0;
}
"""
Project Euler Problem 657: Incomplete Words
Find I(10^7, 10^12) mod 10^9+7.
I(alpha, n) = sum_{j=1}^{alpha} (-1)^{j+1} * C(alpha,j) * geo(alpha-j, n)
where geo(i, n) = (i^{n+1} - 1) / (i - 1)
"""
MOD = 1000000007
def power(base, exp, mod):
result = 1
base %= mod
while exp > 0:
if exp & 1:
result = result * base % mod
base = base * base % mod
exp >>= 1
return result
def mod_inv(a, mod):
return power(a, mod - 2, mod)
def solve():
alpha = 10**7
n = 10**12
ans = 0
binom = 1 # C(alpha, j) mod MOD, built iteratively
for j in range(1, alpha + 1):
# C(alpha, j) = C(alpha, j-1) * (alpha - j + 1) / j
binom = binom * ((alpha - j + 1) % MOD) % MOD
binom = binom * mod_inv(j, MOD) % MOD
i = alpha - j # base of geometric series
if i == 0:
geo = 1
elif i == 1:
geo = (n + 1) % MOD
else:
num = (power(i, n + 1, MOD) - 1 + MOD) % MOD
den = (i % MOD - 1 + MOD) % MOD
geo = num * mod_inv(den, MOD) % MOD
term = binom * geo % MOD
if j % 2 == 1:
ans = (ans + term) % MOD
else:
ans = (ans - term + MOD) % MOD
print(ans)
if __name__ == "__main__":
solve()