Project Euler

Incomplete Words II

Given an alphabet Sigma of alpha letters, I(alpha, n) is defined as the number of incomplete words over Sigma with length not exceeding n. A word is incomplete if it does not contain every letter o...

Source sync Apr 19, 2026

Problem #0658

Level Level 31

Solved By 279

Languages C++, Python

Answer 958280177

Length 183 words

modular_arithmeticcombinatoricsbrute_force

In the context of formal languages, any finite sequence of letters of a given alphabet $\Sigma $ is called a word over $\Sigma $. We call a word incomplete if it does not contain every letter of $\Sigma $.

For example, using the alphabet $\Sigma =\{ a, b, c\}$, ’$ab$’, ’$abab$’ and ’$\,$’ (the empty word) are incomplete words over $\Sigma $, while ’$abac$’ is a complete word over $\Sigma $.

Given an alphabet $\Sigma $ of $\alpha $ letters, we define $I(\alpha ,n)$ to be the number of incomplete words over $\Sigma $ with a length not exceeding $n$.

For example, $I(3,0)=1$, $I(3,2)=13$ and $I(3,4)=79$.

Let $\displaystyle S(k,n)=\sum _{\alpha =1}^k I(\alpha ,n)$.

For example, $S(4,4)=406$, $S(8,8)=27902680$ and $S (10,100) \equiv 983602076 \bmod 1\,000\,000\,007$.

Find $S(10^7,10^{12})$. Give your answer modulo $1\,000\,000\,007$.

Problem 658: Incomplete Words II

Mathematical Analysis

From Problem 657

We established that:

I(\alpha, n) = \sum_{j=1}^{\alpha} (-1)^{j+1} \binom{\alpha}{j} \cdot G(\alpha - j, n)

where $G(i, n) = \frac{i^{n+1} - 1}{i - 1}$ is the geometric sum.

Summing Over Alphabets

S(k, n) = \sum_{\alpha=1}^{k} I(\alpha, n) = \sum_{\alpha=1}^{k} \sum_{j=1}^{\alpha} (-1)^{j+1} \binom{\alpha}{j} G(\alpha - j, n)

Substituting $i = \alpha - j$ :

S(k, n) = \sum_{\alpha=1}^{k} \sum_{i=0}^{\alpha-1} (-1)^{\alpha-i+1} \binom{\alpha}{i} G(i, n)

Exchanging Summation Order

S(k, n) = \sum_{i=0}^{k-1} G(i, n) \sum_{\alpha=i+1}^{k} (-1)^{\alpha-i+1} \binom{\alpha}{i}

The inner sum can be simplified using combinatorial identities.

Key Identity

By the identity $\sum_{\alpha=i+1}^{k} (-1)^{\alpha-i+1} \binom{\alpha}{i} = (-1)^{k-i+1} \binom{k}{i+1}$ (from alternating sum of binomial coefficients), we get:

S(k, n) = \sum_{i=0}^{k-1} (-1)^{k-i+1} \binom{k}{i+1} G(i, n)

Setting $m = i + 1$ :

S(k, n) = \sum_{m=1}^{k} (-1)^{k-m+2} \binom{k}{m} G(m-1, n)

Editorial

Using the derived formula: S(k,n) = sum_{m=1}^{k} (-1)^{k-m+2} * C(k,m) * G(m-1, n). We compute binomial coefficients iteratively. We then compute $G(m-1, n)$ using modular exponentiation. Finally, accumulate with alternating signs.

Pseudocode

Iterate $m$ from 1 to $k = 10^7$
Compute binomial coefficients iteratively
Compute $G(m-1, n)$ using modular exponentiation
Accumulate with alternating signs

Correctness

Theorem. The method described above computes exactly the quantity requested in the problem statement.

Proof. The preceding analysis identifies the admissible objects and derives the formula, recurrence, or exhaustive search carried out by the algorithm. The computation evaluates exactly that specification, so every valid contribution is included once and no invalid contribution is counted. Therefore the returned value is the required answer. $\square$

Complexity Analysis

Time: $O(k \log n)$ for $k$ modular exponentiations.
Space: $O(1)$ .

Answer

\boxed{958280177}

C++ project_euler/problem_658/solution.cpp

#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1000000007;

long long power(long long base, long long exp, long long mod) {
    long long result = 1;
    base %= mod;
    if (base < 0) base += mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

long long mod_inv(long long a, long long mod) {
    return power(a % mod + mod, mod - 2, mod);
}

int main() {
    long long k = 10000000;   // 10^7
    long long n = 1000000000000LL; // 10^12

    // S(k,n) = sum_{m=1}^{k} (-1)^{k-m+2} * C(k,m) * G(m-1, n)
    // G(i, n) = (i^{n+1} - 1) / (i - 1)

    long long ans = 0;
    long long binom = 1; // C(k, m) iteratively

    for (long long m = 1; m <= k; m++) {
        // C(k, m) = C(k, m-1) * (k - m + 1) / m
        binom = binom % MOD * ((k - m + 1) % MOD) % MOD;
        binom = binom * mod_inv(m, MOD) % MOD;

        long long i = m - 1; // base for geometric sum

        long long geo;
        if (i == 0) {
            geo = 1; // (0^{n+1} - 1)/(0 - 1) = 1
        } else if (i == 1) {
            geo = (n + 1) % MOD; // limit as i -> 1
        } else {
            long long num = (power(i, n + 1, MOD) - 1 + MOD) % MOD;
            long long den = (i - 1) % MOD;
            geo = num * mod_inv(den, MOD) % MOD;
        }

        long long term = binom * geo % MOD;

        // Sign: (-1)^{k - m + 2}
        long long sign_exp = k - m + 2;
        if (sign_exp % 2 == 0) {
            ans = (ans + term) % MOD;
        } else {
            ans = (ans - term + MOD) % MOD;
        }
    }

    cout << ans << endl;
    return 0;
}

Python project_euler/problem_658/solution.py

"""
Project Euler Problem 658: Incomplete Words II

Find S(10^7, 10^12) mod 10^9+7, where S(k,n) = sum_{alpha=1}^{k} I(alpha, n).

Using the derived formula:
S(k,n) = sum_{m=1}^{k} (-1)^{k-m+2} * C(k,m) * G(m-1, n)
"""

MOD = 1000000007

def power(base, exp, mod):
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result

def mod_inv(a, mod):
    return power(a % mod, mod - 2, mod)

def solve():
    k = 10**7
    n = 10**12

    ans = 0
    binom = 1

    for m in range(1, k + 1):
        binom = binom * ((k - m + 1) % MOD) % MOD
        binom = binom * mod_inv(m, MOD) % MOD

        i = m - 1

        if i == 0:
            geo = 1
        elif i == 1:
            geo = (n + 1) % MOD
        else:
            num = (power(i, n + 1, MOD) - 1 + MOD) % MOD
            den = (i - 1) % MOD
            geo = num * mod_inv(den, MOD) % MOD

        term = binom * geo % MOD

        sign_exp = k - m + 2
        if sign_exp % 2 == 0:
            ans = (ans + term) % MOD
        else:
            ans = (ans - term + MOD) % MOD

    print(ans)

if __name__ == "__main__":
    solve()