Project Euler

The Locked Box

Consider n locked boxes, each requiring a specific key. You have m available keys, and each key opens a specific subset of boxes. Determine the minimum number of keys needed to open all n boxes, or...

Source sync Apr 19, 2026

Problem #0849

Level 30

Solved By 291

Languages C++, Python

Answer 936203459

Length 466 words

modular_arithmetic, dynamic_programming, probability

In a tournament there are $n$ teams and each team plays each other team twice. A team gets two points for a win, one point for a draw and no points for a loss.

With two teams there are three possible outcomes for the total points. $(4,0)$ where a team wins twice, $(3,1)$ where a team wins and draws, and $(2,2)$ where either there are two draws or a team wins one game and loses the other. Here we do not distinguish the teams and so $(3,1)$ and $(1,3)$ are considered identical.

Let $F(n)$ be the total number of possible final outcomes with $n$ teams, so that $F(2) = 3$.

You are also given $F(7) = 32923$.

Find $F(100)$. Give your answer modulo $10^9+7$.

Problem 849: The Locked Box

Mathematical Analysis

Coupon Collector’s Problem

Theorem. The expected number of trials to collect all $n$ distinct coupons when each trial yields a uniformly random coupon is:

E[T] = n \cdot H_n = n \sum_{k=1}^{n} \frac{1}{k} \tag{1}

where $H_n$ is the $n$-th harmonic number.

Proof. Divide the collection into phases. Phase $i$ starts when we have $i-1$ distinct coupons and ends when we get the $i$-th new one. The probability of getting a new coupon in phase $i$ is $(n - i + 1)/n$, so the length of phase $i$ is geometric with expected value $n/(n - i + 1)$. Summing over the phases:

E[T] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{k=1}^{n} \frac{1}{k} = n H_n. \quad \square

Variance

Theorem. The variance of $T$ is:

\text{Var}(T) = n^2 \sum_{k=1}^{n} \frac{1}{k^2} - n H_n \approx \frac{\pi^2 n^2}{6} \tag{2}
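The variance formula follows from the same phase decomposition used for the expectation. A short derivation, writing $X_i$ for the length of phase $i$ and $p_i$ for its success probability (both names are introduced here for illustration), and using the independence of the phases together with the geometric variance $(1 - p)/p^2$:

```latex
\begin{aligned}
\text{Var}(T) &= \sum_{i=1}^{n} \text{Var}(X_i)
              = \sum_{i=1}^{n} \frac{1 - p_i}{p_i^2},
              \qquad p_i = \frac{n - i + 1}{n} \\
              &= \sum_{k=1}^{n} \frac{1 - k/n}{(k/n)^2}
              = \sum_{k=1}^{n} \left( \frac{n^2}{k^2} - \frac{n}{k} \right)
              = n^2 \sum_{k=1}^{n} \frac{1}{k^2} - n H_n .
\end{aligned}
```

The second line substitutes $k = n - i + 1$. Since $\sum_{k \ge 1} 1/k^2 = \pi^2/6$ and $n H_n$ is of lower order, the leading term is $\pi^2 n^2 / 6$.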

Asymptotics

Theorem. As $n \to \infty$:

E[T] = n \ln n + \gamma n + \frac{1}{2} + O(1/n) \tag{3}

where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant.
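The quality of expansion (3) can be checked numerically. The sketch below compares the exact value $n H_n$ with the three-term approximation; the function names are illustrative, and the observed gap should shrink roughly like $1/(12n)$, the next term of the harmonic-number expansion.

```python
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_trials(n: int) -> float:
    """E[T] = n * H_n, evaluated in floating point."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def expected_trials_asymptotic(n: int) -> float:
    """Three-term approximation n ln n + gamma n + 1/2 from equation (3)."""
    return n * log(n) + EULER_GAMMA * n + 0.5


if __name__ == "__main__":
    for n in (10, 100, 1000):
        exact = expected_trials(n)
        approx = expected_trials_asymptotic(n)
        print(n, round(exact, 4), round(approx, 4), approx - exact)
```
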

Set Cover (NP-Hard General Case)

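Theorem. The minimum set cover problem is NP-hard. The greedy algorithm (repeatedly pick the set that covers the most still-uncovered elements) achieves an approximation ratio of $H_n = O(\ln n)$, which is essentially optimal: no polynomial-time algorithm achieves ratio $(1 - \varepsilon) \ln n$ for any fixed $\varepsilon > 0$ unless P = NP.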
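The greedy rule can be sketched in a few lines. This is a minimal illustration, assuming each key is encoded as a bitmask over boxes $0, \ldots, n-1$ (the same encoding the solution code uses); the function name `greedy_set_cover` is hypothetical, and the result is an approximation, not necessarily a minimum cover.

```python
def greedy_set_cover(n: int, masks: list[int]) -> list[int]:
    """Return indices of the keys chosen by the greedy rule.

    masks[j] has bit b set when key j opens box b (assumed encoding).
    Raises ValueError when the keys cannot open every box.
    """
    full = (1 << n) - 1
    covered = 0
    chosen = []
    while covered != full:
        remaining = full & ~covered
        best, best_gain = -1, 0
        for j, mask in enumerate(masks):
            # Number of still-locked boxes that key j would open.
            gain = bin(mask & remaining).count("1")
            if gain > best_gain:
                best, best_gain = j, gain
        if best < 0:
            raise ValueError("the keys cannot open every box")
        covered |= masks[best] & full
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    print(greedy_set_cover(3, [0b011, 0b110, 0b101]))
```

Ties are broken in favour of the lowest key index, so the output is deterministic for a given input order.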

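Exact DP for Set Cover

For small $n$, use bitmask DP over subsets of boxes: $\text{dp}[S]$ is the minimum total cost of a collection of keys that together open exactly the boxes in $S$. Start from $\text{dp}[\emptyset] = 0$ and $\text{dp}[S] = \infty$ otherwise, then relax:

\text{dp}[S \cup K_j] = \min(\text{dp}[S \cup K_j], \; \text{dp}[S] + c_j)

over all states $S$ and all keys $j$, where $K_j$ is the set of boxes that key $j$ opens and $c_j$ is its cost ($c_j = 1$ when counting keys). The answer is $\text{dp}[\{1, \ldots, n\}]$; it stays infinite when the keys cannot open every box.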

Concrete Examples

$n$	$E[T] = nH_n$	$\text{Var}(T)$	Note
1	1	0	
2	3	2	
5	11.417	25.174	
10	29.290	125.687	
52	235.978	4160.420	deck of cards
100	518.738	15831.101	

Verification for $n=2$: $E[T] = 2(1 + 1/2) = 3$. Indeed, the first draw always gives a new coupon. Each later draw yields the second coupon with probability 1/2, so 2 more draws are expected. Total = $1 + 2 = 3$. Correct.
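The mean and variance for any $n$ can be recomputed exactly from formulas (1) and (2) with rational arithmetic. A small sketch (the function names are illustrative):

```python
from fractions import Fraction


def coupon_mean(n: int) -> Fraction:
    """Exact E[T] = n * H_n."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


def coupon_variance(n: int) -> Fraction:
    """Exact Var(T) = n^2 * sum(1/k^2) - n * H_n."""
    return n * n * sum(Fraction(1, k * k) for k in range(1, n + 1)) - coupon_mean(n)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 52, 100):
        mean, var = coupon_mean(n), coupon_variance(n)
        print(f"{n:>4} {float(mean):10.3f} {float(var):12.3f}")
```

For $n = 2$ this gives a variance of exactly 2, matching the direct computation: the second phase is geometric with $p = 1/2$, so its variance is $(1 - p)/p^2 = 2$.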

Complexity Analysis

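Coupon collector formula: $O(n)$ arithmetic operations for the harmonic sum; modulo a prime $p$, each term needs a modular inverse, giving $O(n \log p)$ with Fermat exponentiation.
Set cover greedy: $O(mn)$ where $m$ = number of keys, assuming bitmask sets with constant-time mask operations (at most $n$ rounds, each scanning all $m$ keys).
Exact DP: $O(2^n \cdot m)$ time, $O(2^n)$ space.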

Markov Chain Formulation

The coupon collector process is a Markov chain on states $\{0, 1, \ldots, n\}$ (number of distinct coupons collected). Transition probabilities: $P(i \to i+1) = (n-i)/n$ and $P(i \to i) = i/n$.

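Theorem (Hitting Time Distribution). The probability that exactly $t$ trials are needed is:

P(T = t) = \frac{n!}{n^t} S(t-1, n-1)

where $S(t, n)$ is the Stirling number of the second kind, so that $n! \, S(t, n)$ counts the surjections from $[t]$ onto $[n]$: the first $t-1$ draws must cover exactly $n-1$ coupons, and draw $t$ must produce the missing one. The cumulative distribution is $P(T \le t) = n! \, S(t, n) / n^t$ or, equivalently, by inclusion-exclusion over the set of missing coupons:

P(T \le t) = \sum_{j=0}^{n} (-1)^j \binom{n}{j} \left(\frac{n-j}{n}\right)^t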
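The inclusion-exclusion formula for $P(T \le t)$ can be cross-checked against the Markov chain by evolving the state distribution for $t$ steps and reading off the mass in state $n$. The sketch below does both with exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def cdf_inclusion_exclusion(n: int, t: int) -> Fraction:
    """P(T <= t) via inclusion-exclusion over the set of missing coupons."""
    return sum(
        (-1) ** j * comb(n, j) * Fraction(n - j, n) ** t for j in range(n + 1)
    )


def cdf_markov(n: int, t: int) -> Fraction:
    """P(T <= t) by stepping the chain: dist[i] = P(i distinct coupons held)."""
    dist = [Fraction(0)] * (n + 1)
    dist[0] = Fraction(1)
    for _ in range(t):
        nxt = [Fraction(0)] * (n + 1)
        for i, p in enumerate(dist):
            if p == 0:
                continue
            nxt[i] += p * Fraction(i, n)              # draw a coupon already held
            if i < n:
                nxt[i + 1] += p * Fraction(n - i, n)  # draw a new coupon
        dist = nxt
    return dist[n]


if __name__ == "__main__":
    n = 4
    for t in range(0, 10):
        assert cdf_inclusion_exclusion(n, t) == cdf_markov(n, t)
    # P(T = t) is the difference of consecutive CDF values.
    print(cdf_markov(2, 2) - cdf_markov(2, 1))
```
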

Birthday Problem Connection

The coupon collector is the “dual” of the birthday problem. Birthday: how many draws until a collision? Coupon: how many draws until full coverage? Both involve random sampling with replacement.

Theorem (Birthday). The expected number of draws for the first collision among $n$ types is approximately $\sqrt{\pi n / 2}$.
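The approximation can be compared with the exact expectation, which is the sum over $k \ge 0$ of the probability that the first $k$ draws are all distinct. A minimal sketch (the function name is illustrative); for $n = 365$ the exact value exceeds $\sqrt{\pi n / 2}$ by roughly $2/3$, the next term in the expansion.

```python
from math import pi, sqrt


def expected_first_collision(n: int) -> float:
    """E[number of draws until the first repeated type].

    Uses E[T] = sum over k >= 0 of P(first k draws are all distinct).
    """
    total = 0.0
    prob_distinct = 1.0  # P(first k draws distinct), starting at k = 0
    for k in range(n + 1):
        total += prob_distinct
        prob_distinct *= (n - k) / n  # extend to k + 1 distinct draws
    return total


if __name__ == "__main__":
    n = 365
    print(expected_first_collision(n), sqrt(pi * n / 2))
```
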

Double Dixie Cup Problem

Generalization. The double Dixie cup problem asks: how many draws are needed to get each coupon at least $c$ times? For fixed $c$, as $n \to \infty$ (Newman and Shepp):

E[T_c] = n \ln n + (c-1) n \ln\ln n + O(n)

For $c = 2$: $E[T_2] \approx n \ln n + n \ln\ln n + \cdots$
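For small $n$ the asymptotic formula is only a rough guide, so a simulation is a convenient sanity check. The sketch below estimates $E[T_c]$ by Monte Carlo with a fixed seed; the function name, the trial count, and the seed are illustrative choices.

```python
import random


def simulate_draws(n: int, c: int, trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[T_c]: draws until every coupon is seen c times."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = [0] * n
        done = 0   # number of coupons already seen at least c times
        draws = 0
        while done < n:
            i = rng.randrange(n)
            counts[i] += 1
            if counts[i] == c:
                done += 1
            draws += 1
        total += draws
    return total / trials


if __name__ == "__main__":
    for c in (1, 2, 3):
        print(c, simulate_draws(10, c))
```

With $c = 1$ the estimate should land close to $n H_n$ (about 29.29 for $n = 10$), and each additional required copy adds noticeably fewer than $n H_n$ further draws, in line with the $n \ln\ln n$ growth of the correction term.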

Tail Bounds

Theorem. $P(T > n \ln n + cn) \le e^{-c}$ for $c > 0$. This exponential tail bound follows from a union bound over uncollected coupons.
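The union bound can be written out explicitly. In the sketch below, $A_i$ denotes the event that coupon $i$ has not appeared in the first $t$ draws (notation introduced here for illustration), and $t$ is any integer with $t \ge n \ln n + cn$:

```latex
\begin{aligned}
P(T > t) &= P\left( \bigcup_{i=1}^{n} A_i \right)
          \le \sum_{i=1}^{n} P(A_i)
          = n \left( 1 - \frac{1}{n} \right)^{t} \\
         &\le n \, e^{-t/n}
          \le n \, e^{-\ln n - c}
          = e^{-c} .
\end{aligned}
```

The second line uses $1 - x \le e^{-x}$ with $x = 1/n$, followed by $t/n \ge \ln n + c$.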

Answer

\boxed{936203459}

C++ project_euler/problem_849/solution.cpp

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const ll MOD = 1e9 + 7;

ll power(ll base, ll exp, ll mod) {
    ll result = 1; base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod; exp >>= 1;
    }
    return result;
}

ll modinv(ll a, ll mod = MOD) { return power(a, mod - 2, mod); }

// Coupon collector: E[T] = n * H_n mod p
ll coupon_collector_mod(int n) {
    ll hn = 0;
    for (int k = 1; k <= n; k++)
        hn = (hn + modinv(k)) % MOD;
    return (ll)n % MOD * hn % MOD;
}

// Set cover via bitmask DP
int min_set_cover(int n, const vector<int>& masks) {
    int full = (1 << n) - 1;
    vector<int> dp(full + 1, n + 1);
    dp[0] = 0;
    for (int s = 0; s <= full; s++) {
        if (dp[s] > n) continue;
        for (int mask : masks) {
            int ns = s | mask;
            dp[ns] = min(dp[ns], dp[s] + 1);
        }
    }
    return dp[full];
}

int main() {
    // Verify E[T] for n=2 is 3
    // H_2 = 1 + 1/2 = 3/2, so 2 * 3/2 = 3
    ll h2 = (1 + modinv(2)) % MOD;
    assert(2 * h2 % MOD == 3);

    // Set cover: 3 elements, 3 sets
    vector<int> masks = {0b011, 0b110, 0b101};
    assert(min_set_cover(3, masks) == 2);

    cout << coupon_collector_mod(1000) << endl;
    return 0;
}

Python project_euler/problem_849/solution.py

"""
Problem 849: The Locked Box

Coupon collector problem: E[T] = n * H_n.
Set cover problem: bitmask DP for exact minimum.
"""

from fractions import Fraction

# --- Method 1: Exact expected value via harmonic numbers ---
def coupon_collector_exact(n: int) -> Fraction:
    """Exact expected number of trials: n * H_n."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))

def coupon_collector_float(n: int) -> float:
    """Float approximation."""
    return n * sum(1.0 / k for k in range(1, n + 1))

# --- Method 2: Monte Carlo simulation ---
def coupon_collector_mc(n: int, trials: int = 100000) -> float:
    """Monte Carlo estimate of E[T]."""
    import random
    total = 0
    for _ in range(trials):
        collected = set()
        steps = 0
        while len(collected) < n:
            collected.add(random.randint(0, n - 1))
            steps += 1
        total += steps
    return total / trials

# --- Method 3: Set cover via bitmask DP ---
def min_set_cover(n: int, sets: list, costs: list = None):
    """Minimum cost to cover all n elements using given sets.
    sets[i] is a frozenset of elements covered by key i.
    costs[i] is the cost of key i (default 1).
    """
    if costs is None:
        costs = [1] * len(sets)
    full = (1 << n) - 1
    INF = float('inf')
    dp = [INF] * (full + 1)
    dp[0] = 0

    # Convert sets to bitmasks
    masks = []
    for s in sets:
        mask = 0
        for elem in s:
            mask |= (1 << elem)
        masks.append(mask)

    for state in range(full + 1):
        if dp[state] == INF:
            continue
        for i, mask in enumerate(masks):
            new_state = state | mask
            if dp[new_state] > dp[state] + costs[i]:
                dp[new_state] = dp[state] + costs[i]
    return dp[full]

# --- Method 4: Variance computation ---
def coupon_collector_variance(n: int) -> float:
    return n**2 * sum(1.0/k**2 for k in range(1, n + 1)) - n * sum(1.0/k for k in range(1, n + 1))

# --- Verification ---
assert coupon_collector_exact(1) == 1
assert coupon_collector_exact(2) == 3
assert abs(coupon_collector_float(10) - 29.2897) < 0.01

# Set cover verification
sets = [{0, 1}, {1, 2}, {0, 2}]
assert min_set_cover(3, sets) == 2  # need at least 2 sets

# MC should be close to exact
mc = coupon_collector_mc(10, 50000)
assert abs(mc - 29.29) < 1.0, f"MC estimate {mc} too far from 29.29"

print("Verification passed!")

MOD = 10**9 + 7
# Compute nH_n mod p using modular inverse
def harmonic_mod(n, mod):
    total = 0
    for k in range(1, n + 1):
        total = (total + pow(k, mod - 2, mod)) % mod
    return total

answer = 1000 * harmonic_mod(1000, MOD) % MOD
print(f"Answer: {answer}")