All Euler problems
Project Euler

Building a Tower

Let f(n) denote the number of ways to fill a 3 x 3 x n tower using 2 x 1 x 1 bricks. Find f(10^10000) mod (10^8 + 7).

Source sync Apr 19, 2026
Problem #0324
Level Level 17
Solved By 845
Languages C++, Python
Answer 96972774
Length 368 words
modular_arithmeticlinear_algebraalgebra

Problem Statement

This archive keeps the full statement, math, and original media on the page.

Let \(f(n)\) represent the number of ways one can fill a \(3 \times 3 \times n\) tower with blocks of \(2 \times 1 \times 1\).

You’re allowed to rotate the blocks in any way you like; however, rotations, reflections etc of the tower itself are counted as distinct.

For example (with \(q = 100000007\)):

\(f(2) = 229\),

\(f(4) = 117805\),

\(f(10) \bmod q = 96149360\),

\(f(10^3) \bmod q = 24806056\),

\(f(10^6) \bmod q = 30808124\).

Find \(f(10^{10000}) \bmod 100000007\).

Problem 324: Building a Tower

Mathematical Foundation

Theorem 1 (Transfer matrix). Let S={0,1}9\mathcal{S} = \{0,1\}^9 be the set of all bitmasks of the 3×33 \times 3 cross-section. There exists a 512×512512 \times 512 matrix TT over Z\mathbb{Z} such that

f(n)=(Tn)0,0f(n) = (T^n)_{0,0}

where state 00 represents the empty profile (no protruding bricks).

Proof. We build the tower layer by layer. A profile sSs \in \mathcal{S} records which of the 9 cells at the boundary between two consecutive layers are occupied by bricks protruding from below. Entry T[s1][s2]T[s_1][s_2] counts the number of valid placements of 2×1×12 \times 1 \times 1 bricks within a single layer such that all empty cells of profile s1s_1 are filled, and the cells protruding upward form profile s2s_2. The brick placements within a layer correspond to horizontal dominoes (within-row or within-column) and vertical bricks crossing the boundary. Since the starting and ending profiles are both empty (no protrusions at the bottom or top), f(n)=(Tn)0,0f(n) = (T^n)_{0,0}. \square

Lemma 1 (Linear recurrence). The sequence f(n)f(n) satisfies a linear recurrence

f(n)=c1f(n1)+c2f(n2)++cdf(nd)f(n) = c_1 f(n-1) + c_2 f(n-2) + \cdots + c_d f(n-d)

of degree dd equal to the degree of the minimal polynomial of TT restricted to the cyclic subspace generated by e0e_0 (the unit vector for state 0).

Proof. The Cayley—Hamilton theorem guarantees that TT satisfies its characteristic polynomial of degree at most 512. The sequence (Tn)0,0=e0TTne0(T^n)_{0,0} = e_0^T T^n e_0 therefore satisfies the corresponding recurrence. The minimal polynomial of the sequence has degree at most d512d \leq 512. \square

Theorem 2 (Polynomial exponentiation for large nn). Let m(x)m(x) be the minimal polynomial of the sequence {f(n)}\{f(n)\} of degree dd, working modulo the prime p=108+7p = 10^8 + 7. Then

f(n)i=0d1aif(i)(modp)f(n) \equiv \sum_{i=0}^{d-1} a_i \, f(i) \pmod{p}

where xni=0d1aixi(modm(x))x^n \equiv \sum_{i=0}^{d-1} a_i x^i \pmod{m(x)} in Fp[x]\mathbb{F}_p[x], computed via repeated squaring.

Proof. Since f(n)=e0TTne0f(n) = e_0^T T^n e_0 and m(T)m(T) annihilates the relevant subspace, any polynomial identity xnr(x)(modm(x))x^n \equiv r(x) \pmod{m(x)} translates directly to Tnr(T)T^n \equiv r(T) on that subspace, hence f(n)=aif(i)f(n) = \sum a_i f(i). The polynomial xnmodm(x)x^n \bmod m(x) is computed via O(logn)O(\log n) squarings in Fp[x]/(m(x))\mathbb{F}_p[x] / (m(x)). \square

Lemma 2 (Berlekamp—Massey). Given f(0),f(1),,f(2d1)f(0), f(1), \ldots, f(2d-1), the Berlekamp—Massey algorithm recovers the minimal polynomial m(x)m(x) of degree dd in O(d2)O(d^2) operations over Fp\mathbb{F}_p.

Proof. Standard result; see Massey (1969). \square

Editorial

3x3xn tower with 2x1x1 bricks. Find f(10^10000) mod (10^8 + 7). Uses transfer matrix + Berlekamp-Massey + polynomial exponentiation. We build transfer matrix T (512 x 512). We then compute f(0), …, f(2*512) by repeated matrix-vector multiplication. Finally, find minimal polynomial via Berlekamp-Massey.

Pseudocode

Build transfer matrix T (512 x 512)
Compute f(0), ..., f(2*512) by repeated matrix-vector multiplication
Find minimal polynomial via Berlekamp-Massey
Compute x^(10^10000) mod m(x) in F_p[x]
Represent n = 10^10000 in binary (~33220 bits)
Use repeated squaring of polynomials mod m(x)
Evaluate f(n) = sum of a_i * f(i)

Complexity Analysis

  • Time: O(2929E)O(2^9 \cdot 2^9 \cdot E) for transfer matrix construction (where EE is the cost of enumerating tilings per state pair) + O(d2)O(d^2) for Berlekamp—Massey + O(d2logn)O(d^2 \log n) for polynomial exponentiation, where d512d \leq 512 and log2(1010000)33220\log_2(10^{10000}) \approx 33220. Dominant term: O(d2logn)O(d^2 \log n).
  • Space: O(d2)O(d^2) for the transfer matrix or O(d)O(d) for the polynomial representation.

Answer

96972774\boxed{96972774}

Code

Each problem page includes the exact C++ and Python source files from the local archive.

C++ project_euler/problem_324/solution.cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll MOD = 100000007LL;

// Transfer matrix (sparse representation)
vector<vector<pair<int,int>>> T(512); // T[i] = list of (j, count)

void fill(int pos, int cur_mask, int next_mask, int start_mask, vector<vector<pair<int,int>>>& T_build) {
    while (pos < 9 && ((cur_mask >> pos) & 1)) pos++;
    if (pos == 9) {
        // Add transition start_mask -> next_mask
        for (auto& p : T_build[start_mask]) {
            if (p.first == next_mask) { p.second++; return; }
        }
        T_build[start_mask].push_back({next_mask, 1});
        return;
    }
    int r = pos / 3, c = pos % 3;
    if (c + 1 < 3 && !((cur_mask >> (pos+1)) & 1))
        fill(pos+1, cur_mask | (1<<pos) | (1<<(pos+1)), next_mask, start_mask, T_build);
    if (r + 1 < 3 && !((cur_mask >> (pos+3)) & 1))
        fill(pos+1, cur_mask | (1<<pos) | (1<<(pos+3)), next_mask, start_mask, T_build);
    fill(pos+1, cur_mask | (1<<pos), next_mask | (1<<pos), start_mask, T_build);
}

void build_transfer() {
    for (int s = 0; s < 512; s++) {
        fill(0, s, 0, s, T);
    }
}

// Berlekamp-Massey
pair<int, vector<ll>> berlekamp_massey(const vector<ll>& s) {
    int n = s.size();
    vector<ll> C = {1}, B = {1};
    int L = 0, m = 1;
    ll b = 1;

    auto modinv = [](ll a, ll mod) -> ll {
        ll g = mod, x = 0, y = 1;
        ll aa = a;
        while (aa != 0) {
            ll q = g / aa;
            g -= q * aa; swap(g, aa);
            x -= q * y; swap(x, y);
        }
        return (x % mod + mod) % mod;
    };

    for (int i = 0; i < n; i++) {
        ll d = s[i];
        for (int j = 1; j <= L; j++)
            d = (d + C[j] % MOD * (s[i-j] % MOD)) % MOD;
        d = (d % MOD + MOD) % MOD;

        if (d == 0) { m++; continue; }
        if (2*L <= i) {
            vector<ll> T_save = C;
            ll coef = d % MOD * modinv(b, MOD) % MOD;
            while ((int)C.size() < (int)B.size() + m) C.push_back(0);
            for (int j = 0; j < (int)B.size(); j++)
                C[j+m] = (C[j+m] - coef % MOD * (B[j] % MOD) % MOD + MOD) % MOD;
            L = i + 1 - L;
            B = T_save;
            b = d;
            m = 1;
        } else {
            ll coef = d % MOD * modinv(b, MOD) % MOD;
            while ((int)C.size() < (int)B.size() + m) C.push_back(0);
            for (int j = 0; j < (int)B.size(); j++)
                C[j+m] = (C[j+m] - coef % MOD * (B[j] % MOD) % MOD + MOD) % MOD;
            m++;
        }
    }
    return {L, C};
}

// Polynomial multiplication mod modpoly mod MOD
vector<ll> poly_mul(const vector<ll>& a, const vector<ll>& b, const vector<ll>& modpoly) {
    int d = modpoly.size() - 1;
    vector<ll> c(a.size() + b.size() - 1, 0);
    for (int i = 0; i < (int)a.size(); i++) {
        if (a[i] == 0) continue;
        for (int j = 0; j < (int)b.size(); j++)
            c[i+j] = (c[i+j] + a[i] * b[j]) % MOD;
    }

    auto modinv = [](ll a, ll mod) -> ll {
        ll g = mod, x = 0, y = 1;
        ll aa = a;
        while (aa != 0) {
            ll q = g / aa;
            g -= q * aa; swap(g, aa);
            x -= q * y; swap(x, y);
        }
        return (x % mod + mod) % mod;
    };

    ll lc_inv = modinv(modpoly[d], MOD);
    for (int i = (int)c.size() - 1; i >= d; i--) {
        if (c[i] == 0) continue;
        ll coef = c[i] % MOD * lc_inv % MOD;
        for (int j = 0; j <= d; j++)
            c[i-d+j] = (c[i-d+j] - coef * modpoly[j] % MOD + MOD) % MOD;
    }
    c.resize(d);
    return c;
}

int main() {
    build_transfer();

    // Compute sequence values
    vector<ll> v(512, 0);
    v[0] = 1;
    vector<ll> vals;
    vals.push_back(v[0]);

    for (int step = 0; step < 200; step++) {
        vector<ll> nv(512, 0);
        for (int i = 0; i < 512; i++) {
            if (v[i] == 0) continue;
            for (auto& [j, cnt] : T[i]) {
                nv[j] = (nv[j] + v[i] * cnt) % MOD;
            }
        }
        v = nv;
        vals.push_back(v[0]);
    }

    // Berlekamp-Massey
    auto [L, C] = berlekamp_massey(vals);

    // Build modpoly
    vector<ll> modpoly(L + 1);
    for (int j = 0; j <= L; j++)
        modpoly[j] = (C[L - j] % MOD + MOD) % MOD;

    // Compute 10^10000 in binary
    // We'll do polynomial exponentiation: start with x, square-and-multiply
    // But we need to represent 10^10000 in binary.
    // Strategy: compute by repeated squaring of the exponent itself? No.
    // Better: compute poly^10 repeatedly 10000 times (each time raise to 10th power).

    // poly_pow_10: raise polynomial to 10th power
    // poly^10 = ((poly^2)^2 * poly)^2  (10 = 1010 in binary)

    vector<ll> result(L, 0);
    result[0] = 1; // polynomial 1

    vector<ll> base(L, 0);
    base[1] = 1; // polynomial x

    // Compute base^(10^10000) = (...((base^10)^10)^10...)^10 (10000 times)
    // Start: cur = x
    // Repeat 10000 times: cur = cur^10

    auto poly_pow10 = [&](vector<ll> p) -> vector<ll> {
        // p^10 = p^8 * p^2
        auto p2 = poly_mul(p, p, modpoly);
        auto p4 = poly_mul(p2, p2, modpoly);
        auto p8 = poly_mul(p4, p4, modpoly);
        return poly_mul(p8, p2, modpoly);
    };

    vector<ll> cur = base; // x
    for (int i = 0; i < 10000; i++) {
        cur = poly_pow10(cur);
    }

    // Evaluate: f(n) = sum cur[i] * vals[i]
    ll answer = 0;
    for (int i = 0; i < L; i++) {
        answer = (answer + cur[i] % MOD * (vals[i] % MOD)) % MOD;
    }
    answer = (answer + MOD) % MOD;

    cout << answer << endl;
    return 0;
}