Project Euler

Building a Tower

Let f(n) denote the number of ways to fill a 3 x 3 x n tower using 2 x 1 x 1 bricks. Find f(10^10000) mod (10^8 + 7).

Source sync Apr 19, 2026

Problem #0324

Level Level 17

Solved By 845

Languages C++, Python

Answer 96972774

Length 368 words

modular_arithmeticlinear_algebraalgebra

Let $f(n)$ represent the number of ways one can fill a $3 \times 3 \times n$ tower with blocks of $2 \times 1 \times 1$.

You’re allowed to rotate the blocks in any way you like; however, rotations, reflections etc of the tower itself are counted as distinct.

For example (with $q = 100000007$):

$f(2) = 229$,

$f(4) = 117805$,

$f(10) \bmod q = 96149360$,

$f(10^3) \bmod q = 24806056$,

$f(10^6) \bmod q = 30808124$.

Find $f(10^{10000}) \bmod 100000007$.

Problem 324: Building a Tower

Mathematical Foundation

Theorem 1 (Transfer matrix). Let $\mathcal{S} = \{0,1\}^9$ be the set of all bitmasks of the $3 \times 3$ cross-section. There exists a $512 \times 512$ matrix $T$ over $\mathbb{Z}$ such that

f(n) = (T^n)_{0,0}

where state $0$ represents the empty profile (no protruding bricks).

Proof. We build the tower layer by layer. A profile $s \in \mathcal{S}$ records which of the 9 cells at the boundary between two consecutive layers are occupied by bricks protruding from below. Entry $T[s_1][s_2]$ counts the number of valid placements of $2 \times 1 \times 1$ bricks within a single layer such that all empty cells of profile $s_1$ are filled, and the cells protruding upward form profile $s_2$ . The brick placements within a layer correspond to horizontal dominoes (within-row or within-column) and vertical bricks crossing the boundary. Since the starting and ending profiles are both empty (no protrusions at the bottom or top), $f(n) = (T^n)_{0,0}$ . $\square$

Lemma 1 (Linear recurrence). The sequence $f(n)$ satisfies a linear recurrence

f(n) = c_1 f(n-1) + c_2 f(n-2) + \cdots + c_d f(n-d)

of degree $d$ equal to the degree of the minimal polynomial of $T$ restricted to the cyclic subspace generated by $e_0$ (the unit vector for state 0).

Proof. The Cayley—Hamilton theorem guarantees that $T$ satisfies its characteristic polynomial of degree at most 512. The sequence $(T^n)_{0,0} = e_0^T T^n e_0$ therefore satisfies the corresponding recurrence. The minimal polynomial of the sequence has degree at most $d \leq 512$ . $\square$

Theorem 2 (Polynomial exponentiation for large $n$ ). Let $m(x)$ be the minimal polynomial of the sequence $\{f(n)\}$ of degree $d$ , working modulo the prime $p = 10^8 + 7$ . Then

f(n) \equiv \sum_{i=0}^{d-1} a_i \, f(i) \pmod{p}

where $x^n \equiv \sum_{i=0}^{d-1} a_i x^i \pmod{m(x)}$ in $\mathbb{F}_p[x]$ , computed via repeated squaring.

Proof. Since $f(n) = e_0^T T^n e_0$ and $m(T)$ annihilates the relevant subspace, any polynomial identity $x^n \equiv r(x) \pmod{m(x)}$ translates directly to $T^n \equiv r(T)$ on that subspace, hence $f(n) = \sum a_i f(i)$ . The polynomial $x^n \bmod m(x)$ is computed via $O(\log n)$ squarings in $\mathbb{F}_p[x] / (m(x))$ . $\square$

Lemma 2 (Berlekamp—Massey). Given $f(0), f(1), \ldots, f(2d-1)$ , the Berlekamp—Massey algorithm recovers the minimal polynomial $m(x)$ of degree $d$ in $O(d^2)$ operations over $\mathbb{F}_p$ .

Proof. Standard result; see Massey (1969). $\square$

Editorial

3x3xn tower with 2x1x1 bricks. Find f(10^10000) mod (10^8 + 7). Uses transfer matrix + Berlekamp-Massey + polynomial exponentiation. We build transfer matrix T (512 x 512). We then compute f(0), …, f(2*512) by repeated matrix-vector multiplication. Finally, find minimal polynomial via Berlekamp-Massey.

Pseudocode

Build transfer matrix T (512 x 512)
Compute f(0), ..., f(2*512) by repeated matrix-vector multiplication
Find minimal polynomial via Berlekamp-Massey
Compute x^(10^10000) mod m(x) in F_p[x]
Represent n = 10^10000 in binary (~33220 bits)
Use repeated squaring of polynomials mod m(x)
Evaluate f(n) = sum of a_i * f(i)

Complexity Analysis

Time: $O(2^9 \cdot 2^9 \cdot E)$ for transfer matrix construction (where $E$ is the cost of enumerating tilings per state pair) + $O(d^2)$ for Berlekamp—Massey + $O(d^2 \log n)$ for polynomial exponentiation, where $d \leq 512$ and $\log_2(10^{10000}) \approx 33220$ . Dominant term: $O(d^2 \log n)$ .
Space: $O(d^2)$ for the transfer matrix or $O(d)$ for the polynomial representation.

Answer

\boxed{96972774}

C++ project_euler/problem_324/solution.cpp

#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll MOD = 100000007LL;

// Transfer matrix (sparse representation)
vector<vector<pair<int,int>>> T(512); // T[i] = list of (j, count)

void fill(int pos, int cur_mask, int next_mask, int start_mask, vector<vector<pair<int,int>>>& T_build) {
    while (pos < 9 && ((cur_mask >> pos) & 1)) pos++;
    if (pos == 9) {
        // Add transition start_mask -> next_mask
        for (auto& p : T_build[start_mask]) {
            if (p.first == next_mask) { p.second++; return; }
        }
        T_build[start_mask].push_back({next_mask, 1});
        return;
    }
    int r = pos / 3, c = pos % 3;
    if (c + 1 < 3 && !((cur_mask >> (pos+1)) & 1))
        fill(pos+1, cur_mask | (1<<pos) | (1<<(pos+1)), next_mask, start_mask, T_build);
    if (r + 1 < 3 && !((cur_mask >> (pos+3)) & 1))
        fill(pos+1, cur_mask | (1<<pos) | (1<<(pos+3)), next_mask, start_mask, T_build);
    fill(pos+1, cur_mask | (1<<pos), next_mask | (1<<pos), start_mask, T_build);
}

void build_transfer() {
    for (int s = 0; s < 512; s++) {
        fill(0, s, 0, s, T);
    }
}

// Berlekamp-Massey
pair<int, vector<ll>> berlekamp_massey(const vector<ll>& s) {
    int n = s.size();
    vector<ll> C = {1}, B = {1};
    int L = 0, m = 1;
    ll b = 1;

    auto modinv = [](ll a, ll mod) -> ll {
        ll g = mod, x = 0, y = 1;
        ll aa = a;
        while (aa != 0) {
            ll q = g / aa;
            g -= q * aa; swap(g, aa);
            x -= q * y; swap(x, y);
        }
        return (x % mod + mod) % mod;
    };

    for (int i = 0; i < n; i++) {
        ll d = s[i];
        for (int j = 1; j <= L; j++)
            d = (d + C[j] % MOD * (s[i-j] % MOD)) % MOD;
        d = (d % MOD + MOD) % MOD;

        if (d == 0) { m++; continue; }
        if (2*L <= i) {
            vector<ll> T_save = C;
            ll coef = d % MOD * modinv(b, MOD) % MOD;
            while ((int)C.size() < (int)B.size() + m) C.push_back(0);
            for (int j = 0; j < (int)B.size(); j++)
                C[j+m] = (C[j+m] - coef % MOD * (B[j] % MOD) % MOD + MOD) % MOD;
            L = i + 1 - L;
            B = T_save;
            b = d;
            m = 1;
        } else {
            ll coef = d % MOD * modinv(b, MOD) % MOD;
            while ((int)C.size() < (int)B.size() + m) C.push_back(0);
            for (int j = 0; j < (int)B.size(); j++)
                C[j+m] = (C[j+m] - coef % MOD * (B[j] % MOD) % MOD + MOD) % MOD;
            m++;
        }
    }
    return {L, C};
}

// Polynomial multiplication mod modpoly mod MOD
vector<ll> poly_mul(const vector<ll>& a, const vector<ll>& b, const vector<ll>& modpoly) {
    int d = modpoly.size() - 1;
    vector<ll> c(a.size() + b.size() - 1, 0);
    for (int i = 0; i < (int)a.size(); i++) {
        if (a[i] == 0) continue;
        for (int j = 0; j < (int)b.size(); j++)
            c[i+j] = (c[i+j] + a[i] * b[j]) % MOD;
    }

    auto modinv = [](ll a, ll mod) -> ll {
        ll g = mod, x = 0, y = 1;
        ll aa = a;
        while (aa != 0) {
            ll q = g / aa;
            g -= q * aa; swap(g, aa);
            x -= q * y; swap(x, y);
        }
        return (x % mod + mod) % mod;
    };

    ll lc_inv = modinv(modpoly[d], MOD);
    for (int i = (int)c.size() - 1; i >= d; i--) {
        if (c[i] == 0) continue;
        ll coef = c[i] % MOD * lc_inv % MOD;
        for (int j = 0; j <= d; j++)
            c[i-d+j] = (c[i-d+j] - coef * modpoly[j] % MOD + MOD) % MOD;
    }
    c.resize(d);
    return c;
}

int main() {
    build_transfer();

    // Compute sequence values
    vector<ll> v(512, 0);
    v[0] = 1;
    vector<ll> vals;
    vals.push_back(v[0]);

    for (int step = 0; step < 200; step++) {
        vector<ll> nv(512, 0);
        for (int i = 0; i < 512; i++) {
            if (v[i] == 0) continue;
            for (auto& [j, cnt] : T[i]) {
                nv[j] = (nv[j] + v[i] * cnt) % MOD;
            }
        }
        v = nv;
        vals.push_back(v[0]);
    }

    // Berlekamp-Massey
    auto [L, C] = berlekamp_massey(vals);

    // Build modpoly
    vector<ll> modpoly(L + 1);
    for (int j = 0; j <= L; j++)
        modpoly[j] = (C[L - j] % MOD + MOD) % MOD;

    // Compute 10^10000 in binary
    // We'll do polynomial exponentiation: start with x, square-and-multiply
    // But we need to represent 10^10000 in binary.
    // Strategy: compute by repeated squaring of the exponent itself? No.
    // Better: compute poly^10 repeatedly 10000 times (each time raise to 10th power).

    // poly_pow_10: raise polynomial to 10th power
    // poly^10 = ((poly^2)^2 * poly)^2  (10 = 1010 in binary)

    vector<ll> result(L, 0);
    result[0] = 1; // polynomial 1

    vector<ll> base(L, 0);
    base[1] = 1; // polynomial x

    // Compute base^(10^10000) = (...((base^10)^10)^10...)^10 (10000 times)
    // Start: cur = x
    // Repeat 10000 times: cur = cur^10

    auto poly_pow10 = [&](vector<ll> p) -> vector<ll> {
        // p^10 = p^8 * p^2
        auto p2 = poly_mul(p, p, modpoly);
        auto p4 = poly_mul(p2, p2, modpoly);
        auto p8 = poly_mul(p4, p4, modpoly);
        return poly_mul(p8, p2, modpoly);
    };

    vector<ll> cur = base; // x
    for (int i = 0; i < 10000; i++) {
        cur = poly_pow10(cur);
    }

    // Evaluate: f(n) = sum cur[i] * vals[i]
    ll answer = 0;
    for (int i = 0; i < L; i++) {
        answer = (answer + cur[i] % MOD * (vals[i] % MOD)) % MOD;
    }
    answer = (answer + MOD) % MOD;

    cout << answer << endl;
    return 0;
}

Python project_euler/problem_324/solution.py

"""
Problem 324: Building a Tower
3x3xn tower with 2x1x1 bricks.
Find f(10^10000) mod (10^8 + 7).

Uses transfer matrix + Berlekamp-Massey + polynomial exponentiation.
"""

MOD = 10**8 + 7

def build_transfer_matrix():
    """Build 512x512 transfer matrix for 3x3 layer profiles."""
    T = [[0]*512 for _ in range(512)]

    def fill(pos, cur_mask, next_mask, start_mask):
        while pos < 9 and (cur_mask >> pos) & 1:
            pos += 1
        if pos == 9:
            T[start_mask][next_mask] += 1
            return
        r, c = pos // 3, pos % 3
        # Horizontal brick (right)
        if c + 1 < 3 and not ((cur_mask >> (pos + 1)) & 1):
            fill(pos + 1, cur_mask | (1 << pos) | (1 << (pos + 1)), next_mask, start_mask)
        # Vertical brick (down, within layer)
        if r + 1 < 3 and not ((cur_mask >> (pos + 3)) & 1):
            fill(pos + 1, cur_mask | (1 << pos) | (1 << (pos + 3)), next_mask, start_mask)
        # Brick going up to next layer
        fill(pos + 1, cur_mask | (1 << pos), next_mask | (1 << pos), start_mask)

    for s in range(512):
        fill(0, s, 0, s)
    return T

def berlekamp_massey(s, mod):
    """Find shortest LFSR for sequence s mod mod."""
    n = len(s)
    C, B = [1], [1]
    L, m, b = 0, 1, 1

    for i in range(n):
        d = s[i]
        for j in range(1, L + 1):
            d = (d + C[j] * s[i - j]) % mod
        if d == 0:
            m += 1
        elif 2 * L <= i:
            T_save = C[:]
            coef = d * pow(b, -1, mod) % mod
            while len(C) < len(B) + m:
                C.append(0)
            for j in range(len(B)):
                C[j + m] = (C[j + m] - coef * B[j]) % mod
            L = i + 1 - L
            B = T_save
            b = d
            m = 1
        else:
            coef = d * pow(b, -1, mod) % mod
            while len(C) < len(B) + m:
                C.append(0)
            for j in range(len(B)):
                C[j + m] = (C[j + m] - coef * B[j]) % mod
            m += 1
    return L, C

def poly_mul(a, b, modpoly, mod):
    """Multiply polynomials a*b mod modpoly mod mod."""
    c = [0] * (len(a) + len(b) - 1)
    for i in range(len(a)):
        if a[i] == 0:
            continue
        for j in range(len(b)):
            c[i + j] = (c[i + j] + a[i] * b[j]) % mod

    d = len(modpoly) - 1
    lc_inv = pow(modpoly[d], -1, mod)
    for i in range(len(c) - 1, d - 1, -1):
        if c[i] == 0:
            continue
        coef = c[i] * lc_inv % mod
        for j in range(d + 1):
            c[i - d + j] = (c[i - d + j] - coef * modpoly[j]) % mod

    result = c[:d]
    while len(result) < d:
        result.append(0)
    return result

def poly_pow(base, exp, modpoly, mod):
    """Compute base^exp mod modpoly mod mod."""
    d = len(modpoly) - 1
    result = [0] * d
    result[0] = 1

    if exp == 0:
        return result

    bits = []
    n = exp
    while n > 0:
        bits.append(n & 1)
        n >>= 1
    bits.reverse()

    for bit in bits:
        result = poly_mul(result, result, modpoly, mod)
        if bit:
            result = poly_mul(result, base, modpoly, mod)
    return result

def solve():
    T = build_transfer_matrix()

    # Compute sequence values via matrix-vector multiplication
    v = [0] * 512
    v[0] = 1
    vals = [v[0]]

    for step in range(200):
        nv = [0] * 512
        for i in range(512):
            if v[i] == 0:
                continue
            for j in range(512):
                if T[i][j] == 0:
                    continue
                nv[j] = (nv[j] + v[i] * T[i][j]) % MOD
        v = nv
        vals.append(v[0])

    # Find minimal polynomial
    L, C = berlekamp_massey(vals, MOD)

    # Build modular polynomial (index = degree)
    modpoly = [C[L - j] % MOD for j in range(L + 1)]

    # Compute x^(10^10000) mod modpoly
    n = pow(10, 10000)
    base = [0] * L
    base[1] = 1  # polynomial x

    result_poly = poly_pow(base, n, modpoly, MOD)

    # Evaluate: f(n) = sum result_poly[i] * vals[i]
    answer = 0
    for i in range(L):
        answer = (answer + result_poly[i] * vals[i]) % MOD

    print(answer)

if __name__ == "__main__":
    solve()