All Euler problems
Project Euler

Skew-cost Coding

Build a prefix-free binary code for N = 10^9 symbols. Each codeword is a binary string. A 0 bit costs 1 and a 1 bit costs 4. The cost of a codeword is the sum of costs of its bits. The total cost i...

Source sync Apr 19, 2026
Problem #0219
Level Level 11
Solved By 1,724
Languages C++, Python
Answer 64564225042
Length 587 words
optimizationgreedybrute_force

Problem Statement

This archive keeps the full statement, math, and original media on the page.

Let $A$ and $B$ be bit strings (sequences of 0's and 1's).

If $A$ is equal to the leftmost length($A$) bits of $B$, then $A$ is said to be a prefix of $B$.

For example, 00110 is a prefix of 001101001, but not of 00111 or 100110.

A prefix-free code of size n is a collection of n distinct bit strings such that no string is a prefix of any other. For example, this is a prefix-free code of size 6: $$0000, 0001, 001, 01, 10, 11$$ Now suppose that it costs one penny to transmit a '0' bit, but four pence to transmit a '1'.

Then the total cost of the prefix-free code shown above is 35 pence, which happens to be the cheapest possible for the skewed pricing scheme in question.

In short, we write $\operatorname{Cost}(6) = 35$.

What is $\operatorname{Cost}(109)$ ?

Problem 219: Skew-cost Coding

Mathematical Foundation

Definition. A prefix-free code for NN symbols is a set of NN binary strings such that no string is a prefix of another. Such a code corresponds to a binary tree with NN leaves.

Theorem 1 (Greedy optimality). The minimum total cost is achieved by the greedy algorithm: starting from a single leaf (cost 0), repeatedly split the minimum-cost leaf. Splitting a leaf of cost cc removes it and creates two leaves of costs c+1c + 1 (append 0) and c+4c + 4 (append 1).

Proof. We prove that at optimality, the two leaves created by the last split should come from the minimum-cost leaf. Suppose for contradiction that an optimal tree has a leaf 1\ell_1 of cost c1c_1 that is a leaf (never split) and a leaf 2\ell_2 of cost c2>c1c_2 > c_1 that was split from some internal node. Consider the tree obtained by swapping: make 2\ell_2 a leaf and split 1\ell_1 instead. The cost change is (c1+1)+(c1+4)c1[(c2+1)+(c2+4)c2]=c1c2<0(c_1 + 1) + (c_1 + 4) - c_1 - [(c_2 + 1) + (c_2 + 4) - c_2] = c_1 - c_2 < 0, a strict improvement. This contradicts optimality unless we always split the minimum-cost leaf. By induction on the number of splits, the greedy strategy is optimal. \square

Lemma 1 (Cost accounting). Each split of a leaf with cost cc increases the total cost by c+5c + 5 and the number of leaves by 1.

Proof. Splitting removes one leaf of cost cc and adds two leaves of costs c+1c + 1 and c+4c + 4. Net change in total cost: (c+1)+(c+4)c=c+5(c+1) + (c+4) - c = c + 5. Net change in leaf count: 21=12 - 1 = 1. \square

Theorem 2 (Total cost formula). Starting with one leaf of cost 0, after N1N - 1 splits the total cost is

Total=5(N1)+i=0N2ci\text{Total} = 5(N-1) + \sum_{i=0}^{N-2} c_i

where cic_i is the cost of the leaf split at step ii.

Proof. The initial total cost is 0. By Lemma 1, each split ii adds ci+5c_i + 5. Summing: Total=i=0N2(ci+5)=5(N1)+ci\text{Total} = \sum_{i=0}^{N-2}(c_i + 5) = 5(N-1) + \sum c_i. \square

Theorem 3 (Bulk simulation correctness). At any point, if all kk leaves of the current minimum cost cc are split simultaneously, this is equivalent to kk sequential greedy splits (each splitting the minimum-cost leaf). The total cost increases by k(c+5)k(c + 5), the leaf count increases by kk, and the cost histogram updates as: remove kk entries at cost cc; add kk entries at cost c+1c + 1; add kk entries at cost c+4c + 4.

Proof. Since all kk leaves have the same minimum cost cc, they would be the first kk leaves chosen by the sequential greedy algorithm (ties broken arbitrarily). Splitting them in any order produces the same result: each split removes one leaf of cost cc and adds leaves of costs c+1c+1 and c+4c+4. The kk splits are independent (they operate on distinct leaves), so the bulk operation is correct. \square

Editorial

Build a prefix-free code for N = 10^9 symbols. Bit ‘0’ costs 1, bit ‘1’ costs 4. Minimize total cost of all codewords. Greedy approach: always split the cheapest codeword. Splitting a codeword of cost c removes it and creates c+1 and c+4. Net cost increase per split of cost c: c + 5. Use histogram (sorted dict) for bulk processing. We histogram: cost -> count. We then don’t overshoot: we need N - leaves more splits. Finally, bulk split k leaves of cost c.

Pseudocode

Histogram: cost -> count
Don't overshoot: we need N - leaves more splits
Bulk split k leaves of cost c
Update histogram

Complexity Analysis

  • Time: Each iteration processes all leaves of the current minimum cost, advancing the minimum cost by at least 1. The maximum cost reached is O(N)O(\sqrt{N}) (since the number of leaves grows with each bulk split, and the cost advances linearly while the leaf count grows geometrically). Hence O(N)O(\sqrt{N}) iterations, each O(log(histogram size))O(\log(\text{histogram size})). Total: O(NlogN)O(\sqrt{N} \log N).
  • Space: O(N)O(\sqrt{N}) for the histogram (at most O(N)O(\sqrt{N}) distinct cost levels active at once).

Answer

64564225042\boxed{64564225042}

Code

Each problem page includes the exact C++ and Python source files from the local archive.

C++ project_euler/problem_219/solution.cpp
#include <bits/stdc++.h>
using namespace std;

/*
 * Problem 219: Skew-cost Coding
 *
 * Build a prefix-free code for N = 10^9 symbols.
 * Bit '0' costs 1, bit '1' costs 4.
 * Minimize total cost of all codewords.
 *
 * Greedy: always split the cheapest codeword.
 * Splitting cost c -> produces c+1 and c+4.
 * Use histogram for bulk processing.
 *
 * Answer: 64564225042
 */

int main() {
    const long long N = 1000000000LL;

    // Histogram: cost -> count
    // Use a sorted map for efficient min access
    map<long long, long long> freq;
    freq[0] = 1;
    long long total_cost = 0;
    long long num_codes = 1;

    while (num_codes < N) {
        auto it = freq.begin();
        long long cost = it->first;
        long long count = it->second;
        freq.erase(it);

        // How many can we split? Each split adds 1 codeword.
        long long need = N - num_codes;
        long long splits = min(count, need);

        // Split 'splits' codewords of this cost
        total_cost += splits * (cost + 5);
        num_codes += splits;

        // Add children
        freq[cost + 1] += splits;
        freq[cost + 4] += splits;

        // If we didn't split all, put the rest back
        if (splits < count) {
            freq[cost] += (count - splits);
        }
    }

    cout << total_cost << endl;
    return 0;
}