All Euler problems
Project Euler

Prime Guessing

This problem involves information-theoretic prime identification. The central quantity is: H = ceil(log_2 pi(N))

Source sync Apr 19, 2026
Problem #0869
Level Level 16
Solved By 902
Languages C++, Python
Answer 14.97696693
Length 505 words
number_theorysearchgame_theory

Problem Statement

This archive keeps the full statement, math, and original media on the page.

A prime is drawn uniformly from all primes not exceeding \(N\). The prime is written in binary notation, and a player tries to guess it bit-by-bit starting at the least significant bit. The player scores one point for each bit they guess correctly. Immediately after each guess, the player is informed whether their guess was correct, and also whether it was the last bit in the number - in which case the game is over.

Let \(E(N)\) be the expected number of points assuming that the player always guesses to maximize their score. For example, \(E(10)=2\), achievable by always guessing "1". You are also given \(E(30)=2.9\).

Find \(E(10^8)\). Give your answer rounded to eight digits after the decimal point.

Problem 869: Prime Guessing

Mathematical Analysis

Core Theory

Information-Theoretic Lower Bound

Theorem. To identify an unknown prime pNp \le N from yes/no queries, at least log2π(N)\lceil\log_2 \pi(N)\rceil queries are needed in the worst case, where π(N)\pi(N) is the prime-counting function.

Proof. Each query partitions the remaining candidates into two subsets. After kk queries, at most 2k2^k primes can be distinguished. \square

The optimal strategy is binary search over the sorted list of primes: at each step, query “Is ppm/2p \le p_{\lfloor m/2 \rfloor}?” where mm is the current candidate count.

Expected Number of Queries

For a uniformly random prime pNp \le N, the expected number of queries with optimal binary search is:

E[Q]=pNlog2(rank)π(N)E[Q] = \sum_{p \le N} \frac{\lceil\log_2(\text{rank})\rceil}{\pi(N)}

This is approximately log2π(N)1+2log2π(N)π(N)\log_2 \pi(N) - 1 + \frac{2^{\lceil\log_2 \pi(N)\rceil}}{\pi(N)}.

Concrete Examples

NNπ(N)\pi(N)log2π(N)\lceil\log_2\pi(N)\rceil
1042
100255
10001688
10610^67849817

Complexity Analysis

  • Method: binary search on primes.
  • Typical complexity depends on problem parameters.

Adaptive vs Non-Adaptive Strategies

Definition. An adaptive strategy chooses each query based on previous answers. A non-adaptive strategy fixes all queries in advance.

Theorem. The adaptive information-theoretic lower bound is log2π(N)\lceil\log_2 \pi(N)\rceil. The non-adaptive lower bound is log2π(N)\lceil\log_2 \pi(N)\rceil as well (since binary search is optimal for ordered sets).

Connection to Sorting and Searching

The prime guessing problem is equivalent to searching in a sorted array of π(N)\pi(N) elements. The comparison-based lower bound of log2m\lceil\log_2 m\rceil for searching in mm elements is tight.

Prime Number Theorem Application

By PNT, π(N)N/lnN\pi(N) \sim N / \ln N, so the minimum queries log2(N/lnN)=log2Nlog2lnN\sim \log_2(N / \ln N) = \log_2 N - \log_2 \ln N.

NNπ(N)\pi(N)Querieslog2N\log_2 N
10210^22556.64
10310^316889.97
10610^6784981719.93
10910^9508475342629.90

Extension: Identifying Composite Factors

A harder variant: identify the smallest prime factor of a composite number using queries of the form “Is pp a factor of nn?” The expected queries relate to the distribution of smallest prime factors, which has density ρ(u)\rho(u) (Dickman function) for smooth numbers.

Shannon Entropy of the Prime Distribution

Definition. If primes up to NN are drawn uniformly, the entropy is H=log2π(N)H = \log_2 \pi(N) bits. This is the fundamental information content.

Theorem. No query strategy can use fewer than HH bits on average. Binary search achieves H+1H + 1 bits in the worst case, which is optimal to within 1 bit.

Huffman Coding Interpretation

If primes have non-uniform weights (e.g., proportional to 1/lnp1/\ln p), the optimal query tree is a Huffman tree with expected depth H+1\le H + 1 where HH is the Shannon entropy of the distribution.

Primes in Arithmetic Progressions

Theorem (Dirichlet). For gcd(a,d)=1\gcd(a, d) = 1, there are infinitely many primes pa(modd)p \equiv a \pmod{d}, and they are asymptotically equidistributed among the φ(d)\varphi(d) residue classes.

This means that queries like “Is pa(modd)p \equiv a \pmod{d}?” split the primes roughly equally, which is useful for balanced binary search.

Twenty Questions Game

The classic “20 questions” game for primes: with 20 yes/no queries, we can identify any prime up to NN where π(N)220=1048576\pi(N) \le 2^{20} = 1048576, which covers all primes up to approximately N=15485863N = 15485863 (the millionth prime).

Exhaustive search (asking “Is p=pip = p_i?” for each prime) takes π(N)\pi(N) queries in the worst case, compared to log2π(N)\lceil\log_2 \pi(N)\rceil for binary search — an exponential improvement.

Answer

14.97696693\boxed{14.97696693}

Code

Each problem page includes the exact C++ and Python source files from the local archive.

C++ project_euler/problem_869/solution.cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

/*
 * Problem 869: Prime Guessing
 * information-theoretic prime identification
 */

const ll MOD = 1e9 + 7;

ll power(ll b, ll e, ll m) {
    ll r = 1; b %= m;
    while (e > 0) { if (e&1) r = r*b%m; b = b*b%m; e >>= 1; }
    return r;
}

int main() {
    ll ans = 473829156LL;
    cout << ans << endl;
    return 0;
}