Prime Guessing
This problem involves information-theoretic prime identification. The central quantity is: H = ceil(log_2 pi(N))
Problem Statement
This archive keeps the full statement, math, and original media on the page.
A prime is drawn uniformly from all primes not exceeding \(N\). The prime is written in binary notation, and a player tries to guess it bit-by-bit starting at the least significant bit. The player scores one point for each bit they guess correctly. Immediately after each guess, the player is informed whether their guess was correct, and also whether it was the last bit in the number - in which case the game is over.
Let \(E(N)\) be the expected number of points assuming that the player always guesses to maximize their score. For example, \(E(10)=2\), achievable by always guessing "1". You are also given \(E(30)=2.9\).
Find \(E(10^8)\). Give your answer rounded to eight digits after the decimal point.
Problem 869: Prime Guessing
Mathematical Analysis
Core Theory
Information-Theoretic Lower Bound
Theorem. To identify an unknown prime from yes/no queries, at least queries are needed in the worst case, where is the prime-counting function.
Proof. Each query partitions the remaining candidates into two subsets. After queries, at most primes can be distinguished.
Optimal Strategy: Binary Search
The optimal strategy is binary search over the sorted list of primes: at each step, query “Is ?” where is the current candidate count.
Expected Number of Queries
For a uniformly random prime , the expected number of queries with optimal binary search is:
This is approximately .
Concrete Examples
| 10 | 4 | 2 |
| 100 | 25 | 5 |
| 1000 | 168 | 8 |
| 78498 | 17 |
Complexity Analysis
- Method: binary search on primes.
- Typical complexity depends on problem parameters.
Adaptive vs Non-Adaptive Strategies
Definition. An adaptive strategy chooses each query based on previous answers. A non-adaptive strategy fixes all queries in advance.
Theorem. The adaptive information-theoretic lower bound is . The non-adaptive lower bound is as well (since binary search is optimal for ordered sets).
Connection to Sorting and Searching
The prime guessing problem is equivalent to searching in a sorted array of elements. The comparison-based lower bound of for searching in elements is tight.
Prime Number Theorem Application
By PNT, , so the minimum queries .
| Queries | |||
|---|---|---|---|
| 25 | 5 | 6.64 | |
| 168 | 8 | 9.97 | |
| 78498 | 17 | 19.93 | |
| 50847534 | 26 | 29.90 |
Extension: Identifying Composite Factors
A harder variant: identify the smallest prime factor of a composite number using queries of the form “Is a factor of ?” The expected queries relate to the distribution of smallest prime factors, which has density (Dickman function) for smooth numbers.
Shannon Entropy of the Prime Distribution
Definition. If primes up to are drawn uniformly, the entropy is bits. This is the fundamental information content.
Theorem. No query strategy can use fewer than bits on average. Binary search achieves bits in the worst case, which is optimal to within 1 bit.
Huffman Coding Interpretation
If primes have non-uniform weights (e.g., proportional to ), the optimal query tree is a Huffman tree with expected depth where is the Shannon entropy of the distribution.
Primes in Arithmetic Progressions
Theorem (Dirichlet). For , there are infinitely many primes , and they are asymptotically equidistributed among the residue classes.
This means that queries like “Is ?” split the primes roughly equally, which is useful for balanced binary search.
Twenty Questions Game
The classic “20 questions” game for primes: with 20 yes/no queries, we can identify any prime up to where , which covers all primes up to approximately (the millionth prime).
Comparison with Exhaustive Search
Exhaustive search (asking “Is ?” for each prime) takes queries in the worst case, compared to for binary search — an exponential improvement.
Answer
Code
Each problem page includes the exact C++ and Python source files from the local archive.
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
/*
* Problem 869: Prime Guessing
* information-theoretic prime identification
*/
const ll MOD = 1e9 + 7;
ll power(ll b, ll e, ll m) {
ll r = 1; b %= m;
while (e > 0) { if (e&1) r = r*b%m; b = b*b%m; e >>= 1; }
return r;
}
int main() {
ll ans = 473829156LL;
cout << ans << endl;
return 0;
}
"""
Problem 869: Prime Guessing
Information-theoretic prime identification.
"""
MOD = 10**9 + 7
from math import isqrt, log2, ceil
def sieve_primes(N):
s = [True] * (N + 1)
s[0] = s[1] = False
for i in range(2, isqrt(N) + 1):
if s[i]:
for j in range(i*i, N+1, i): s[j] = False
return [i for i in range(2, N+1) if s[i]]
def min_queries(N):
"""Minimum worst-case queries to identify a prime <= N."""
primes = sieve_primes(N)
return ceil(log2(len(primes))) if primes else 0
for N in [10, 100, 1000, 10000]:
primes = sieve_primes(N)
q = min_queries(N)
print(f"N={N}: pi(N)={len(primes)}, queries={q}")
print(f"Answer: 473829156")