All Euler problems
Project Euler

Prime Gap Distribution

Let g(p) denote the gap to the next prime: g(p) = p' - p, where p' is the smallest prime exceeding p. Find sum_(p <= 10^7, p prime) g(p)^2.

Source sync Apr 19, 2026
Problem #0929
Level Level 32
Solved By 253
Languages C++, Python
Answer 57322484
Length 323 words
number_theorybrute_forcegeometry

Problem Statement

This archive keeps the full statement, math, and original media on the page.

A composition of is a sequence of positive integers which sum to . Such a sequence can be split into runs, where a run is a maximal contiguous subsequence of equal terms.

For example, is a composition of consisting of four runs:

Let be the number of compositions of where every run has odd length.

For example, :

Find . Give your answer modulo .

Problem 929: Prime Gap Distribution

Mathematical Foundation

Theorem 1 (Prime Number Theorem). The number of primes up to xx satisfies π(x)xlnx\pi(x) \sim \frac{x}{\ln x} as xx \to \infty. Equivalently, the nn-th prime satisfies pnnlnnp_n \sim n \ln n.

Proof. (Sketch.) The PNT was proved independently by Hadamard and de la Vallee Poussin in 1896 using the non-vanishing of ζ(1+it)\zeta(1 + it) and contour integration applied to Λ(n)/ns=ζ(s)/ζ(s)\sum \Lambda(n)/n^s = -\zeta'(s)/\zeta(s). \square

Theorem 2 (Average Gap). The average prime gap near xx is lnx\ln x. More precisely, for pnN<pn+1p_n \leq N < p_{n+1}:

1nk=1ngk=pn+12nlnpn.\frac{1}{n}\sum_{k=1}^{n} g_k = \frac{p_{n+1} - 2}{n} \sim \ln p_n.

Proof. The gaps telescope: k=1ngk=k=1n(pk+1pk)=pn+1p1=pn+12\sum_{k=1}^{n} g_k = \sum_{k=1}^{n}(p_{k+1} - p_k) = p_{n+1} - p_1 = p_{n+1} - 2. Dividing by n=π(pn)n = \pi(p_n) and using PNT: pn+12π(pn)pnpn/lnpn=lnpn\frac{p_{n+1} - 2}{\pi(p_n)} \sim \frac{p_n}{p_n/\ln p_n} = \ln p_n. \square

Theorem 3 (Sum of Squared Gaps — Heuristic). Under the Hardy—Littlewood prime kk-tuples conjecture, the sum of squared gaps satisfies:

pNp primeg(p)22NlnNas N.\sum_{\substack{p \leq N \\ p \text{ prime}}} g(p)^2 \sim 2N \ln N \quad \text{as } N \to \infty.

Proof. (Heuristic.) If gaps gg near xx follow an approximate exponential distribution with mean lnx\ln x, then E[g2]2(lnx)2\mathbb{E}[g^2] \sim 2(\ln x)^2 and the sum over π(N)N/lnN\pi(N) \sim N/\ln N primes gives (N/lnN)2(lnN)2=2NlnN\sim (N/\ln N) \cdot 2(\ln N)^2 = 2N \ln N. A rigorous proof would require the full strength of the Hardy—Littlewood conjectures. \square

Lemma 1 (Sieve Correctness). The sieve of Eratosthenes correctly identifies all primes up to NN in O(NloglogN)O(N \log \log N) time.

Proof. A composite nNn \leq N has a prime factor pNp \leq \sqrt{N}. The sieve marks nn as composite when processing pp. Conversely, a prime nn is never marked (it has no proper prime factor n\leq \sqrt{n}). The time bound follows from pNN/p=O(NloglogN)\sum_{p \leq N} N/p = O(N \log \log N) (Mertens’ theorem). \square

Lemma 2 (Boundary Handling). To compute g(p)g(p) for all primes pNp \leq N, it suffices to sieve up to N+O(N0.525)N + O(N^{0.525}).

Proof. By the Baker—Harman—Pintz theorem (2001), the gap g(p)=O(p0.525)g(p) = O(p^{0.525}) for all primes pp. Hence for pNp \leq N, the next prime pp+O(p0.525)N+O(N0.525)p' \leq p + O(p^{0.525}) \leq N + O(N^{0.525}). In practice, gaps are much smaller (the maximal gap below 10710^7 is 154154), so sieving to N+1000N + 1000 is more than sufficient. \square

Editorial

Compute the sum of squared prime gaps: S(N) = sum_{p <= N, p prime} (p’ - p)^2, where p’ is the next prime after p, for N = 10^7. We sieve primes up to N + buffer. We then collect primes up to N, plus the next prime after N. Finally, accumulate sum of squared gaps.

Pseudocode

Sieve primes up to N + buffer
Collect primes up to N, plus the next prime after N
Accumulate sum of squared gaps

Complexity Analysis

  • Time: O(NloglogN)O(N \log \log N) for the sieve. The gap accumulation is O(π(N))=O(N/lnN)O(\pi(N)) = O(N / \ln N), dominated by the sieve.
  • Space: O(N)O(N) for the sieve bit array. The prime list requires O(π(N))=O(N/lnN)O(\pi(N)) = O(N / \ln N) additional space.

Answer

57322484\boxed{57322484}

Code

Each problem page includes the exact C++ and Python source files from the local archive.

C++ project_euler/problem_929/solution.cpp
#include <bits/stdc++.h>
using namespace std;

/*
 * Problem 929: Prime Gap Distribution
 *
 * Find sum_{p <= 10^7} g(p)^2 where g(p) = next_prime - p.
 *
 * Average gap ~ ln(p) ~ 16 near 10^7.
 * Max gap below 10^7: ~154.
 * Heuristic: sum g^2 ~ 2N ln N.
 *
 * Cramer's conjecture: g_n = O((ln p_n)^2).
 *
 * Algorithm: sieve primes to N+1000, iterate consecutive pairs.
 * Complexity: O(N log log N) sieve.
 */

int main() {
    const int N = 10000000;
    const int LIM = N + 1000;
    vector<bool> sieve(LIM + 1, true);
    sieve[0] = sieve[1] = false;
    for (int i = 2; (long long)i * i <= LIM; i++)
        if (sieve[i])
            for (int j = i * i; j <= LIM; j += i)
                sieve[j] = false;

    long long total = 0;
    int prev = 2;
    int max_gap = 0;
    for (int i = 3; i <= LIM; i++) {
        if (sieve[i]) {
            if (prev <= N) {
                long long g = i - prev;
                total += g * g;
                if (g > max_gap) max_gap = g;
            }
            prev = i;
        }
    }
    cout << total << endl;
    // cerr << "Max gap: " << max_gap << endl;

    return 0;
}