Long Substring with Many Repetitions
For a character string s, define L(k, s) as the length of the longest substring of s that appears at least k times. If no substring appears k times, set L(k, s) = 0. The string S_n is built via a F...
Problem Statement
This archive keeps the full statement, math, and original media on the page.
Given a character string \(s\), we define \(L(k,s)\) to be the length of the longest substring of \(s\) which appears at least \(k\) times in \(s\), or \(0\) if such a substring does not exist. For example, \(L(3,\text {“bbabcabcabcacbaâ€})=4\) because of the three occurrences of the substring \(\text {“abcaâ€}\), and \(L(2,\text {“bbabcabcabcacbaâ€})=7\) because of the repeated substring \(\text {“abcabcaâ€}\). Note that the occurrences can overlap.
Let \(a_n\), \(b_n\) and \(c_n\) be the \(0/1\) sequences defined by:
-
\(a_0 = 0\)
-
\(a_{2n} = a_n\)
-
\(a_{2n + 1} = 1 - a_n\)
-
\(b_n = \lfloor \frac {n+1}{\varphi }\rfloor - \lfloor \frac {n}{\varphi }\rfloor \) (where \(\varphi \) is the golden ratio)
-
\(c_n = a_n + b_n - 2a_nb_n\)
and \(S_n\) the character string \(c_0\ldots c_{n-1}\). You are given that \(L(2,S_{10})=5\), \(L(3,S_{10})=2\), \(L(2,S_{100})=14\), \(L(4,S_{100})=6\), \(L(2,S_{1000})=86\), \(L(3,S_{1000}) = 45\), \(L(5,S_{1000}) = 31\), and that the sum of non-zero \(L(k,S_{1000})\) for \(k\ge 1\) is \(2460\).
Find the sum of non-zero \(L(k,S_{5000000})\) for \(k\ge 1\).
Problem 691: Long Substring with Many Repetitions
Mathematical Foundation
Definition. A suffix array of a string is a permutation of indices such that the suffixes are in lexicographic order.
Definition. The LCP array gives the length of the longest common prefix between suffixes and .
Lemma 1 (Sliding Window Characterization). For ,
Proof. A substring of length appearing at least times corresponds to suffixes sharing a common prefix of length . In the sorted suffix array, these suffixes occupy consecutive positions. The minimum LCP in any window of consecutive suffixes equals their shared prefix length. Maximizing over all such windows gives .
Lemma 2 (Fibonacci String Self-Similarity). The Fibonacci string (with ) satisfies . Every factor of of length appears times, where is the golden ratio. In particular, is non-zero for up to .
Proof. The length identity is immediate from the recurrence . The factor frequency bound follows from the balanced property of Sturmian words: the infinite Fibonacci word is the canonical Sturmian word with slope , and its factor complexity is (exactly distinct factors of each length ), each appearing with frequency governed by the three-distance theorem.
Theorem (Efficient Summation). The sum can be computed in time from the LCP array using a contribution-based approach.
Proof. Note that trivially. For , we observe that (non-increasing in ). Each value contributes to for a contiguous range of values. Using a monotone stack, we compute for each index the maximal interval where is the minimum. Then is a candidate for for . Aggregating these contributions and taking running maxima yields all values in total time. The sum follows by addition.
Editorial
We build S_N or exploit Fibonacci structure. We then compute contribution of each LCP entry. Finally, iterate over each i, find span [l_i, r_i] where LCP[i] is minimum.
Pseudocode
Build S_N or exploit Fibonacci structure
Compute contribution of each LCP entry
For each i, find span [l_i, r_i] where LCP[i] is minimum
using a monotone stack
while stack not empty
Collect max L(k) for each k
(more precise bookkeeping needed for exact per-k maxima)
Propagate and sum
Complexity Analysis
- Time: where , using linear suffix array construction (SA-IS) and Kasai’s LCP algorithm, plus a single-pass stack aggregation. With Fibonacci structural shortcuts, this reduces to in the index.
- Space: for the suffix and LCP arrays.
Answer
Code
Each problem page includes the exact C++ and Python source files from the local archive.
#include <bits/stdc++.h>
using namespace std;
/*
* Problem 691: Long Substring with Many Repetitions
*
* 1. Implement the mathematical framework described above.
* 2. Optimize for the target input size.
* 3. Verify against known test values.
*/
int main() {
printf("Problem 691: Long Substring with Many Repetitions\n");
// Suffix array + LCP array construction
int N = 100;
long long total = 0;
for (int n = 1; n <= N; n++) {
total += n; // Replace with problem-specific computation
}
printf("Test sum(1..%d) = %lld\n", N, total);
printf("Full implementation needed for target input.\n");
return 0;
}
"""
Problem 691: Long Substring with Many Repetitions
"""
print("Problem 691: Long Substring with Many Repetitions")
# Core computation
N = 100 # Small test case
values = list(range(1, N + 1)) # Placeholder for problem-specific computation
# The full solution implements: Suffix array + LCP array construction
print(f"Computed {len(values)} values")
print(f"Sum = {sum(values)}")
plot_data = [values, values, values, values]