ICPC 2019 - G. First of Her Name
Lady i has name c_i + name(p_i), that is, one new letter prepended to her mother's name. For each query string s, we must count how many ladies have names that start with s.
Source-first archive entry
This page is built from the copied files in competitive_programming/icpc/2019/G-first-of-her-name. Edit
competitive_programming/icpc/2019/G-first-of-her-name/solution.tex to update the written solution and
competitive_programming/icpc/2019/G-first-of-her-name/solution.cpp to update the implementation.
The website does not replace those files with hand-maintained HTML. It reads the copied source tree during the build and exposes the exact files below.
Problem Statement
Copied statement text kept beside the solution archive for direct reference.
Problem G
First of Her Name
Time limit: 10 seconds
In the Royal Family, names are very important! As the Royal Historian you have been charged with
analyzing the patterns in the names of the Royal Ladies in the realm.
There have been n Royal Ladies, for convenience numbered from 1 to n. The name of each Lady is an
uppercase letter concatenated with the name of her mother. The exception is the Lady numbered 1, the
founder of the Royal Family, whose name is just a single uppercase letter.
For example, ENERYS could be the mother of AENERYS (as the name AENERYS consists of the single
uppercase letter ‘A’ concatenated with ENERYS, which is her mother’s name). Similarly, AENERYS
could be the mother of DAENERYS and YAENERYS.
You are given the description of all the Royal Ladies. Your task is to determine, for certain interesting
strings s, the number of Royal Ladies for whom s is a prefix of their name.
For example, consider Sample Input 1 below, with a Royal Line that goes straight from the founder S
to AENERYS (through YS, RYS, ERYS, NERYS and ENERYS), with each Lady having exactly one
daughter. Then AENERYS has two daughters—DAENERYS and YAENERYS, with the latter having
one daughter, RYAENERYS.
In such a family, RY is a prefix of the names of two ladies: RYS and RYAENERYS. E is a prefix of the
names of ERYS and ENERYS. N is a prefix only of NERYS’s name, while S is a prefix only of the name
of the founder, S. AY is not a prefix of any Royal Lady’s name.
Input
The first line of input contains two integers n and k, where n (1 ≤ n ≤ 106 ) is the total number of Royal
Ladies and k (1 ≤ k ≤ 106 ) is the number of query strings.
Then follow n lines describing the Royal Ladies. The ith of these lines describes the Royal Lady num-
bered i, and contains an uppercase letter ci (‘A’–‘Z’) and an integer pi , where ci is the first letter of the
name of Lady i, and pi (p1 = 0 and 1 ≤ pi < i for i > 1) is the number of her mother (or 0, in the case
of the First Lady). All the names are unique.
The remaining k lines each contain one nonempty query string, consisting only of uppercase letters. The
sum of the lengths of the query strings is at most 106 .
Output
Output k lines, with the ith line containing the number of Royal Ladies who have the ith query string as
a prefix of their name.
Sample Input 1 Sample Output 1
10 5 2
S 0 2
Y 1 1
R 2 1
E 3 0
N 4
E 5
A 6
D 7
Y 7
R 9
RY
E
N
S
AY
Editorial
Rendered from the copied solution.tex file. The original TeX source remains
available below.
Key Observations
Prefixes are awkward because new letters are added at the front. Reverse every name and every query.
After reversal, Lady $i$'s reversed name is \[ \text{rev(name}(p_i)) + c_i, \] so in the reversed world each daughter is obtained by appending one letter to her mother's string.
A query $s$ is a prefix of the original name exactly when $\text{rev}(s)$ is a suffix of the reversed name.
Counting how many processed strings end with each query string is exactly what the Aho-Corasick automaton is built for.
Algorithm
Read all ladies and store only their mother index and added letter.
Insert every reversed query string into an Aho-Corasick trie.
Build suffix links and completed transitions of the automaton.
Process the ladies in increasing order:
let $state[i]$ be the automaton state reached by the reversed name of Lady $i$;
since reversed names are built by appending one letter, we get \[ state[i] = go(state[parent[i]], c_i). \]
increment a counter on $state[i]$.
After all ladies are processed, propagate counters upward along suffix links in reverse BFS order. Then each state contains the number of lady names whose reversed name has the corresponding trie string as a suffix.
For each query, output the counter of the terminal state of its reversed string. enumerate
Correctness Proof
We prove that the algorithm outputs the correct count for every query.
Lemma 1.
For every lady $i$, the automaton state
state[i]computed by the algorithm is exactly the state reached after reading the reversed name of Lady $i$.Proof.
By definition, Lady $1$'s reversed name is just one letter, so the formula holds immediately. For $i > 1$, the reversed name of Lady $i$ is the reversed name of her mother followed by the new letter $c_i$. Therefore applying one automaton transition from
state[parent[i]]by $c_i$ reaches exactly the state for Lady $i$'s reversed name. Induction on $i$ proves the claim. □Lemma 2.
For any query string $s$, a lady's original name starts with $s$ if and only if her reversed name ends with $\text{rev}(s)$.
Proof.
Reversing a string turns prefixes into suffixes and vice versa. So \[ \text{name} = s + t \] holds exactly when \[ \text{rev(name)} = \text{rev}(t) + \text{rev}(s), \] which says that $\text{rev}(s)$ is a suffix of the reversed name. □
Lemma 3.
After propagating counts upward along suffix links, the counter stored at any automaton state equals the number of ladies whose reversed names end with the string represented by that state.
Proof.
Before propagation, each lady contributes $1$ only to the state reached by her full reversed name. In the Aho-Corasick automaton, following suffix links enumerates exactly the trie strings that are suffixes of that full string. Therefore, when counts are added from each state to its suffix-link parent, every lady contributes to all terminal states corresponding to suffixes of her reversed name, and to no others. □
Theorem.
For every query string $s$, the algorithm outputs the number of ladies whose names have $s$ as a prefix.
Proof.
Let $v$ be the terminal state of $\text{rev}(s)$. By Lemma 3, the counter at $v$ is the number of ladies whose reversed names end with $\text{rev}(s)$. By Lemma 2, this is exactly the number of ladies whose original names start with $s$. □
Complexity Analysis
Let $L$ be the total length of all queries. The automaton has $O(L)$ states and is built in $O(L \cdot \Sigma)$ with $\Sigma = 26$, which is linear for a fixed alphabet. Processing all ladies is $O(n)$. Propagating counts is also $O(L)$. Thus the total running time is $O(n + L)$, and the memory usage is $O(L + n)$.
Implementation Notes
The code stores the automaton with fixed arrays of size $26$ per state, which is appropriate for the uppercase alphabet.
Query answers are stored by terminal state, so duplicate queries are handled automatically.
Because mothers always have smaller indices than daughters, the ladies can be processed in input order without any extra topological work.
Code
Exact copied C++ implementation from solution.cpp.
#include <bits/stdc++.h>
using namespace std;
namespace {
struct Node {
int next[26];
int link;
int count;
Node() : link(0), count(0) {
memset(next, 0, sizeof(next));
}
};
void solve() {
int n, k;
cin >> n >> k;
vector<int> parent(n + 1);
vector<int> letter(n + 1);
for (int i = 1; i <= n; ++i) {
char c;
int p;
cin >> c >> p;
parent[i] = p;
letter[i] = c - 'A';
}
vector<Node> trie(1);
vector<int> query_state(k);
for (int qi = 0; qi < k; ++qi) {
string s;
cin >> s;
reverse(s.begin(), s.end());
int v = 0;
for (char ch : s) {
int c = ch - 'A';
if (trie[v].next[c] == 0) {
trie[v].next[c] = (int)trie.size();
trie.emplace_back();
}
v = trie[v].next[c];
}
query_state[qi] = v;
}
vector<int> order;
order.reserve(trie.size());
queue<int> q;
for (int c = 0; c < 26; ++c) {
int to = trie[0].next[c];
if (to != 0) {
q.push(to);
order.push_back(to);
}
}
while (!q.empty()) {
int v = q.front();
q.pop();
for (int c = 0; c < 26; ++c) {
int to = trie[v].next[c];
if (to != 0) {
trie[to].link = trie[trie[v].link].next[c];
q.push(to);
order.push_back(to);
} else {
trie[v].next[c] = trie[trie[v].link].next[c];
}
}
}
vector<int> lady_state(n + 1, 0);
for (int i = 1; i <= n; ++i) {
lady_state[i] = trie[lady_state[parent[i]]].next[letter[i]];
++trie[lady_state[i]].count;
}
for (int i = (int)order.size() - 1; i >= 0; --i) {
int v = order[i];
trie[trie[v].link].count += trie[v].count;
}
for (int qi = 0; qi < k; ++qi) {
cout << trie[query_state[qi]].count << '\n';
}
}
} // namespace
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
solve();
return 0;
}
Source Files
Exact copied source-of-truth files. Edit solution.tex for the write-up and solution.cpp for the implementation.
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{enumitem}
\title{ICPC World Finals 2019\\G. First of Her Name}
\author{}
\date{}
\begin{document}
\maketitle
\section*{Problem Summary}
Lady $i$ has name
\[
c_i + \text{name}(p_i),
\]
that is, one new letter prepended to her mother's name. For each query string $s$, we must count how many ladies have names that start with $s$.
\section*{Key Observations}
\begin{itemize}[leftmargin=*]
\item Prefixes are awkward because new letters are added at the \emph{front}. Reverse every name and every query.
\item After reversal, Lady $i$'s reversed name is
\[
\text{rev(name}(p_i)) + c_i,
\]
so in the reversed world each daughter is obtained by appending one letter to her mother's string.
\item A query $s$ is a prefix of the original name exactly when $\text{rev}(s)$ is a suffix of the reversed name.
\item Counting how many processed strings end with each query string is exactly what the Aho-Corasick automaton is built for.
\end{itemize}
\section*{Algorithm}
\begin{enumerate}[leftmargin=*]
\item Read all ladies and store only their mother index and added letter.
\item Insert every reversed query string into an Aho-Corasick trie.
\item Build suffix links and completed transitions of the automaton.
\item Process the ladies in increasing order:
\begin{itemize}[leftmargin=*]
\item let $state[i]$ be the automaton state reached by the reversed name of Lady $i$;
\item since reversed names are built by appending one letter, we get
\[
state[i] = go(state[parent[i]], c_i).
\]
\item increment a counter on $state[i]$.
\end{itemize}
\item After all ladies are processed, propagate counters upward along suffix links in reverse BFS order. Then each state contains the number of lady names whose reversed name has the corresponding trie string as a suffix.
\item For each query, output the counter of the terminal state of its reversed string.
\end{enumerate}
\section*{Correctness Proof}
We prove that the algorithm outputs the correct count for every query.
\paragraph{Lemma 1.}
For every lady $i$, the automaton state \texttt{state[i]} computed by the algorithm is exactly the state reached after reading the reversed name of Lady $i$.
\paragraph{Proof.}
By definition, Lady $1$'s reversed name is just one letter, so the formula holds immediately. For $i > 1$, the reversed name of Lady $i$ is the reversed name of her mother followed by the new letter $c_i$. Therefore applying one automaton transition from \texttt{state[parent[i]]} by $c_i$ reaches exactly the state for Lady $i$'s reversed name. Induction on $i$ proves the claim. \qed
\paragraph{Lemma 2.}
For any query string $s$, a lady's original name starts with $s$ if and only if her reversed name ends with $\text{rev}(s)$.
\paragraph{Proof.}
Reversing a string turns prefixes into suffixes and vice versa. So
\[
\text{name} = s + t
\]
holds exactly when
\[
\text{rev(name)} = \text{rev}(t) + \text{rev}(s),
\]
which says that $\text{rev}(s)$ is a suffix of the reversed name. \qed
\paragraph{Lemma 3.}
After propagating counts upward along suffix links, the counter stored at any automaton state equals the number of ladies whose reversed names end with the string represented by that state.
\paragraph{Proof.}
Before propagation, each lady contributes $1$ only to the state reached by her full reversed name. In the Aho-Corasick automaton, following suffix links enumerates exactly the trie strings that are suffixes of that full string. Therefore, when counts are added from each state to its suffix-link parent, every lady contributes to all terminal states corresponding to suffixes of her reversed name, and to no others. \qed
\paragraph{Theorem.}
For every query string $s$, the algorithm outputs the number of ladies whose names have $s$ as a prefix.
\paragraph{Proof.}
Let $v$ be the terminal state of $\text{rev}(s)$. By Lemma 3, the counter at $v$ is the number of ladies whose reversed names end with $\text{rev}(s)$. By Lemma 2, this is exactly the number of ladies whose original names start with $s$. \qed
\section*{Complexity Analysis}
Let $L$ be the total length of all queries. The automaton has $O(L)$ states and is built in $O(L \cdot \Sigma)$ with $\Sigma = 26$, which is linear for a fixed alphabet. Processing all ladies is $O(n)$. Propagating counts is also $O(L)$. Thus the total running time is $O(n + L)$, and the memory usage is $O(L + n)$.
\section*{Implementation Notes}
\begin{itemize}[leftmargin=*]
\item The code stores the automaton with fixed arrays of size $26$ per state, which is appropriate for the uppercase alphabet.
\item Query answers are stored by terminal state, so duplicate queries are handled automatically.
\item Because mothers always have smaller indices than daughters, the ladies can be processed in input order without any extra topological work.
\end{itemize}
\end{document}
#include <bits/stdc++.h>
using namespace std;
namespace {
struct Node {
int next[26];
int link;
int count;
Node() : link(0), count(0) {
memset(next, 0, sizeof(next));
}
};
void solve() {
int n, k;
cin >> n >> k;
vector<int> parent(n + 1);
vector<int> letter(n + 1);
for (int i = 1; i <= n; ++i) {
char c;
int p;
cin >> c >> p;
parent[i] = p;
letter[i] = c - 'A';
}
vector<Node> trie(1);
vector<int> query_state(k);
for (int qi = 0; qi < k; ++qi) {
string s;
cin >> s;
reverse(s.begin(), s.end());
int v = 0;
for (char ch : s) {
int c = ch - 'A';
if (trie[v].next[c] == 0) {
trie[v].next[c] = (int)trie.size();
trie.emplace_back();
}
v = trie[v].next[c];
}
query_state[qi] = v;
}
vector<int> order;
order.reserve(trie.size());
queue<int> q;
for (int c = 0; c < 26; ++c) {
int to = trie[0].next[c];
if (to != 0) {
q.push(to);
order.push_back(to);
}
}
while (!q.empty()) {
int v = q.front();
q.pop();
for (int c = 0; c < 26; ++c) {
int to = trie[v].next[c];
if (to != 0) {
trie[to].link = trie[trie[v].link].next[c];
q.push(to);
order.push_back(to);
} else {
trie[v].next[c] = trie[trie[v].link].next[c];
}
}
}
vector<int> lady_state(n + 1, 0);
for (int i = 1; i <= n; ++i) {
lady_state[i] = trie[lady_state[parent[i]]].next[letter[i]];
++trie[lady_state[i]].count;
}
for (int i = (int)order.size() - 1; i >= 0; --i) {
int v = order[i];
trie[trie[v].link].count += trie[v].count;
}
for (int qi = 0; qi < k; ++qi) {
cout << trie[query_state[qi]].count << '\n';
}
}
} // namespace
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
solve();
return 0;
}