Problem Solving Journal 1

Published Feb 20, 2025

Problems: Distinct Multisets With Equal Sum(multi)sets · Efficient Lower-Bounded Exponential Propagation Queries on Trees

Inspired by typical AoPS problem solving blogs, I have also decided to record a journal of my solutions to some interesting problems as I practice. Hopefully there will be frequent uploads. These will likely feature a combination of college/olympiad math and informatics topics along with some random musings. I might occassionally add in random problems I did in the past.

Distinct Multisets With Equal Sum(multi)sets

For some multiset $A = \{a_1, a_2, \dots, a_n\}$ of $n$ integers, consider the sum(multi)set $S(A) = \{a_i + a_j \mid 1 \leq i < j \leq n\}$ of size $\binom{n}{2}$ . For which $n$ do there exist distinct multisets $A$ , $B$ of size $n$ which satisfy $S(A) = S(B)$ ?

—Source: Paul Erdös and John Selfridge

Solution

The answer is powers of two for $n \geq 2$ , or $n = 2^k$ for some positive integer $k$ . To construct examples we can recursively define

A_{k + 1} = A_k \cup (B_k + c) \quad B_{k + 1} = B_k \cup (A_k + c)

And for instance $A_1 = \{0, 2\}, B_1 = \{1, 1\}$ . Here is a cute proof I found using generating functions. Note that if $A$ and $B$ works if and only if $A + c$ and $B + c$ work. Therefore we impose the multisets to contain only nonnegative elements. Consider $A = \{a_1, a_2, \dots, a_n\}$ , we can encode the multiset using

P(x) = x^{a_1} + x^{a_2} + \dots + x^{a_n}

Therefore, any multiset corresponds to a polynomial with positive integer coefficients. Note that the corresponding polynomial of $S(A)$ is

\sum_{1 \leq i < j \leq n} x^{a_i + a_j} = \frac{P^2(x) - P(x^2)}{2}

Then it remains to ask for which $n$ does there exist polynomials $P \not = Q$ of positive integer coefficients such $P(1) = Q(1) = n$ and

P^2(x) - P(x^2) = Q^2(x) - Q(x^2)

The trick is to take derivatives and realize,

2P(x)P'(x) - P'(x^2)2x = 2Q(x)Q'(x) - Q'(x^2)2x

Here $x = 1$ yields

2(n - 1)P'(1) = 2(n - 1)Q'(1)

So assuming $n \not = 1$ , along with $P(1) = Q(1)$ we further have $P'(1) = Q'(1)$ . Perhaps $P^{(t)}(1) = Q^{(t)}(1)$ for all $t$ ? If we consider higher order derivatives, it is clear inductively that all terms except for the terms containing $P^{(t)}$ and $Q^{(t)}$ cancel on both sides. We are left with

2P(x) P^{(t)}(x) - P^{(t)}(x^2) 2^t x^t = 2Q(x) Q^{(t)}(x) - Q^{(t)}(x^2) 2^t x^t

Which gives $P^{(t)}(1) = Q^{(t)}(1)$ unless $2n = 2^t$ . Therefore, if $n$ is not a power of two, all derivatives of $P$ and $Q$ are equal at $1$ . It is then trivial to conclude $P = Q$ . Of course, $n \not = 1$ , so we are done. $\blacksquare$

Efficient Lower-Bounded Exponential Propagation Queries on Trees

I attempted this problem, which has the barebones statement:

Each of $n$ nodes in a rooted binary tree is associated with a value, initially all zero. Each node additionally has a nonnegative integral maximum capacity. Write a program to efficiently support two types of $q$ queries:

Update the value of a node by increasing it some integer amount. If the node’s value exceeds the capacity after the update, the excess amount will be equally split upon its children, which will update and so on. Any excess amount within a leaf node is discarded.

Query the value of a node. Print the value to an absolute or relative error of $10^{-5}.$

—Source: DMOJ OTHS Coding Competition 3 P5

The official solution was based on the output condition that an error of $10^{-5}$ would be tolerated. We could lazily update the tree naively upon a query, and would not have to traverse much up the tree before the propagated excess becomes obsolete. However, the set up seems quite intricate and I wondered if we could theoretically solve the problem anyways, without the help of error tolerance. So for example, print the answer with no error/print the (trivially rational) answer modulo $p$ , while still doing it faster than $\mathcal{O}(qn)$ .

Note, this is not actually possible in practice. Without error tolerance and with modulo, we still always need to maintain the true decimal values in order to compare with the capacity, and it’s impossible to do so since $n$ bytes could be necessary for any value. Therefore, an additional overhead of $n$ byte computations is unavoidable when trying to solve this without precision tolerance. But this doesn’t matter in terms of what we are actually trying to optimize in spirit. Essentially, assume numerical operations are still $\mathcal{O}(1)$ , no matter the size of numbers used, can we improve upon the naive $\mathcal{O}(qn)$ ? I came up with this interesting approach that solves it in $\mathcal{O}(q \log n)$ :

EDIT: Unfortunately, this does not fully work. See the edit.

Solution

First, we perform an Euler Tour on the tree, recording the index of the first and last traversal of a node for a total of $2n$ indices, say in $\texttt{pair<int, int> tour[n]}$ . Then, given the values of the nodes $\texttt{val[n]}$ , we can build an array $\texttt{arr[2n]}$ by

\begin{align} \texttt{arr[tour[i].first] = val[i]} \\ \texttt{arr[tour[i].second] = -val[i]} \end{align}

Now, $\texttt{arr}$ has the property that the prefix sum $[0, \texttt{tour[i].first}]$ is the sum of the node values on the simple path from the root to node $i$ (inclusive). This is a well known construction.

The first idea is to normalize the values recorded in $\texttt{arr}$ to account for propagation splitting. Consider having $\texttt{arr}$ record the excess value stored at a node. If a node at depth $d_1$ has an excess of $v$ to be propagated, the actual amount that reaches a node with depth $d_2 > d_1$ is $v \cdot 2^{d1 - d2}$ , assuming that all nodes encountered during the process were already at capacity. Here, this means that we normalize each value $v$ of a node at depth $d$ to $v \cdot 2^{d}$ . To recover the true values, simply scale back by $2^{-d}$ . The prefix sum $[0, \texttt{tour[i].first}]$ in $\texttt{arr}$ now corresponds to actual value currently at node $i$ (but normalized), considering the propagated excess from the root to node $i$ , assuming that all nodes along the path were already at capacity.

Now we address the limitation that not all nodes encountered during this propagation might be at capacity. Let’s pretend that there only exists a propagation from the root, and we can only update the root, and other nodes are empty. We can simply initialize the non-root nodes in $\texttt{arr}$ to be the negation of their capacity, so that it starts off negative and is zero once it hits capacity. Normalized, it looks something like

\begin{align} \texttt{arr[tour[i].first] = -capacity[i] * pow(2, depth[i])} \\ \texttt{arr[tour[i].second] = capacity[i] * pow(2, depth[i])} \end{align}

It’s clear that now if the prefix sum $[0, \texttt{tour[i].first}-1]$ in $\texttt{arr}$ is positive, then this sum is the normalized propagation that has reached node $i$ . On the other hand, if the sum is negative then there was not enough excess to make the nodes along the way hit capacity, and no propagation from the root has reached node $i$ . (Why?)

Therefore, to update the root we only need to directly change the values corresponding to the root in $\texttt{arr}$ . To query the value of a node we can just do a prefix sum query in $\mathcal{O}(\log n)$ using a data structure like a Segment Tree. We should remember to restore the unnormalized value when outputting.

We now address the last limitation, what if nodes other than the root can be updated? We cannot simply update the value in $\texttt{arr}$ directly because of one limitation: if the propagation from the root is cut off by a node $x$ not at capacity, it doesn’t mean nodes $y$ in the subtree of $x$ necessarily get no propagation. There may be a node $z$ between $x$ and $y$ that was directly updated beyond capacity which contributes some propagation to $y$ , in which case we will need to consider the range sum $[l,\texttt{tour[y].first} - 1]$ in $\texttt{arr}$ starting from $l = \texttt{tour[z].first}$ , instead of at $l =\texttt{tour[root].first} = 0$ . This can go on, perhaps some node $r$ cuts off the propagation from $z$ but then another node $s$ has excess that propagates successfully to $y$ . So how do efficiently find the actual value propagated to $y$ ?

The last idea is to see that the actual value propagated to $y$ , if any, is the maximum* of all range sums $\texttt{arr[}l\texttt{,tour[y].first-1]}$ , where $0 \leq l \leq \texttt{tour[y].first}-1$ . The logic behind this is is similar to Kadane’s algorithm: we can think about the propagation as starting from the root, or starting the prefix sum at index $0$ , then one by one adding the next values. When our sum becomes negative, this means that the current propagation has been cut off, so we can reset it to $0$ . However, as in Kadane’s algorithm, this is also tracking the maximum subarray sum ending at the current index.

(*) The final caveat: the Euler tour format of $\texttt{arr}$ is a bit of a problem. We know $\texttt{tour[i].second} > \texttt{tour[i].first}$ , and when $\texttt{arr[tour[i].first]} \geq 0$ there will be no problems. But if $\texttt{arr[tour[i].first]} < 0$ , such as upon initialization, we may wrongfully utilize a $\texttt{arr[tour[i].second]} > 0$ for a node $i$ that does not belong in the path in our search for the maximum range sum.

Can we make some clever change with how we query it that makes a non-issue? Yes! What we can do is query the negation of the minimum of all the range sums $\texttt{arr[tour[y].second+1,}r\texttt{]}$ , where $\texttt{tour[y].second}+1 \leq r \leq 2n-1$ , and where $r = \texttt{tour[j].second}$ for some $j$ . We can see that any such range doesn’t contain both pairs $\texttt{tour[k].second}$ and $\texttt{tour[k].second}$ if and only if $k$ is an ancestor of $y$ , so all candidate ranges correspond to valid paths, solving the issue. One can check this still does what we intend, since $\texttt{arr[tour[i].second]} = -\texttt{arr[tour[i].first]}$ always. We should remember to clamp this propagation at a lower bound of $0$ , in case there is none.

EDIT: Unfortunately, this does not actually work. Now there is a new issue, that the range sum does not necessarily correrspond to a valid path. There may be a $r = \texttt{tour[j].second}$ which has an unpaired $\texttt{tour[k].first}$ within the range. I don’t think there is a simple fix. This solution then only works if either only the root can be updated or there is no concept of capacity. Nevertheless, it was a very interesting thought experiment so I will keep it here.

This is not too hard to achieve. It can be accomplished by mirroring only the later indices $\texttt{tour[i].second}$ of $\texttt{arr}$ in a range update and range minimum query data structure $\texttt{rmq}$ like a Lazy Segment Tree. Our queries are

\min(\texttt{arr[tour[y].second+1,}r\texttt{]}) = \min(\texttt{arr[0,}r\texttt{]}) - \texttt{arr[0,tour[y].second]}

Therefore, if $\texttt{tour[i].second}$ is the $0$ -indexed $j$ -th later index, we can have $\texttt{st[j]}$ maintain the prefix sum up to that $\texttt{tour[i].second}$ index in $\texttt{arr}$ . On each update of node $i$ , we can do corresponding range updates on $\texttt{st}$ depending on the position of $\texttt{tour[i].first}$ and $\texttt{tour[i].second}$ amongst the later indices. Queries on node $i$ are as simple as querying $\texttt{st.rmq[j,n-1]}$ (inclusive of $j$ to implicitly clamp at lower bound of $0$ ), where $j$ is defined on $\texttt{tour[i].second}$ like previous, then subtracting $\texttt{st[j]}$ , negating and unnormalizing. $\blacksquare$