Problem Solving Journal 1

Published Feb 20, 2025

#problem-journal #math #cs #algorithms

Problems: Distinct Multisets With Equal Sum(multi)sets · Efficient Lower-Bounded Exponential Propagation Queries on Trees

Inspired by typical AoPS problem solving blogs, I have also decided to record a journal of my solutions to some interesting problems as I practice. Hopefully there will be frequent uploads. These will likely feature a combination of college/olympiad math and informatics topics along with some random musings. I might occassionally add in random problems I did in the past.

Distinct Multisets With Equal Sum(multi)sets

For some multiset A={a1,a2,,an}A = \{a_1, a_2, \dots, a_n\} of nn integers, consider the sum(multi)set S(A)={ai+aj1i<jn}S(A) = \{a_i + a_j \mid 1 \leq i < j \leq n\} of size (n2)\binom{n}{2}. For which nn do there exist distinct multisets AA, BB of size nn which satisfy S(A)=S(B)S(A) = S(B)?

—Source: Paul Erdös and John Selfridge

Solution

The answer is powers of two for n2n \geq 2, or n=2kn = 2^k for some positive integer kk. To construct examples we can recursively define

Ak+1=Ak(Bk+c)Bk+1=Bk(Ak+c)A_{k + 1} = A_k \cup (B_k + c) \quad B_{k + 1} = B_k \cup (A_k + c)

And for instance A1={0,2},B1={1,1}A_1 = \{0, 2\}, B_1 = \{1, 1\}. Here is a cute proof I found using generating functions. Note that if AA and BB works if and only if A+cA + c and B+cB + c work. Therefore we impose the multisets to contain only nonnegative elements. Consider A={a1,a2,,an}A = \{a_1, a_2, \dots, a_n\}, we can encode the multiset using

P(x)=xa1+xa2++xanP(x) = x^{a_1} + x^{a_2} + \dots + x^{a_n}

Therefore, any multiset corresponds to a polynomial with positive integer coefficients. Note that the corresponding polynomial of S(A)S(A) is

1i<jnxai+aj=P2(x)P(x2)2\sum_{1 \leq i < j \leq n} x^{a_i + a_j} = \frac{P^2(x) - P(x^2)}{2}

Then it remains to ask for which nn does there exist polynomials PQP \not = Q of positive integer coefficients such P(1)=Q(1)=nP(1) = Q(1) = n and

P2(x)P(x2)=Q2(x)Q(x2)P^2(x) - P(x^2) = Q^2(x) - Q(x^2)

The trick is to take derivatives and realize,

2P(x)P(x)P(x2)2x=2Q(x)Q(x)Q(x2)2x2P(x)P'(x) - P'(x^2)2x = 2Q(x)Q'(x) - Q'(x^2)2x

Here x=1x = 1 yields

2(n1)P(1)=2(n1)Q(1)2(n - 1)P'(1) = 2(n - 1)Q'(1)

So assuming n1n \not = 1, along with P(1)=Q(1)P(1) = Q(1) we further have P(1)=Q(1)P'(1) = Q'(1). Perhaps P(t)(1)=Q(t)(1)P^{(t)}(1) = Q^{(t)}(1) for all tt? If we consider higher order derivatives, it is clear inductively that all terms except for the terms containing P(t)P^{(t)} and Q(t)Q^{(t)} cancel on both sides. We are left with

2P(x)P(t)(x)P(t)(x2)2txt=2Q(x)Q(t)(x)Q(t)(x2)2txt2P(x) P^{(t)}(x) - P^{(t)}(x^2) 2^t x^t = 2Q(x) Q^{(t)}(x) - Q^{(t)}(x^2) 2^t x^t

Which gives P(t)(1)=Q(t)(1)P^{(t)}(1) = Q^{(t)}(1) unless 2n=2t2n = 2^t. Therefore, if nn is not a power of two, all derivatives of PP and QQ are equal at 11. It is then trivial to conclude P=QP = Q. Of course, n1n \not = 1, so we are done. \blacksquare

Efficient Lower-Bounded Exponential Propagation Queries on Trees

I attempted this problem, which has the barebones statement:

Each of nn nodes in a rooted binary tree is associated with a value, initially all zero. Each node additionally has a nonnegative integral maximum capacity. Write a program to efficiently support two types of qq queries:

  1. Update the value of a node by increasing it some integer amount. If the node’s value exceeds the capacity after the update, the excess amount will be equally split upon its children, which will update and so on. Any excess amount within a leaf node is discarded.
  2. Query the value of a node. Print the value to an absolute or relative error of 105.10^{-5}.

—Source: DMOJ OTHS Coding Competition 3 P5

The official solution was based on the output condition that an error of 10510^{-5} would be tolerated. We could lazily update the tree naively upon a query, and would not have to traverse much up the tree before the propagated excess becomes obsolete. However, the set up seems quite intricate and I wondered if we could theoretically solve the problem anyways, without the help of error tolerance. So for example, print the answer with no error/print the (trivially rational) answer modulo pp, while still doing it faster than O(qn)\mathcal{O}(qn).

Note, this is not actually possible in practice. Without error tolerance and with modulo, we still always need to maintain the true decimal values in order to compare with the capacity, and it’s impossible to do so since nn bytes could be necessary for any value. Therefore, an additional overhead of nn byte computations is unavoidable when trying to solve this without precision tolerance. But this doesn’t matter in terms of what we are actually trying to optimize in spirit. Essentially, assume numerical operations are still O(1)\mathcal{O}(1), no matter the size of numbers used, can we improve upon the naive O(qn)\mathcal{O}(qn)? I came up with this interesting approach that solves it in O(qlogn)\mathcal{O}(q \log n):

EDIT: Unfortunately, this does not fully work. See the edit.

Solution

First, we perform an Euler Tour on the tree, recording the index of the first and last traversal of a node for a total of 2n2n indices, say in pair<int, int> tour[n]\texttt{pair<int, int> tour[n]}. Then, given the values of the nodes val[n]\texttt{val[n]}, we can build an array arr[2n]\texttt{arr[2n]} by

arr[tour[i].first] = val[i]arr[tour[i].second] = -val[i]\begin{align} \texttt{arr[tour[i].first] = val[i]} \\ \texttt{arr[tour[i].second] = -val[i]} \end{align}

Now, arr\texttt{arr} has the property that the prefix sum [0,tour[i].first][0, \texttt{tour[i].first}] is the sum of the node values on the simple path from the root to node ii (inclusive). This is a well known construction.

The first idea is to normalize the values recorded in arr\texttt{arr} to account for propagation splitting. Consider having arr\texttt{arr} record the excess value stored at a node. If a node at depth d1d_1 has an excess of vv to be propagated, the actual amount that reaches a node with depth d2>d1d_2 > d_1 is v2d1d2v \cdot 2^{d1 - d2}, assuming that all nodes encountered during the process were already at capacity. Here, this means that we normalize each value vv of a node at depth dd to v2dv \cdot 2^{d}. To recover the true values, simply scale back by 2d2^{-d}. The prefix sum [0,tour[i].first][0, \texttt{tour[i].first}] in arr\texttt{arr} now corresponds to actual value currently at node ii (but normalized), considering the propagated excess from the root to node ii, assuming that all nodes along the path were already at capacity.

Now we address the limitation that not all nodes encountered during this propagation might be at capacity. Let’s pretend that there only exists a propagation from the root, and we can only update the root, and other nodes are empty. We can simply initialize the non-root nodes in arr\texttt{arr} to be the negation of their capacity, so that it starts off negative and is zero once it hits capacity. Normalized, it looks something like

arr[tour[i].first] = -capacity[i] * pow(2, depth[i])arr[tour[i].second] = capacity[i] * pow(2, depth[i])\begin{align} \texttt{arr[tour[i].first] = -capacity[i] * pow(2, depth[i])} \\ \texttt{arr[tour[i].second] = capacity[i] * pow(2, depth[i])} \end{align}

It’s clear that now if the prefix sum [0,tour[i].first1][0, \texttt{tour[i].first}-1] in arr\texttt{arr} is positive, then this sum is the normalized propagation that has reached node ii. On the other hand, if the sum is negative then there was not enough excess to make the nodes along the way hit capacity, and no propagation from the root has reached node ii. (Why?)

Therefore, to update the root we only need to directly change the values corresponding to the root in arr\texttt{arr}. To query the value of a node we can just do a prefix sum query in O(logn)\mathcal{O}(\log n) using a data structure like a Segment Tree. We should remember to restore the unnormalized value when outputting.

We now address the last limitation, what if nodes other than the root can be updated? We cannot simply update the value in arr\texttt{arr} directly because of one limitation: if the propagation from the root is cut off by a node xx not at capacity, it doesn’t mean nodes yy in the subtree of xx necessarily get no propagation. There may be a node zz between xx and yy that was directly updated beyond capacity which contributes some propagation to yy, in which case we will need to consider the range sum [l,tour[y].first1][l,\texttt{tour[y].first} - 1] in arr\texttt{arr} starting from l=tour[z].firstl = \texttt{tour[z].first}, instead of at l=tour[root].first=0l =\texttt{tour[root].first} = 0. This can go on, perhaps some node rr cuts off the propagation from zz but then another node ss has excess that propagates successfully to yy. So how do efficiently find the actual value propagated to yy?

The last idea is to see that the actual value propagated to yy, if any, is the maximum* of all range sums arr[l,tour[y].first-1]\texttt{arr[}l\texttt{,tour[y].first-1]}, where 0ltour[y].first10 \leq l \leq \texttt{tour[y].first}-1. The logic behind this is is similar to Kadane’s algorithm: we can think about the propagation as starting from the root, or starting the prefix sum at index 00, then one by one adding the next values. When our sum becomes negative, this means that the current propagation has been cut off, so we can reset it to 00. However, as in Kadane’s algorithm, this is also tracking the maximum subarray sum ending at the current index.

(*) The final caveat: the Euler tour format of arr\texttt{arr} is a bit of a problem. We know tour[i].second>tour[i].first\texttt{tour[i].second} > \texttt{tour[i].first}, and when arr[tour[i].first]0\texttt{arr[tour[i].first]} \geq 0 there will be no problems. But if arr[tour[i].first]<0\texttt{arr[tour[i].first]} < 0, such as upon initialization, we may wrongfully utilize a arr[tour[i].second]>0\texttt{arr[tour[i].second]} > 0 for a node ii that does not belong in the path in our search for the maximum range sum.

Can we make some clever change with how we query it that makes a non-issue? Yes! What we can do is query the negation of the minimum of all the range sums arr[tour[y].second+1,r]\texttt{arr[tour[y].second+1,}r\texttt{]}, where tour[y].second+1r2n1\texttt{tour[y].second}+1 \leq r \leq 2n-1, and where r=tour[j].secondr = \texttt{tour[j].second} for some jj. We can see that any such range doesn’t contain both pairs tour[k].second\texttt{tour[k].second} and tour[k].second\texttt{tour[k].second} if and only if kk is an ancestor of yy, so all candidate ranges correspond to valid paths, solving the issue. One can check this still does what we intend, since arr[tour[i].second]=arr[tour[i].first]\texttt{arr[tour[i].second]} = -\texttt{arr[tour[i].first]} always. We should remember to clamp this propagation at a lower bound of 00, in case there is none.

EDIT: Unfortunately, this does not actually work. Now there is a new issue, that the range sum does not necessarily correrspond to a valid path. There may be a r=tour[j].secondr = \texttt{tour[j].second}which has an unpaired tour[k].first\texttt{tour[k].first}within the range. I don’t think there is a simple fix. This solution then only works if either only the root can be updated or there is no concept of capacity. Nevertheless, it was a very interesting thought experiment so I will keep it here.

This is not too hard to achieve. It can be accomplished by mirroring only the later indices tour[i].second\texttt{tour[i].second} of arr\texttt{arr} in a range update and range minimum query data structure rmq\texttt{rmq} like a Lazy Segment Tree. Our queries are

min(arr[tour[y].second+1,r])=min(arr[0,r])arr[0,tour[y].second]\min(\texttt{arr[tour[y].second+1,}r\texttt{]}) = \min(\texttt{arr[0,}r\texttt{]}) - \texttt{arr[0,tour[y].second]}

Therefore, if tour[i].second\texttt{tour[i].second} is the 00-indexed jj-th later index, we can have st[j]\texttt{st[j]} maintain the prefix sum up to that tour[i].second\texttt{tour[i].second} index in arr\texttt{arr}. On each update of node ii, we can do corresponding range updates on st\texttt{st} depending on the position of tour[i].first\texttt{tour[i].first} and tour[i].second\texttt{tour[i].second} amongst the later indices. Queries on node ii are as simple as querying st.rmq[j,n-1]\texttt{st.rmq[j,n-1]} (inclusive of jj to implicitly clamp at lower bound of 00), where jj is defined on tour[i].second\texttt{tour[i].second} like previous, then subtracting st[j]\texttt{st[j]}, negating and unnormalizing. \blacksquare


© 2020-2025 • Last Updated 2025-02-20