Problem Solving Journal 1
Published Feb 20, 2025
#problem-journal #math #cs #algorithms
Problems: Distinct Multisets With Equal Sum(multi)sets · Efficient Lower-Bounded Exponential Propagation Queries on Trees
Inspired by typical AoPS problem-solving blogs, I have also decided to record a journal of my solutions to some interesting problems as I practice. Hopefully there will be frequent uploads. These will likely feature a combination of college/olympiad math and informatics topics along with some random musings. I might occasionally add in random problems I did in the past.
Distinct Multisets With Equal Sum(multi)sets
For a multiset $A$ of $n$ integers, consider the sum(multi)set $S(A) = \{a_i + a_j : i < j\}$ of size $\binom{n}{2}$. For which $n$ do there exist distinct multisets $A$, $B$ of size $n$ which satisfy $S(A) = S(B)$?
—Source: Paul Erdős and John Selfridge
Solution
The answer is powers of two for $n$, i.e. $n = 2^k$ for some positive integer $k$. To construct examples we can take $A_1 = \{0, 3\}$, $B_1 = \{1, 2\}$ and recursively define
$$A_{k+1} = A_k \cup (B_k + t_k), \qquad B_{k+1} = B_k \cup (A_k + t_k),$$
where $t_k$ exceeds every element of $A_k \cup B_k$: the cross sums $A_k + B_k + t_k$ agree on both sides, and the remaining sums agree by induction.
For instance, with $t_1 = 4$ we get $A_2 = \{0, 3, 5, 6\}$ and $B_2 = \{1, 2, 4, 7\}$, whose sum(multi)sets are both $\{3, 5, 6, 8, 9, 11\}$. Here is a cute proof I found using generating functions. Note that $A$ and $B$ work if and only if $A + c$ and $B + c$ work for any constant $c$, so we may impose that the multisets contain only nonnegative elements. Consider $A = \{a_1, \dots, a_n\}$; we can encode the multiset using
$$f(x) = \sum_{i=1}^{n} x^{a_i}.$$
Therefore, any multiset corresponds to a polynomial with nonnegative integer coefficients. Note that the polynomial corresponding to the sum(multi)set $S(A)$ is
$$\frac{f(x)^2 - f(x^2)}{2}.$$
Then it remains to ask for which $n$ there exist distinct polynomials $f \neq g$ with nonnegative integer coefficients such that $f(1) = g(1) = n$ and
$$f(x)^2 - f(x^2) = g(x)^2 - g(x^2).$$
The trick is to take derivatives and realize
$$f(x)\,f'(x) - x f'(x^2) = g(x)\,g'(x) - x g'(x^2).$$
Here $x = 1$ yields
$$(n - 1)\,f'(1) = (n - 1)\,g'(1).$$
So assuming $n \neq 1$, along with $f(1) = g(1)$ we further have $f'(1) = g'(1)$. Perhaps $f^{(k)}(1) = g^{(k)}(1)$ for all $k$? If we consider higher order derivatives of the identity at $x = 1$, it is clear inductively that all terms except for those containing $f^{(k)}(x)$ and $f^{(k)}(x^2)$ (respectively $g^{(k)}$) cancel on both sides. We are left with
$$\left(2n - 2^k\right) f^{(k)}(1) = \left(2n - 2^k\right) g^{(k)}(1).$$
This gives $f^{(k)}(1) = g^{(k)}(1)$ unless $2n = 2^k$. Therefore, if $n$ is not a power of two, all derivatives of $f$ and $g$ are equal at $x = 1$. Since $f$ and $g$ are polynomials, it is then trivial to conclude $f = g$. Of course, $f = g$ implies $A = B$, so we are done.
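As a quick sanity check on the construction (this snippet is mine, not part of the original argument), a brute force confirms that the $n = 4$ pair above has equal pairwise-sum multisets:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Multiset of pairwise sums a_i + a_j over indices i < j.
multiset<long long> pairSums(const vector<long long>& a) {
    multiset<long long> s;
    for (size_t i = 0; i < a.size(); i++)
        for (size_t j = i + 1; j < a.size(); j++)
            s.insert(a[i] + a[j]);
    return s;
}

int main() {
    // n = 4 instance from the recursive construction.
    vector<long long> A = {0, 3, 5, 6}, B = {1, 2, 4, 7};
    cout << boolalpha << (pairSums(A) == pairSums(B)) << "\n"; // true
}
```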
Efficient Lower-Bounded Exponential Propagation Queries on Trees
I attempted this problem, which has the barebones statement:
Each of $N$ nodes in a rooted binary tree is associated with a value, initially all zero. Each node additionally has a nonnegative integral maximum capacity. Write a program to efficiently support two types of queries:
- Update the value of a node by increasing it by some integer amount. If the node's value exceeds its capacity after the update, the excess amount is split equally among its children, which are updated in the same way, and so on. Any excess amount at a leaf node is discarded.
- Query the value of a node. Print the value to within a given absolute or relative error.
—Source: DMOJ OTHS Coding Competition 3 P5
The official solution was based on the output condition that a small error would be tolerated. We could lazily update the tree naively upon a query, and would not have to traverse far up the tree before the propagated excess becomes negligible. However, the setup seems quite intricate and I wondered if we could theoretically solve the problem anyway, without the help of error tolerance. So for example, print the answer with no error, or print the (trivially rational) answer modulo a prime, while still doing it faster than the naive approach.
Note, this is not actually possible in practice. Without error tolerance, and even working modulo a prime, we still always need to maintain the true fractional values in order to compare against the capacities, and it's impossible to do so cheaply since a single value can require a number of bits proportional to its depth (the excess is halved at every level). Therefore, an overhead of big-number computations is unavoidable when trying to solve this without precision tolerance. But this doesn't matter in terms of what we are actually trying to optimize in spirit. Essentially, assuming numerical operations are still $\mathcal{O}(1)$ no matter the size of the numbers used, can we improve upon the naive $\mathcal{O}(N)$ per operation? I came up with this interesting approach that solves it in $\mathcal{O}(\log N)$ per operation:
EDIT: Unfortunately, this does not fully work. See the edit below.
Solution
First, we perform an Euler Tour on the tree, recording the indices of the first and last traversal of each node $v$, say $\mathrm{in}_v$ and $\mathrm{out}_v$, for a total of $2N$ indices. Then, given the values $x_v$ of the nodes, we can build an array $D$ of length $2N$ by
$$D[\mathrm{in}_v] = x_v, \qquad D[\mathrm{out}_v] = -x_v.$$
Now, $D$ has the property that the prefix sum up to $\mathrm{in}_v$ is the sum of the node values on the simple path from the root to node $v$ (inclusive). This is a well known construction.
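For concreteness, here is a minimal sketch of that construction as I would code it up, with a Fenwick tree standing in for the point-update/prefix-sum structure; the function names are mine:

```cpp
#include <bits/stdc++.h>
using namespace std;

int N, timer_ = 0;
vector<vector<int>> adj;
vector<int> tin, tout;      // in_v and out_v: first/last Euler indices (1-based)
vector<long long> bit;      // Fenwick tree over D[1..2N]

void dfs(int v, int parent) {
    tin[v] = ++timer_;
    for (int u : adj[v]) if (u != parent) dfs(u, v);
    tout[v] = ++timer_;
}
void add(int i, long long delta) { for (; i <= 2 * N; i += i & -i) bit[i] += delta; }
long long prefix(int i) { long long s = 0; for (; i > 0; i -= i & -i) s += bit[i]; return s; }

// Add delta to node v's value: D[in_v] += delta, D[out_v] -= delta.
void updateNode(int v, long long delta) { add(tin[v], delta); add(tout[v], -delta); }
// Sum of node values on the root-to-v path = prefix sum of D up to in_v.
long long pathSum(int v) { return prefix(tin[v]); }

int main() {
    // Tiny demo: a path 1 - 2 - 3 rooted at 1.
    N = 3;
    adj.resize(N + 1);
    adj[1].push_back(2); adj[2].push_back(1);
    adj[2].push_back(3); adj[3].push_back(2);
    tin.assign(N + 1, 0); tout.assign(N + 1, 0); bit.assign(2 * N + 1, 0);
    dfs(1, 0);
    updateNode(1, 5); updateNode(3, 2);
    cout << pathSum(3) << "\n"; // 5 + 0 + 2 = 7
}
```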
The first idea is to normalize the values recorded in $D$ to account for propagation splitting. Consider having $D$ record the excess value stored at a node. If a node at depth $d_1$ has an excess of $e$ to be propagated, the actual amount that reaches a node at depth $d_2 > d_1$ is $e / 2^{d_2 - d_1}$, assuming that all nodes encountered during the process were already at capacity. Here, this means that we normalize each value $e$ of a node at depth $d$ to $e \cdot 2^{d}$. To recover the true value at a node of depth $d$, simply scale back by $2^{-d}$. The prefix sum in $D$ then corresponds to the actual value currently at node $v$ (but normalized), considering the propagated excess from the root to node $v$, assuming that all nodes along the path were already at capacity.
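As a small worked example of the normalization (numbers mine): an excess of $8$ at a depth-$1$ node reaches a depth-$3$ descendant as $8 / 2^{3-1} = 2$, assuming the intermediate node is at capacity. Stored normalized, the entry is $8 \cdot 2^{1} = 16$; the prefix sum at the descendant picks up this $16$, and scaling back by $2^{-3}$ recovers $16 / 8 = 2$.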
Now we address the limitation that not all nodes encountered during this propagation might be at capacity. Let's pretend that propagation can only originate from the root: we can only update the root, and all other nodes start empty. We can simply initialize the non-root nodes in $D$ to the negation of their capacity, so that a node's entry starts off negative and reaches zero once it hits capacity. Normalized, for a non-root node $v$ at depth $d_v$ with capacity $c_v$ it looks something like
$$D[\mathrm{in}_v] = -c_v \cdot 2^{d_v}, \qquad D[\mathrm{out}_v] = c_v \cdot 2^{d_v}.$$
It's clear that now if the prefix sum in $D$ up to $\mathrm{in}_v$ is positive, then this sum is the normalized propagation that has reached node $v$. On the other hand, if the sum is negative then there was not enough excess to make the nodes along the way hit capacity, and no propagation from the root has reached node $v$. (Why?)
Therefore, to update the root we only need to directly change the entries corresponding to the root in $D$. To query the value of a node $v$ we can just do a prefix sum query in $D$ up to $\mathrm{in}_v$ using a data structure like a Segment Tree. We should remember to restore the unnormalized value when outputting.
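Below is a minimal sketch of this root-only special case under my reading of the setup; in particular, initializing the root like every other node and the final clamp against the capacity are my own bookkeeping, not spelled out above:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Root-only sketch (my own bookkeeping): D[in_v] starts at -cap_v * 2^depth_v,
// D[out_v] at +cap_v * 2^depth_v, and a root update adds the increment at D[in_root].
int N, timer_ = 0;
vector<vector<int>> adj;
vector<int> tin, tout, dep;
vector<double> cap, bit;                 // capacities; Fenwick over D[1..2N]

void dfs(int v, int p, int d) {
    dep[v] = d; tin[v] = ++timer_;
    for (int u : adj[v]) if (u != p) dfs(u, v, d + 1);
    tout[v] = ++timer_;
}
void add(int i, double x) { for (; i <= 2 * N; i += i & -i) bit[i] += x; }
double prefix(int i) { double s = 0; for (; i > 0; i -= i & -i) s += bit[i]; return s; }

void updateRoot(double delta) { add(tin[1], delta); add(tout[1], -delta); } // root = node 1, depth 0
double query(int v) {
    // Prefix sum up to in_v, unnormalized by 2^{-depth}, shifted back by the capacity
    // (which was subtracted off in the initialization), then clamped to [0, cap_v].
    double val = prefix(tin[v]) / exp2(dep[v]) + cap[v];
    return max(0.0, min(val, cap[v]));
}

int main() {
    // Demo: root 1 with children 2 and 3; capacities 1, 2, 2.
    N = 3;
    adj = {{}, {2, 3}, {1}, {1}};
    tin.assign(N + 1, 0); tout.assign(N + 1, 0); dep.assign(N + 1, 0);
    cap = {0, 1, 2, 2}; bit.assign(2 * N + 1, 0);
    dfs(1, 0, 0);
    for (int v = 1; v <= N; v++) {       // initialize D with the normalized negated capacities
        add(tin[v], -cap[v] * exp2(dep[v]));
        add(tout[v], cap[v] * exp2(dep[v]));
    }
    updateRoot(5);                       // root holds 1, excess 4 splits as 2 and 2
    cout << query(1) << " " << query(2) << " " << query(3) << "\n"; // 1 2 2
}
```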
We now address the last limitation: what if nodes other than the root can be updated? We cannot simply update the value in $D$ directly because of one problem: if the propagation from the root is cut off by a node $u$ not at capacity, it doesn't mean nodes in the subtree of $u$ necessarily get no propagation. There may be a node $w$ between $u$ and $v$ that was directly updated beyond capacity and contributes some propagation to $v$, in which case we will need to consider the range sum in $D$ starting from $\mathrm{in}_w$ instead of from the root's index. This can go on: perhaps some node cuts off the propagation from $w$, but then another node has excess that propagates successfully to $v$. So how do we efficiently find the actual value propagated to $v$?
The last idea is to see that the actual value propagated to $v$, if any, is the maximum* of all range sums $\sum_{i=l}^{\mathrm{in}_v} D[i]$, where $1 \le l \le \mathrm{in}_v$. The logic behind this is similar to Kadane's algorithm: we can think about the propagation as starting from the root, i.e. starting the prefix sum at index $1$, then adding the next values one by one. When our sum becomes negative, this means that the current propagation has been cut off, so we can reset it to $0$. However, as in Kadane's algorithm, this is also tracking the maximum subarray sum ending at the current index (see the snippet below).
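For reference (not from the original post), the Kadane-style recurrence this mirrors, as a standalone snippet:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Maximum subarray sum ending at each index: discard the running prefix once it
// goes negative, mirroring a propagation that has been cut off.
int main() {
    vector<long long> D = {3, -5, 4, -1, 2};
    long long ending_here = 0;
    for (size_t i = 0; i < D.size(); i++) {
        ending_here = max(0LL, ending_here) + D[i];
        cout << "max subarray sum ending at " << i << ": " << ending_here << "\n";
    }
}
```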
(*) The final caveat: the Euler tour format of $D$ is a bit of a problem. We know $D[\mathrm{out}_u] = -D[\mathrm{in}_u]$, and when $D[\mathrm{out}_u] \le 0$ there will be no problems. But if $D[\mathrm{out}_u] > 0$, such as upon initialization (where it equals the normalized capacity), we may wrongfully utilize a $D[\mathrm{out}_u]$ for a node $u$ that does not belong to the path in our search for the maximum range sum.
Can we make some clever change to how we query that makes this a non-issue? Yes! What we can do is query the negation of the minimum of all the range sums $\sum_{i=\mathrm{out}_v}^{r} D[i]$, where $r \ge \mathrm{out}_v$, and where $r = \mathrm{out}_w$ for some node $w$. We can see that any such range doesn't contain both of the pair $\mathrm{in}_w$ and $\mathrm{out}_w$ if and only if $w$ is an ancestor of $v$, so all candidate ranges correspond to valid paths, solving the issue. One can check this still does what we intend, since $D[\mathrm{out}_u] = -D[\mathrm{in}_u]$ always. We should remember to clamp this propagation at a lower bound of $0$, in case there is none.
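Spelling out that check (my own verification, in the notation above): for an ancestor $w$ of $v$, every off-path node whose indices fall inside the range contributes both entries of its pair, which cancel, leaving only the $\mathrm{out}$ entries of the path nodes between $w$ and $v$. Hence
$$\sum_{i=\mathrm{out}_v}^{\mathrm{out}_w} D[i] = \sum_{u \in \mathrm{path}(w, v)} D[\mathrm{out}_u] = -\sum_{u \in \mathrm{path}(w, v)} D[\mathrm{in}_u] = -\sum_{i=\mathrm{in}_w}^{\mathrm{in}_v} D[i],$$
so maximizing the earlier range sums over ancestors $w$ is the same as negating the minimum taken here.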
EDIT: Unfortunately, this does not actually work. Now there is a new issue: the range sum does not necessarily correspond to a valid path. There may be a node $u$ which has an unpaired $D[\mathrm{in}_u]$ within the range. I don't think there is a simple fix. This solution then only works if either only the root can be updated or there is no concept of capacity. Nevertheless, it was a very interesting thought experiment so I will keep it here.
This is not too hard to achieve. It can be accomplished by mirroring only the later indices of $D$ (the $\mathrm{out}$ indices) in a range update and range minimum query data structure like a Lazy Segment Tree. Our queries are
$$-\min_{w \,:\, \mathrm{out}_w \ge \mathrm{out}_v} \; \sum_{i=\mathrm{out}_v}^{\mathrm{out}_w} D[i].$$
Therefore, if $p_j$ is the ($1$-indexed) $j$-th later index, we can have an auxiliary array $P$ where $P[j]$ maintains the prefix sum of $D$ up to $p_j$. On each update of node $v$, we can do corresponding range updates on $P$ depending on the positions of $\mathrm{in}_v$ and $\mathrm{out}_v$ amongst the later indices. Queries on node $v$ are as simple as querying $\min_{j' \ge j} P[j']$ (inclusive of $j$ itself, to implicitly clamp at a lower bound of $0$), where $j$ is the position of $\mathrm{out}_v$ amongst the later indices as before, then subtracting off the prefix sum just before the range, negating, and unnormalizing.
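To make the intended data structure concrete, here is a sketch of a generic range-add / range-min lazy segment tree of the kind described, with the mapping to $P$ and $D$ left only as comments, since per the edit above the full bookkeeping does not actually hold together; all names here are mine:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Range add / range min lazy segment tree over P[0..n-1], where P[j] is
// meant to hold the prefix sum of D up to the j-th "later" (out) index.
struct LazySegTree {
    int n;
    vector<double> mn, lz;
    LazySegTree(int n) : n(n), mn(4 * n, 0), lz(4 * n, 0) {}
    void push(int x) {
        for (int c : {2 * x, 2 * x + 1}) { mn[c] += lz[x]; lz[c] += lz[x]; }
        lz[x] = 0;
    }
    // Add val to P[l..r].
    void add(int l, int r, double val, int x, int lo, int hi) {
        if (r < lo || hi < l) return;
        if (l <= lo && hi <= r) { mn[x] += val; lz[x] += val; return; }
        push(x);
        int mid = (lo + hi) / 2;
        add(l, r, val, 2 * x, lo, mid);
        add(l, r, val, 2 * x + 1, mid + 1, hi);
        mn[x] = min(mn[2 * x], mn[2 * x + 1]);
    }
    void add(int l, int r, double val) { add(l, r, val, 1, 0, n - 1); }
    // Minimum of P[l..r].
    double query(int l, int r, int x, int lo, int hi) {
        if (r < lo || hi < l) return numeric_limits<double>::infinity();
        if (l <= lo && hi <= r) return mn[x];
        push(x);
        int mid = (lo + hi) / 2;
        return min(query(l, r, 2 * x, lo, mid), query(l, r, 2 * x + 1, mid + 1, hi));
    }
    double query(int l, int r) { return query(l, r, 1, 0, n - 1); }
};

int main() {
    // Shape of the intended usage (the bookkeeping tying P back to D is the
    // part that breaks down, per the edit above):
    //  - an update at node v becomes add(...) over the suffix of later indices
    //    whose prefix sums include in_v (and similarly for out_v);
    //  - a query at node v becomes query(j, n - 1) with j the position of
    //    out_v, followed by the subtraction, negation and unnormalization.
    LazySegTree st(8);
    st.add(2, 7, -3.5);
    cout << st.query(0, 7) << "\n"; // -3.5
}
```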