Algorithms 3 - Maximal Path Sum between Two Leaves

For this week's algorithm post, we'll examine a recent exam problem I had for my algorithms class! I couldn't come up with a story for this one, though, sorry. :( n.b. I present here my refined solution from the exam. It got full credit, but alternative solutions do exist!

Problem. You are given a rooted complete binary tree $T$ of height $h$ with $n = 2^h$ leaves such that every node $x$ in this tree is associated with a value, $v(x)$, an arbitrary positive integer. Assume that for each node in the tree, you have a pointer to its left child, its right child, and its parent in the tree.

Part 1.

For each node $u$ in the tree and a leaf node $x$ in the subtree of $u$, let $V(u, x)$ be the sum of the values of all nodes on the path from $u$ to $x$ (including the values of $u$ and $x$). Now define $g(u)$ to be the maximum value of $V(u,x)$, taken over all leaf nodes $x$ in the subtree of $u$. Design an $O(n)$ time algorithm to compute the value $g(u)$ for every node in the tree.

Solution.

Algorithm. Let the contents of $T$ be stored in a one-indexed array $A$. Create a table $B$ of size $n$. This table will cache the value of $g(x)$ for each node $x \in T$. Start iterating over the leaves (since $T$ is a complete tree, elements of $A$ starting at $A[\frac{n}{2}]$). For each leaf $l$, $B[l] = v(l)$. When finished, iterate over the parents of the leaf nodes. For each parent $p$, Assign: $$B[p] = \max\big\{v(:\text{left}(p):), v(:\text{right}(p):)\big\}$$
Repeat the previous step for each level of the tree, traversing upwards. Once we have reached and computed the root, stop. Return $B$.

Proof of correctness.
We observe that the algorithm produces the correct result for the base case where $|T| < 2$ and for all the leaf nodes. Since the bottom-up algorithm uses pre-computed values of $g(x)$ to compute $g(x)$ for each parent node at each iterative step, we can inductively reason that the algorithm is holistically correct.

Runtime analysis.
This is an iterative solution that touches each node exactly once by simply storing values of $g(x)$ in a table with constant time access. Since at each step, the only work done is the table-access and comparison logic, we have $O(n)$ time.

Part 2.

For a leaf node $x$, we denote by $A(x)$, the set of nodes on the unique path from the leaf $x$ to the root (including $x$ and the root). Now for any two distinct leaf nodes $x$ and $y$, we define the function $f(x,y)$ as:

$$f(x,y)=\sum_{z\in A(x) \cup A(y)} v(z)$$

Design an $O(n)$ time algorithm that outputs the maximum value for the function $f$, i.e. the largest value of $f(x,y)$ where $x$ and $y$ range over all pairs of distinct leaf nodes in the tree.

Algorithm. We modify the algorithm from part 1 so that each table entry of B contains the value of $g(x)$ and the optimal path. (This can be done trivially since it uses the same inductive logic as before.) Compute $f_1 = g(root)$. Let $P$ be the optimal path for the root. If $P$ is empty, we are done. Return $B[root].$ Check which child of the root is on $P$. Let $x$ be the child in $P$ and $y$ the child not in $P$. Now return $R(root) = B[root] + \max\big\{B[y], R(x)\big\}$.

Proof of Correctness.
It holds that the maximal path and the second-most maximal path will produce the maximum value for $f$. If this assumption is true, then the algorithm is correct. Let us suppose for the purpose of contradiction that these maximal paths are not in the solution. Then there exists another two unique paths to the leaves whose value sums are the largest. However, this presents a contradiction. If one of these path sums is smaller than that of the maximal path, then the maximal path should have been included in the solution instead. Therefore, the assumption holds, and we are done.

Runtime.
Building the table $B$ runs in $O(n)$ time, even with the modification.
It also takes $O(n)$ to recurse over the entire tree.

Comments

Overall, this question is difficult by the sheer amount of reading needed to tackle it. Using an iterative solution for part 1 appears to be the optimal approach, as it takes $O(n)$ space with exactly one iteration through the tree. Because it is also not recursive, there is no overhead from function calls or continuous stack-space being consumed. For part 2, we trace through the entire tree, searching for the next most optimal path, also in $O(n)$ time.

Do you have a more optimal solution? Feel free to share it!

Thanks for reading, and have a wonderful day.