Warm-up Coding Interview: Dynamic Programming

The Partition Problem

Suppose that three workers are given the task of scanning through a shelf of books in search of a given piece of information. To get the job done fairly and efficiently, the books are to be partitioned among the three workers. To avoid the need to rearrange the books or separate them into piles, it is simplest to divide the shelf into three regions and assign each region to one worker.

In general, we have the following problem:

Problem: Integer Partition without Rearrangement

Input: An arrangement S of nonnegative numbers \{s_1, \dots, s_n\} and an integer k.

Output: Partition S into k or fewer ranges, to minimize the maximum sum over all the ranges, without reordering any of the numbers.

This so-called linear partition problem arises often in parallel processing, where we seek to balance the work done across processors so as to minimize the total elapsed run time. The bottleneck in this computation will be the processor assigned the most work.

Stop for a few minutes and try to find an algorithm to solve the linear partition problem.

A novice algorist might suggest a heuristic as the most natural approach to solving the partition problem. Perhaps they would compute the average size of a partition, and then try to insert the dividers to come close to this average. However, such heuristic methods are doomed to fail on certain inputs because they do not systematically evaluate all possibilities.

Instead, consider a recursive, exhaustive-search approach to solving this problem. Notice that the kth partition starts right after we place the (k-1)st divider. Where can this last divider go? Between the ith and (i+1)st elements for some i, where 1 <= i <= n. What is the cost of placing it there? The total cost will be the larger of two quantities: (1) the cost of the last partition, \sum_{j=i+1}^n s_j, and (2) the cost of the largest partition formed to the left of i. What is the size of this left partition? To minimize our total, we want to use the k-2 remaining dividers to partition the elements \{s_1, \dots, s_i\} as equally as possible. This is a smaller instance of the same problem, and hence can be solved recursively!
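Before memoizing anything, it may help to see this exhaustive search written out directly. The following is a minimal, unmemoized C sketch of the recursion just described (the function names `min_max_cost` and `range_sum` are my own, not from the text); arrays are 1-indexed to match the discussion, with s[0] unused:

```c
#include <limits.h>

/* Sum of s[lo..hi], 1-indexed. */
static int range_sum(const int s[], int lo, int hi) {
    int total = 0;
    for (int j = lo; j <= hi; j++) total += s[j];
    return total;
}

/* Minimum possible cost of partitioning s[1..n] into at most k ranges,
   where the cost is the largest range sum. Direct exhaustive recursion:
   try every position i for the last divider. */
static int min_max_cost(const int s[], int n, int k) {
    if (n == 1) return s[1];                /* one element: its own range */
    if (k == 1) return range_sum(s, 1, n);  /* one range: sum everything */

    int best = INT_MAX;
    for (int i = 1; i < n; i++) {           /* last divider after element i */
        int left  = min_max_cost(s, i, k - 1);
        int right = range_sum(s, i + 1, n);
        int cost  = left > right ? left : right;
        if (cost < best) best = cost;
    }
    return best;
}
```

Without caching, this recursion revisits the same (i, k) subproblems exponentially often, which is exactly the redundancy that the table below eliminates.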

Therefore, define M[n,k] to be the minimum possible cost over all partitionings of \{s_1, \dots, s_n\} into k ranges, where the cost of a partition is the largest sum of elements in one of its parts. This definition yields the recurrence:

M[n,k] = \min_{i=1}^{n} \max(M[i, k-1], \sum_{j=i+1}^{n} s_j)

We must specify the boundary conditions of the recurrence relation. These boundary conditions always settle the smallest possible values for each of the arguments of the recurrence. For this problem, the smallest reasonable value of the first argument is n = 1, meaning that the first partition consists of a single element. We can’t create a first partition smaller than s_1 regardless of how many dividers are used. The smallest reasonable value of the second argument is k=1, implying that we do not partition S at all. In summary:

M[1,k] = s_1, \forall k > 0, \quad M[n,1] = \sum_{i=1}^n s_i

By definition, this recurrence must return the size of the optimal partition. How long does it take to compute this when we store the partial results? A total of k \cdot n cells exist in the table.

partition(int s[], int n, int k) {
    int m[MAXN+1][MAXK+1];    /* DP table for values */
    int d[MAXN+1][MAXK+1];    /* DP table for dividers */
    int p[MAXN+1];            /* prefix sums of s[] */
    int cost;                 /* test split cost */
    int i,j,x;                /* counters */

    p[0] = 0;                 /* construct prefix sums */
    for (i=1; i<=n; i++)  p[i]=p[i-1]+s[i];

    for (i=1; i<=n; i++) m[i][1] = p[i];   /* initialize boundaries */
    for (j=1; j<=k; j++) m[1][j] = s[1];

    for (i=2; i<=n; i++) {    /* evaluate main recurrence */
        for (j=2; j<=k; j++) {
            m[i][j] = MAXINT;
            for (x=1; x<=(i-1); x++) {
                cost = max(m[x][j-1], p[i]-p[x]);
                if (m[i][j] > cost) {
                    m[i][j] = cost;
                    d[i][j] = x;
                }
            }
        }
    }

    reconstruct_partition(s,d,n,k); /* print book partition */
}

The implementation above, in fact, runs faster than advertised. Our original analysis assumed that it took O(n^2) time to update each cell of the matrix. This is because we selected the best of up to n possible points to place the divider, each of which requires the sum of up to n possible terms. In fact, it is easy to avoid the need to compute these sums by storing the set of n prefix sums p[i] = \sum_{k=1}^{i} s_k, since \sum_{k=i}^{j} s_k = p[j] - p[i-1]. This enables us to evaluate the recurrence in linear time per cell, yielding an O(kn^2) algorithm.
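To see the identity concretely, here is a tiny sketch of the prefix-sum construction (the helper name `build_prefix` is my own, not from the text): with p[i] = s[1] + ... + s[i] and p[0] = 0, any range sum s[i] + ... + s[j] becomes the O(1) lookup p[j] - p[i-1].

```c
/* Build 1-indexed prefix sums: p[i] = s[1] + ... + s[i], p[0] = 0.
   s[0] is unused, matching the convention in the partition() code. */
void build_prefix(const int s[], int n, int p[]) {
    p[0] = 0;
    for (int i = 1; i <= n; i++)
        p[i] = p[i - 1] + s[i];
}
```

Building the table costs O(n) once; afterward every range-sum query inside the DP loop is constant time instead of a fresh O(n) summation.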

The second matrix, D, is used to reconstruct the optimal partition. Whenever we update the value of M[i,j], we record which divider position was required to achieve that value. To reconstruct the path used to get to the optimal solution, we work backward from D[n,k], adding a divider at each specified position. This backward walk is best achieved by a recursive subroutine:

reconstruct_partition(int s[], int d[MAXN+1][MAXK+1], int n, int k) {
    if (k==1)
        print_books(s, 1, n);
    else {
        reconstruct_partition(s, d, d[n][k], k-1);
        print_books(s, d[n][k]+1, n);
    }
}

print_books(int s[], int start, int end) {
    int i;
 
    for (i=start; i<=end; i++) printf(" %d", s[i]);
    printf("\n");
}
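Putting the pieces together, here is a compact standalone variant of the table-filling code above that returns the optimal cost M[n][k] instead of printing the partition (the function name `linear_partition_cost` and the array bounds are my own choices, not from the text):

```c
#include <limits.h>

#define MAXN 200
#define MAXK 20

/* Fill the DP table for s[1..n] (1-indexed, s[0] unused) and return
   m[n][k], the minimized maximum range sum over k or fewer ranges.
   A self-contained sketch of the partition() routine, with the
   divider-recording table omitted since we only want the cost. */
int linear_partition_cost(const int s[], int n, int k) {
    int m[MAXN + 1][MAXK + 1];
    int p[MAXN + 1];

    p[0] = 0;                                    /* prefix sums */
    for (int i = 1; i <= n; i++) p[i] = p[i - 1] + s[i];

    for (int i = 1; i <= n; i++) m[i][1] = p[i]; /* one range: total sum */
    for (int j = 1; j <= k; j++) m[1][j] = s[1]; /* one element */

    for (int i = 2; i <= n; i++) {
        for (int j = 2; j <= k; j++) {
            m[i][j] = INT_MAX;
            for (int x = 1; x <= i - 1; x++) {   /* last divider after x */
                int left  = m[x][j - 1];
                int right = p[i] - p[x];
                int cost  = left > right ? left : right;
                if (cost < m[i][j]) m[i][j] = cost;
            }
        }
    }
    return m[n][k];
}
```

For the classic example of nine "books" of sizes 1 through 9 split among three workers, the best achievable bottleneck is 17 (e.g. ranges {1..5}, {6,7}, {8,9}).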
